Prologue. Process-based hydrological models can be roughly divided into two (ill-named and overlapping) categories (e.g., see Hrachowitz and Clark, 2017): bucket-style (conceptual) models and physically-based models.
We started our careers using bucket-style models, and we have recently moved to the large-domain, physically-based land modeling community. Our current research topics can be summarized in a nutshell as: (1) to understand the weaknesses of large-domain hydrological models and use that understanding for model improvement, and (2) to represent and assess the often-neglected water management component in models of the terrestrial water cycle. Here we provide a brief reflection on our observations from our journey along the continuum of model complexity.
Workflow, the biggest difference. The greatest difference between these two main branches of hydrological modeling is not only the underlying representation of the physics, but also the underlying workflow! “A workflow consists of an orchestrated and repeatable pattern of activity, enabled by the systematic organization of resources into processes that transform materials, provide services, or process information” (from Wikipedia).
In typical applications, bucket-style modellers tend to spend less time on the pre-processing of model inputs than on the modelling itself. Input data tend to be catchment-averaged, and these models rarely require more than precipitation, temperature, and some estimate of potential evapotranspiration. Because these models tend to be used in a spatially aggregated fashion and rely on effective parameters, less work is required to define the structure of the model domain (topography, vegetation, soils, geology), and the models have lower computational costs. Their simplicity means that much more effort can be spent on topics such as parameter estimation, sensitivity analysis, and uncertainty analysis.
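To make the idea concrete, here is a minimal sketch of a bucket-style model: a single reservoir driven only by precipitation and potential evapotranspiration, with saturation-excess overflow and a linear outflow. The function name and parameter values are purely illustrative, not taken from any specific model in the literature.

```python
# Minimal single-bucket rainfall-runoff sketch (illustrative only).
# Inputs per time step: precipitation and potential ET, both in mm/day.

def run_bucket(precip, pet, capacity=100.0, k=0.1, storage=0.0):
    """Simulate a single linear reservoir with saturation-excess overflow.

    capacity : maximum storage of the bucket (mm)
    k        : linear outflow coefficient (1/day)
    storage  : initial storage (mm)
    """
    flows = []
    for p, e in zip(precip, pet):
        storage += p                             # add rainfall to the bucket
        et = min(e, storage)                     # actual ET limited by storage
        storage -= et
        overflow = max(0.0, storage - capacity)  # saturation excess
        storage = min(storage, capacity)
        baseflow = k * storage                   # linear reservoir outflow
        storage -= baseflow
        flows.append(overflow + baseflow)
    return flows

# Example: a short synthetic record of five days
q = run_bucket(precip=[10, 0, 5, 20, 0], pet=[2, 2, 2, 2, 2])
```

A handful of lines and three effective parameters: this is why, in this branch of modelling, the effort can shift to calibration and uncertainty analysis rather than to the workflow around the model.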
The situation is quite the opposite in the large-domain, physically-based land modeling community, where more of the modeling effort is focused on the model workflow. These models are typically spatially distributed over a large geographical domain and tend to require large amounts of input data: forcing variables for mass and energy balance calculations, estimates of soil type and land use, accurate delineations of catchments and routing networks, etc. Given the computational cost of these models, sensitivity and uncertainty analyses tend to receive relatively less attention.
Despite these differences, both modelling approaches share practical challenges such as data collection, quality control, basic pre-processing of forcing data, and reducing computational costs. Different models hence share similar tasks in the model workflow. We can thus ask ourselves questions such as: how many times have we delineated a catchment for a given outlet point, and what are the best GIS tools to do so? How many times did we sort hydrograph data, fill in missing values, or remove outliers? How often did other people download the same data and work through the same tasks that we did? Did we develop a tool that does the job automatically? Is our workflow code of sufficient quality to be shared with our colleagues? The answers to these questions may bring us closer to elements of a universal workflow (Yes! Before we can talk about universal models or community models, we need a universal workflow).
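Two of the recurring tasks mentioned above, filling missing values and flagging outliers in a hydrograph, can be sketched in a few lines. The function names below are ours, not from any particular toolkit, and the simple z-score rule is just one of many possible outlier criteria.

```python
# Illustrative cleanup of a daily streamflow record: fill interior gaps
# by linear interpolation, then flag values far from the series mean.
from statistics import mean, stdev

def fill_gaps(series):
    """Linearly interpolate interior None values in a list of floats."""
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1
            if 0 < i and j < len(out):            # interior gap: interpolate
                step = (out[j] - out[i - 1]) / (j - i + 1)
                for k in range(i, j):
                    out[k] = out[i - 1] + step * (k - i + 1)
            i = j
        else:
            i += 1
    return out

def flag_outliers(series, z=3.0):
    """Return indices of values whose z-score exceeds the threshold."""
    m, s = mean(series), stdev(series)
    return [i for i, v in enumerate(series) if s > 0 and abs(v - m) / s > z]

q = fill_gaps([1.0, None, 3.0, 4.0, None, None, 7.0])
# q == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

The point is not these particular twenty lines but the question raised above: how many research groups have re-written this same snippet, and could it instead live once, tested and shared, in a common workflow?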
Model reproducibility is enabled by a smooth workflow. We are currently part of a wider effort to streamline and share workflows for land models. We can improve the reproducibility of our results by explicitly documenting every step of the model workflow, from data download to model evaluation. Doing so also considerably reduces the start-up time needed by new group members, which in turn helps us focus and spend more time on science questions. Beyond these practical benefits, a well-designed workflow increases the transparency of our science and the robustness of our scientific findings. We believe that an excellent idea will die if it lacks a proper workflow.
Designing a workflow is a collaborative endeavor. The lack of collaborative workflow management can result in an unbalanced workload among team members. This usually translates into unproductive work for Ph.D. students or post-doctoral researchers, who end up spending time on manual analysis or on developing procedures that could be better automated or systemized. A model workflow requires many skills, such as the ability to use high-performance computers, Linux commands, C or Fortran (Python and R help but may not be sufficient), and collaborative code development (e.g., GitHub). Users and developers should not get trapped in their own workflow. A good workflow is a dynamic and agile one that adapts to the needs of modelers and users, respects the skills and experience of every team member, and leaves enough room for a modeler to make reasonable changes and amendments to explore the underlying science questions.
Learning from each other, an avenue to explore. There is a certain lack of knowledge transfer between the two modelling communities we are part of (despite recent efforts to understand and bridge this gap, e.g., Archfield et al., 2015; Clark et al., 2017). This lack of knowledge transfer is understandable given the different foci of the modeling communities, their target audiences, and the context of their modeling efforts. However, there is a lot to be learned from each other. As an example, there is a significant body of literature on the effect of model structure in conceptual models and, similarly, on hillslope-scale hydrology, that can be incorporated into large-scale modeling (e.g., see Gharari et al., 2019 and Gharari et al., 2020, and the many more open questions in this area). Bucket-style modelers can also learn from the large-domain, physically-based modeling community, for example by incorporating basic energy balance calculations into bucket-style models. Having worked on both topics, we notice how each modelling approach brings new perspectives into our modeling efforts.
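As one illustration of how an energy term could enter a bucket-style model, the widely used Priestley–Taylor formulation estimates potential evapotranspiration from net radiation and air temperature. The constants below are standard textbook values and the function names are ours; this is a sketch of the idea, not a full surface energy balance.

```python
from math import exp

# Priestley-Taylor potential evapotranspiration (mm/day) from net radiation.
# Constants are standard textbook values; this sketches how an energy-balance
# term can replace a purely empirical PET input in a bucket model.
ALPHA = 1.26     # Priestley-Taylor coefficient (-)
GAMMA = 0.066    # psychrometric constant (kPa/degC)
LAMBDA = 2.45    # latent heat of vaporization (MJ/kg)

def sat_vp_slope(temp_c):
    """Slope of the saturation vapour pressure curve (kPa/degC)."""
    es = 0.6108 * exp(17.27 * temp_c / (temp_c + 237.3))
    return 4098.0 * es / (temp_c + 237.3) ** 2

def pet_priestley_taylor(net_rad, temp_c, ground_heat=0.0):
    """PET in mm/day; net_rad and ground_heat in MJ/m2/day."""
    delta = sat_vp_slope(temp_c)
    return ALPHA * delta / (delta + GAMMA) * (net_rad - ground_heat) / LAMBDA

# A warm, sunny day: roughly 5 mm/day of potential evapotranspiration
pet = pet_priestley_taylor(net_rad=15.0, temp_c=20.0)
```

Feeding such a PET estimate into a bucket model requires one extra forcing variable (net radiation) but ties the model's evaporative demand to the energy balance, which is exactly the kind of cross-community borrowing we have in mind.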
We are still learning, constantly learning. As part of the Computational Hydrology group at the University of Saskatchewan, we are busy setting up land models for the entire North American domain and improving continental-scale routing schemes. We are having fun! Every day we learn more about Python and its packages, Fortran, and the available local and global datasets that can be used in our modeling efforts. We do a lot of “GitHub-ing” these days: re-using each other’s code, reviewing each other’s scripts, and sharing them with the wider public. There is still work to be done for our efforts to bear fruit, but we are hopeful that we will make important contributions to the science and practice of large-domain hydrological modelling.
Looking back on what we’ve learned so far, we would probably have done many things in our previous work differently. We believe that is how the learning curve should be, and are working hard to set up our workflows to be able to deal with such evolving insight!
Final words. Moving from one branch of modelling to another has not been an easy process, but we think it helped us quickly grow beyond the skills we learned during our PhDs. This blog post aims to spark discussion and thinking about the infrastructure needed to make switching between different types of models and modelling approaches possible. For the moment, we have settled on well-documented workflows to enable this. Feel free to share this post with your interested colleagues, and do not hesitate to contact us (the authors of the blog). We would be more than happy to keep the conversation moving. Onward!