View on GitHub

Data Products and Interpolation

Oak Ridges Moraine Groundwater Program

Long Term Water Budget Model

A regionally-distributed runoff-recharge model

A regionally-distributed runoff/recharge model has been developed to simulate regional-scale (»10k km²) hydrologic processes at a fine (50m grid) resolution. The model code is written in an attempt to simulate large-scale/high resolution distributed hydrological phenomena while remaining amenable to multi-threaded computer architectures. No process of the model is in any way novel, rather a suite of existing model structures have been chosen and coded to minimize model run times, while maintaining an ease of implementation, practical applicability and scalability.

Executive Summary


Introduction

The following is a description of the regionally-distributed runoff/recharge model, the water budget tool located on our website, hereinafter referred to as the “model”.

The model’s primary intention is to project land surface moisture distribution for the purposes of estimating regional groundwater recharge. It utilizes hydrological model procedures that are amenable to local data availability. While the model can simulate stream flow discharge and overland runoff at any point in space, the model is not intended for rainfall runoff modelling; outputs from this model will be used to constrain regional groundwater models within the 30,000km² Oak Ridges Moraine Groundwater Program jurisdiction of southern Ontario. That said, mean daily streamflow remains the primary calibration target.

This model is currently in beta mode, and its use for practical application should proceed with caution. Users should be aware that model results posted online will always be subject to change. Ultimately, the intent of this model is to produced ranges of long-term (monthly average) water budget metrics (precipitation, runoff, evaporation, recharge, moisture state, etc.) as a hydrological reference for the partners of the ORMGP. This reference is maintained in real-time, updated by the ORMGP server center.

The model is physically based in that mass is conserved and processes are not constrained to any particular timestep. The model conceptualization has maintained parameters that speaks to the common physical hydrology lexicon, with parameters such as percent impervious, conductivity of surficial soils, etc. The model is a hydrological model integrated with a TOPMODEL groundwater system. The source code is open and free to use. Below is the description of the model procedures that have been codified.

Codification

Model performance has been maximized in three ways:

  1. Code selection
    The model is written in Go, an open source, free, cross-platform, self-dependent language. The Go language is inherently concurrent, meaning it has potential to create models that optimize computer resources.

  2. Queuing and Topology
    Conceptually, the model relies heavily on inferred causal relationships, such as runoff from a particular location moves in a downhill direction, where it may infiltrate and cease being runoff at downslope locations. From these relationships, the causal ordering of runoff processes is realized. This order is the model’s topology and gives rise to performance gains like recursive algorithms and queuing/load balancing.

  3. Procedural selection (hydrological process simulation)
    Having a concurrent language alone isn’t enough, rather it is hydrological processes that have been selected and coded into numerical procedures. Probably the biggest limitation to this model, is the strict avoidance of any numerical procedure that can’t be solved analytically.

The model represents a computational part of our data assimilation system (DAS), meaning it’s not intended to be a predictive tool, rather a means to gain information (runoff, recharge etc.) from readily available data. Ultimately the goal is to have at near-realtime, distributed hydro-climatological data, at sub-daily time-steps that includes estimates of:

These variables/states are calculated every 6 hours, to 12.1M model cells (on a 50 $\times$ 50 m² grid). Variables can be outputted at daily, monthly, seasonal, annual, or long-term average time-scales; at the original 50m grid or aggregated to 10km² sub-watersheds.

Theory

Shallow groundwater

\[\zeta=\ln\frac{a}{T_o\tan\beta}\]

Soil moisture accounting

\[\Delta S=P-E-R-G\]

Overland flow routing

\[F_\text{casc}=1-\exp\left(\frac{\beta}{-\alpha}\right)\]

Input Data

In an attempt to make most of computational efficiency, many processes that are typically computed as part of a hydrological model have been pre-processed as input to the ORMGP water balance model. Processes such as snowmelt and potential evapotranspiration can modelled independently of the rainfall-runoff-recharge process and thus much computational gains can be made if these processes are pre-determined.

A top-down approach

The model considers the greater role the atmosphere has on its ORMGP region. The Planetary Boundary Layer (Oke, 1987) is conceptualized as the barrier from which mass must transfer when surface evaporation is captured by the atmosphere and when liquid water originating from the atmosphere is released onto the land surface.

The model input (“climate forcing”) data are provided on a 6-hourly timestep. These data have been distributed to some 3,000 10km² sub-watersheds. They reflect the sources and sinks, respectively, of liquid (read: mobile) water on the land surface.

The aim of the model design is to simultaneously reduce the amount of computational processes and leverage near-realtime data assimilation products. So, it is first recognized that from a hydrological model design perspective, that the primary driver of watershed moisture distribution is the “Atmospheric Yield”, that is water sourced from the atmosphere in its liquid/mobile form.

\[\text{Atmospheric Yield} = \text{Rainfall} + \text{Snowmelt}\]

In a similar sense, the “atmosphere” (specifically the Planetary Boundary Layer of Oke, 1987) also has a drying power, a sink termed “Atmospheric Demand”.

It is matter of perspective that dictates the terminology here. The model was designed from a top-down viewpoint. Terms like “potential evaporation”, which speaks to the evaporation occurring on a surface with unlimited water supply is instead termed “atmospheric demand”, that is the capacity for the atmosphere to remove moisture from a rough land surface.

Only snowmelt, rainfall and evaporation are not readily available in a distributed form and need to be determined. The model is integrated with the ORMGP data management platform. Below is an interactive map of the climate forcing distribution used in the model. Total model coverage ~30,000km².

Note: sub-watersheds shown above each consists of roughly 4000 model grid cells


Finally, the model was designed to remain amenable to data availability and new technologies. For instance, SNODAS can avoid the need to model snowmelt explicitly (saving on run-times) and leverage these online resources. CaPA-RDPA eliminates the undesirable need to spatially interpolate station-based precipitation.

Data sources, transformations and pre-processing

see glossary

Model Variables

One goal set for the model design was to leverage contemporary gridded data sets available from a variety of open and public sources. Products known as “data re-analysis products” or “data-assimilation products” attempt to merge meteorological information from a variety of sources, whether they be ground (station) measurements, remote sensing observations (e.g., radar, satellite, etc.), and archived near-cast weather model outputs. When combined, the gridded data resemble the spatial distribution of meteorological forcings unlike what can be accomplished through standard interpolation practices using point measurements (e.g., thiessen polygons, inverse distance weighting, etc.).

An advantage to the data-assimilation products is that it removes the modeller from needing to model certain processes explicitly. Here, for example, the model does not account for a snowpack, rather inputs to the model include snowmelt derived from SNODAS.

The extent of the model combined with the resolution of the processes simulated lends itself best viewed from a top-down perspective (REF). This allows for model simplification by which many of the layered water stores (i.e., interception, depression, soil, etc.) may be handled procedurally as one unit. Viewing the model domain in it’s 30,000 km² extents,one can imagine how difficult it would be to discern any vertical detail.

Physical Constraints

Digital Elevation Model

The Greater Toronto Area 2002 DEM (OMNRF, 2015) was re-sampled to the model’s 50x50m grid cell resolution. Surface depressions were removed using Wang and Liu (2006) and flat regions were corrected using Martz (1997).

Drainage directions and flow-paths of the now “hydrologically correct” DEM were were assigned based on the direction of steepest decent (D8). Cell gradients ($b$) and slope aspects were calculated based on a 9-cell planar interpolation routine. The unit contributing area $a=A/w$ topographic wetness index $ln\frac{a}{\tan\beta}$ (Beven and Kirkby, 1979–CHECK) were computed for every cell.

Sub-basins

The 30,000 km² model area has been sub divided into 2,813 approximately 10 km² sub-basins. Within these sub-basins:

  1. Meteorological forcings from external sources are aggregated applied uniformly within the sub-basin (via a pre-processing routine); and
  2. Local shallow water response is assumed to act uniformly (the shallow subsurface catchment area).


Land use and Surficial geology

An overlay analysis is the process of overlaying 2 or more spatial layers and capturing statistics associated with their relative coverage. In this case, the sub-watershed layer is overlain by Provincial land-use and surficial geology layers to obtain information like percent impervious, relative permeability, etc.

Provincial layers discussed in more detail below have in all cases been re-sampled to the 50x50m² grid associated with the hydrologically corrected DEM. It is from these rasters where the aggregation of watershed characteristics is computed.

Downscaling of the above layers is based on the dominant land use/surficial geology class (by area) assigned within every 50x50m² grid cell.

Impervious and Canopy coverage

Using a look-up system, the set of raster cells contained within every 50x50m² grid cell are assigned a value of imperviousness, water body, wetland and canopy coverage (according to their SOLRIS index) and accumulated to grid cell sum.

Percent impervious and canopy coverage as per SOLRIS v3.0 (MNRF, 2019) land use classification.

SOLRIS index Name Imperviousness (%) Canopy cover (%)
23 Treed Sand Dune   50
43 Treed Cliff and Talus   50
52 Shrub Alvar   25
53 Treed Alvar   50
65 Sparse Treed   40
81 Open Tallgrass Prairie   10
82 Tallgrass Savannah   35
83 Tallgrass Woodland   85
90 Forest   100
91 Coniferous Forest   100
92 Mixed Forest   100
93 Deciduous Forest   100
131 Treed Swamp   50
135 Thicket Swamp   50
140 Fen   25
150 Bog   25
160 Marsh   25
191 Plantations – Tree Cultivated   85
192 Hedge Rows   85
201 Transportation 85  
202 Built-Up Area – Pervious 10 10
203 Built-Up Area – Impervious 90  
205 Extraction – Peat/Topsoil   10
  All else 0 0


Permeability

The OGS classes have been grouped according to the attribute “permeability” using a similar look-up table cross-referencing scheme. OGS (2010) adds: “Permeability classification is a very generalized one, based purely on characteristics of material types.”

After assigning an assumed “effective” hydraulic conductivity to every permeability group, sub-watershed “permeability” is then calculated as the geometric mean of 50x50m² grid cells contained within a sub-watershed. Effective hydraulic conductivity value assumed for every permeability group are:

Permeability classifications (after OGS, 2010) and assumed effective hydraulic conductivities.

OGS classification K (m/s)
Low 1e-09
Low-medium 1e-08
Medium 1e-07
Medium-high 1e-06
high 1e-05
unknown/variable 1e-08
fluvial 1e-05
organics 1e-06


The resulting effective hydraulic conductivity is then mapped back to the nearest Low–High OGS (2010) classification.

Source code

Processing discussed above that are operational have been documented in a jupyter notebook. Source data can be found here and additional outputs can be found here.

Model Structure

Computational Elements

Model grids in the model represent a homogenized area on the land surface, similar to the concept of a Hydrologic Response Unit (HRU). The term HRU will be avoided here as the concept itself is not well defined and simply “model cell” or “cell” will be used herein to avoid ambiguity. All processes within the cell are assumed to represent the “average” condition within the grid cell. The spatial proximity of each cell is maintained, meaning that runoff occurring from an up-gradient cell will “runon” to an adjacent down-gradient cell to preserve the spatial distribution of land surface moisture.

Time-stepping

The model runs on a 6-hourly basis in order to capture hydrological dependence on the diurnal fluctuation of energy flux at the ground surface.

The time step of the model has been set to 6 hour steps on 00:00 UTC, yielding the local time steps of 01:00, 07:00, 13:00, and 19:00 EST. This step is chosen as it matches the received precipitation dataset described above.

Model Domain

The overarching model code employs an object called the “Model Domain”. Here, all necessary model data are digitized and self-contained into a set of 5 Model Domain Components, including:

  1. Structure holds all the geometrical and topological constraints to grid cells, including the physical dimensions of cells and knowledge of where runoff is to be routed.
  2. Mapper contains a cell-to-type cross-reference map to distribute model parameters, such as percolation rates, soil zone depth, depression storage, etc. The Mapper projects parameter selection onto the grid space domain based on their local (an often unique) land type.
  3. Forcings: Input (variable) data, generally climate forcings: $Y_a$ and $E_a$.
  4. Subwatershed provides the model with optimal order of processing, ensuring a down stream process.
  5. Parameters: Parameter collections for a set of land-surface, surficial geology and groundwater reservoir types.

Parameterization

The model’s structure is defined rather simply by at least 5 raster data sets. The model’s input data pre-processor will generate additional information based on these data:

  1. digital elevation model
  2. land use index (with parameter lookup table)
  3. surficial geology index (with parameter lookup table)
  4. groundwater zones

While a distributed model, the procedures applied at the cell scale are quite parsimonious. There is no separate treatment of interception, depression storage, nor soil water retention, rather it is assumed that these processes respond to environmental factors (e.g., evaporation) concurrently and thus treated as one.

From the top down perspective, viewing some 12.1 million 50x50m cells covering 30,000 km², it seems rather overcomplicated (possibly frivolous) to account water any more than to total mass present at any particular location.

Runoff

Land area that has the capacity to retain water (through interception, soil retention, depression/rill storage, etc.) must be satisfied before runoff is generated.

Runoff is conceptualized as being generated through the saturation excess (Dunne, 1975 CHECK) mechanism. The saturation excess mechanism is dependent on topography and its interaction with the water table. The model is distributed with an integrated (albeit conceptual) groundwater system.

Surface water-groundwater integration is viewed from the hydrologists’ perspective: areas where is the groundwater system limiting infiltration (shallow groundwater table) and even contributing to overland runoff (springs/seepage areas). As a model output, this can be quantified as a net groundwater exchange field (groundwater recharge less discharge)

The basis of the model is that topography is paramount to the lateral movement of water yielding runoff. The model is deemed regional, in that it covers a large areal extent, yet is kept to a fine resolution to ensure that observed geomorphic flow patterns are represented in the model.

pre-defined data-driven parameters

The following are determined using a topographical analysis:

globally applied parameters

Cells with a contributing area greater than 1 km² are deemed “stream cells” in which additional sources include groundwater discharge to streams.

Regional groundwater parameters

Groundwater processes in the model are conceptualized at the sub-basin scale and so much of the groundwater parameterization is implemented here.

Land use based parameters

Each cell is classified according to (i) surficial geology and (ii) land use mapping where each class is parameterized independently. “Look-up tables” are used to distribute model parameters according to their classification.

The following are used to compute the overall retention/storage capacity:

Surficial geology based

Checks

As part of the model pre-processor, the structural data and parameterization specified above is outputted as a set of raster for user inspection and troubleshooting. These raster are written as binary rasters (*.bil) to a check/ directory, prefixed by the model name and the model domain component. Parameters include:

Testing

As a simple expression of the model’s ability to squeeze everything from the computer, this is what happens when 12.1M model cells running at a 6-hourly timestep:

Glossary

References

Oke, T.R., 1987. Boundary Layer Climates, 2nd ed. London: Methuen, Inc.

Ontario Geological Survey 2010. Surficial geology of southern Ontario; Ontario Geological Survey, Miscellaneous Release— Data 128 – Revised.

Ontario Ministry of Natural Resources and Forestry, 2019. Southern Ontario Land Resource Information System (SOLRIS) Version 3.0: Data Specifications. Science and Research Branch, April 2019.