
Streamflow Model Evaluation

Overview

In broad strokes, the evaluation workflow contains the components shown in the block diagram.

This can be broken down per variable, as different variables have different reference datasets and aggregation methods. The examples below emphasize streamflow, comparing historical stream gage readings against the NWM streamflow predictions for those gages.

Each of the yellow boxes in the block diagram is mapped to a notebook in which that bit of processing takes place.

Source Data

Source datasets include modeled data (NWM or NHM, for example) and a reference dataset representing the ‘observed’ values covering the same variable and temporal range. For streamflow data, we have actual gage readings. For other variables, we have other standard datasets representing the reference against which the model will be compared.

Source datasets are found in a variety of storage mechanisms. The main mechanisms that we need to be able to accommodate are on-premises file systems (e.g. storage available on denali and tallgrass) and cloud object storage (e.g. S3).

The source datasets may be replicated among two or more of these access methods. Which copy to use may depend on where the processing takes place: if running a notebook on denali or tallgrass, on-prem data is preferred over S3; if running in a cloud environment (esip/qhub), S3 is preferred.
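
As a minimal sketch of accommodating both access methods, the snippet below opens the same (hypothetical) Zarr store either from an on-prem path or from S3, depending on where the notebook is running. The paths, bucket name, and flag are placeholders, not the actual dataset locations.

```python
import fsspec
import xarray as xr

# Placeholder locations -- substitute the real dataset paths.
ONPREM_PATH = "/onprem/path/to/nwm_streamflow.zarr"   # hypothetical on-prem copy
S3_URL = "s3://example-bucket/nwm_streamflow.zarr"     # hypothetical cloud copy

RUNNING_ON_PREM = False  # set True when running on denali/tallgrass

if RUNNING_ON_PREM:
    # On-prem: read directly from the shared file system.
    ds = xr.open_dataset(ONPREM_PATH, engine="zarr")
else:
    # Cloud: read the S3 copy through fsspec.
    ds = xr.open_dataset(fsspec.get_mapper(S3_URL, anon=True), engine="zarr")

print(ds)
```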

Data Preparation Notebook

This pre-processing step is needed to rectify the data and organize it in preparation for analysis. Rectifying the data includes measures such as aligning the simulated and observed records onto a common set of locations and time steps.

At this stage, a given variable should be represented as a pair of 2D arrays of values (one for simulated, one for observed). One dimension of each array is indexed by some nominal key (‘gage_id’, ‘HUC-12 ID’, etc.), while the other dimension is indexed by time step.
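
To make the target layout concrete, here is a minimal sketch using synthetic data. The gage IDs and values are made up, but the structure (a pair of 2D arrays sharing a ‘gage_id’ dimension and a ‘time’ dimension) is what the preparation notebook should produce.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical gage IDs and a year of daily time steps.
gages = ["01010000", "01013500", "01030500"]
time = pd.date_range("2010-01-01", "2010-12-31", freq="D")
rng = np.random.default_rng(0)

# Simulated and observed streamflow: one row per gage, one column per time step.
sim = xr.DataArray(rng.random((len(gages), len(time))),
                   coords={"gage_id": gages, "time": time},
                   dims=("gage_id", "time"), name="streamflow_sim")
obs = xr.DataArray(rng.random((len(gages), len(time))),
                   coords={"gage_id": gages, "time": time},
                   dims=("gage_id", "time"), name="streamflow_obs")

# Align so both arrays cover exactly the same gages and time steps.
sim, obs = xr.align(sim, obs, join="inner")
```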

Analysis Notebook

The above data organization steps will allow us to extract a time series for a given station from each of the simulated and observed datasets, and run a series of statistical metrics against these values to evaluate Goodness Of Fit (GOF). Benchmarking requires three ingredients:

  1. A set of predictions and matching observations (i.e. the data, established above).

  2. The domain (e.g. space or time) over which to benchmark.

    • This will vary by variable and by dataset. For streamflow, the list of ‘cobalt’ gages (Foks et al., 2022) establishes the spatial domain, identifying which gages to consider.

    • Other variables will have other definitions for domain, which restrict analysis to a specific set of locations or times.

  3. A set of statistical metrics with which to benchmark.

    • In this tutorial, we are focusing on streamflow and the metrics relevant to that variable.

    • A different set of metrics may be used for other variables.

    • We will be using the ‘NWM Standard Suite’ and ‘DScore’ metrics to analyze streamflow; a sketch of how such a metric set can be organized in code follows this list.
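
The real metric implementations come from the NWM Standard Suite and DScore tooling used in the analysis notebook. As an illustration only, a metric set can be organized as a mapping from metric name to a function of the observed and simulated series; the two metrics below (Nash-Sutcliffe efficiency and percent bias) are standard goodness-of-fit measures, not necessarily the exact definitions used in those suites.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; values <= 0 mean the
    model is no better than simply predicting the observed mean."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def pbias(obs, sim):
    """Percent bias: overall tendency of the simulation to over- or under-predict."""
    return 100.0 * np.sum(sim - obs) / np.sum(obs)

# Illustrative metric set; the notebooks use the NWM Standard Suite and DScore metrics.
METRICS = {"NSE": nse, "pbias": pbias}
```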

The end result of this analysis is a 2D table of values. One dimension of this table is indexed by the same nominal key (e.g. ‘gage_id’); the other dimension holds the metrics comparing observed vs. simulated values for that gage. It is this table of values that we send to the visualization step.
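
Continuing the sketches above (the `sim`/`obs` arrays and the `METRICS` mapping), looping over the benchmark gages and applying each metric to that gage’s paired time series produces the table; the gage list here is hypothetical, standing in for the cobalt gage list.

```python
import pandas as pd

# Hypothetical benchmark (domain) gages -- in practice, the cobalt gage list.
benchmark_gages = ["01010000", "01013500"]

rows = {}
for gage in benchmark_gages:
    obs_ts = obs.sel(gage_id=gage).values   # observed time series for this gage
    sim_ts = sim.sel(gage_id=gage).values   # simulated time series for this gage
    rows[gage] = {name: fn(obs_ts, sim_ts) for name, fn in METRICS.items()}

# 2D table: one row per gage, one column per metric.
metrics_table = pd.DataFrame.from_dict(rows, orient="index")
metrics_table.index.name = "gage_id"
print(metrics_table)
```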

Visualization Notebook

Visualization steps offer different views of the metrics, plotted in various ways to allow for exploration. In addition to these interactive visualizations, a score card is offered as a way of summarizing how well the model compares against the reference dataset.
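
The notebooks themselves use interactive plots; as a minimal, non-interactive stand-in, the sketch below summarizes one metric from the `metrics_table` produced in the analysis step across all gages.

```python
import matplotlib.pyplot as plt

# Distribution of NSE across benchmark gages, using metrics_table from the analysis step.
fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(metrics_table["NSE"], bins=20, edgecolor="black")
ax.set_xlabel("Nash-Sutcliffe efficiency")
ax.set_ylabel("Number of gages")
ax.set_title("Model skill across benchmark gages")
plt.show()
```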

References
  1. Foks, S. S., Towler, E., Hodson, T. O., Bock, A. R., Dickinson, J. E., Dugger, A. L., Dunne, K. A., Essaid, H. I., Miles, K. J., Over, T. M., Penn, C. A., Russell, A. M., Saxe, S. W., & Simeone, C. E. (2022). Streamflow benchmark locations for hydrologic model evaluation within the conterminous United States (cobalt gages). U.S. Geological Survey. https://doi.org/10.5066/P972P42Z