Standard Suite (v1) Metrics#
These are custom-defined Python functions to calculate metrics against time-series data.
These statistics are adapted from the originals in USGS-python/hytest-evaluation-workflows.
The Metrics:#
This suite of metrics describes the NWM benchmark:
| Metric | Used on these variables | Reference |
| --- | --- | --- |
| Nash-Sutcliffe efficiency (NSE) | all | Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I—A discussion of principles. Journal of Hydrology, 10(3), 282-290. https://www.sciencedirect.com/science/article/pii/0022169470902556?via%3Dihub |
| Kling-Gupta efficiency (KGE) | all | Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1-2), 80-91. https://www.sciencedirect.com/science/article/pii/S0022169409004843 |
| logNSE | all | Oudin, L., Andréassian, V., Mathevet, T., Perrin, C., & Michel, C. (2006). Dynamic averaging of rainfall‐runoff model simulations from complementary model parameterizations. Water Resources Research, 42(7). |
| percent bias | all | A measure of the mean tendency of simulated values to be greater or less than associated observed values; units of percent |
| ratio of standard deviation | all | Standard deviation of simulated values divided by the standard deviation of observed values |
| Pearson Correlation | all | K. Pearson (1896, 1900, 1920) |
| Spearman Correlation | all | Charles Spearman (1904, 1910) |
| percent bias in midsegment slope of the flow-duration curve (FDC) between Q20-Q70 | streamflow | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC low-segment volume (Q0-Q30) | streamflow | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC high-segment volume (Q98-Q100) | streamflow | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
This notebook will briefly describe each of the above metrics, and show some results using sample data. The specific code to implement each metric is included. This notebook can be sourced into analysis notebooks to get access to these functions natively.
import numpy as np
import pandas as pd
import logging
Metric Definitions#
Nash-Sutcliffe efficiency (NSE)#
Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I—A discussion of principles. Journal of hydrology, 10(3), 282-290. https://www.sciencedirect.com/science/article/pii/0022169470902556?via%3Dihub
## Used by NSE and others...
def MSE(obs, sim) -> float:
    """
    Mean Square Error -- Compute MSE over all paired values obs (x) and sim (x_hat)

    Returns
    -------
    float
        Mean square error
    """
    err = obs - sim
    return np.mean(err**2)
def NSE(obs, sim) -> float:
    """
    Nash-Sutcliffe efficiency (NSE)

    Returns:
        float: calculated NSE
    """
    # See NOTE re: ddof
    return 1 - (MSE(obs, sim) / np.var(obs, ddof=0))
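As a quick sanity check, NSE is 1 for a perfect simulation and 0 for a model that always predicts the observed mean. This standalone sketch re-declares NSE (same ddof=0 convention as above) with arbitrary sample values:

```python
import numpy as np

# Standalone restatement of NSE above (ddof=0), so this snippet runs on its own
def NSE(obs, sim):
    return 1 - np.mean((obs - sim) ** 2) / np.var(obs, ddof=0)

obs = np.array([1.0, 2.0, 3.0, 4.0])   # arbitrary sample values
print(NSE(obs, obs))                                # perfect simulation -> 1.0
print(NSE(obs, np.full_like(obs, obs.mean())))      # mean predictor -> 0.0
```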
Special note on NSE & variance: A component within the calculation of NSE is the variance computed over the observed values. Different Python libraries calculate this in different ways, so some of the details matter. In particular, numpy assumes that ddof (Delta Degrees of Freedom) is zero, while pandas assumes a ddof of one (Bessel’s Correction). Without explicit instructions, these two common libraries will return different results for the ‘same’ calculation, so it is important not to inter-mix them. If you decide to build your own functions involving variance, it will matter how you calculate that value:
df['obs'].var() # using pandas
will yield a different result than
np.var(df['obs']) # using numpy
The key (in either case) is to explicitly define the ddof:
df['obs'].var(ddof=0)
# or
np.var(df['obs'], ddof=0)
The original codespec for this benchmark series used numpy, with its default DDOF of 0. We conform to that as canon, with explicit definition of DDOF to ensure compatibility with similar metrics.
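To make the difference concrete, here is a small standalone demonstration using arbitrary sample values:

```python
import numpy as np
import pandas as pd

obs = pd.Series([1.0, 2.0, 3.0, 4.0])   # arbitrary sample values

print(obs.var())          # pandas default ddof=1 (Bessel) -> 1.666...
print(np.var(obs))        # numpy default ddof=0 -> 1.25
print(obs.var(ddof=0))    # explicit ddof makes pandas match numpy -> 1.25
```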
Ratio of Standard Deviations#
def rSD(obs, sim) -> float:
    """
    ratio of standard deviation -- standard deviation of simulated/modeled
    values divided by the standard deviation of observed values

    Returns:
        float: calculated ratio
    """
    # NOTE: numpy division by zero yields inf/nan with a RuntimeWarning rather
    # than raising ZeroDivisionError, so test the denominator explicitly.
    sd_obs = np.std(obs)
    if sd_obs == 0:
        logging.warning("Standard deviation of observed values is zero; ratio undefined")
        return np.nan
    return np.std(sim) / sd_obs
Correlation Coefficients: Pearson and Spearman#
These standard measures are available as reliable, fast implementations in the SciPy library.
from scipy.stats import pearsonr, spearmanr
def pearson_r(obs, sim) -> float:
    """
    Pearson Correlation -- Pearson's R, calculated using the scipy library method

    Returns
    -------
    float
        Pearson's R
    """
    return pearsonr(obs, sim)[0]
def spearman_r(obs, sim) -> float:
    """
    Spearman Correlation -- Spearman's R, calculated using the scipy library method

    Returns
    -------
    float
        Calculated R
    """
    return spearmanr(obs, sim)[0]
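A brief usage sketch with made-up paired samples. Because Spearman's correlation works on ranks, any strictly monotone relationship scores exactly 1 even when Pearson's does not:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up observations
sim = np.array([1.1, 1.9, 3.3, 3.9, 5.4])   # made-up, monotone in obs but not linear

r_p = pearsonr(obs, sim)[0]
r_s = spearmanr(obs, sim)[0]
print(r_p, r_s)   # Pearson slightly below 1; Spearman exactly 1
```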
Kling-Gupta efficiency (KGE)#
Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of hydrology, 377(1-2), 80-91. https://www.sciencedirect.com/science/article/pii/S0022169409004843
def KGE(obs, sim) -> float:
    """
    Kling-Gupta efficiency (KGE)

    Returns:
        float: Calculated KGE
    """
    r = pearsonr(obs, sim)[0]
    alpha = rSD(obs, sim)
    beta = np.sum(sim) / np.sum(obs)
    return 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)
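As a standalone check (re-declaring KGE with its alpha and beta terms inline so the snippet runs on its own), a perfect simulation scores 1, and a uniformly biased simulation scores lower:

```python
import numpy as np
from scipy.stats import pearsonr

# Standalone restatement of KGE above: alpha = ratio of std devs, beta = ratio of sums
def KGE(obs, sim):
    r = pearsonr(obs, sim)[0]
    alpha = np.std(sim) / np.std(obs)
    beta = np.sum(sim) / np.sum(obs)
    return 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # arbitrary sample values
print(KGE(obs, obs))          # perfect simulation -> 1.0
print(KGE(obs, obs * 1.5))    # 50% inflation hurts alpha and beta, KGE drops
```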
logNSE#
Oudin, L., Andréassian, V., Mathevet, T., Perrin, C., & Michel, C. (2006). Dynamic averaging of rainfall‐runoff model simulations from complementary model parameterizations. Water Resources Research, 42(7).
This is the NSE metric, run against log-transformed data. Because log() is undefined for zero values, the data must be sanitized to ensure only positive values are passed to np.log(). Of the various ways to treat zeros, we use a clipping function to ‘promote’ values below a small threshold up to that threshold value. By default, any value below 0.01 is treated as 0.01 for purposes of the log transform.

This data sanitization is handled differently within other libraries, notably hydroeval. That package uses a slightly more complex strategy to ensure that log() gets clean data to work on; the hydroeval developer references Pushpalatha et al. (2012) regarding their strategy. The details of that method are beyond our scope here. Just know that if you compare results with hydroeval, this metric may yield very slightly different results.
def logXform(a, **kwargs):
    ### Allowing for the possible future addition of other methods to treat
    ### zero values; 'clip' is the default (and currently the only) method.
    clip = kwargs.get('clip', 0.01)
    assert clip > 0
    return np.log(a.clip(clip))
def logNSE(obs, sim) -> float:
    """
    logNSE -- computes NSE using the log of the data (rather than the raw data)

    Returns:
        float: Calculated NSE of log(data)
    """
    return NSE(logXform(obs, clip=0.01), logXform(sim, clip=0.01))
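The effect of the clip can be seen in isolation. This minimal sketch applies the same thresholding via np.clip directly, with made-up values:

```python
import numpy as np

# Values below the 0.01 threshold are promoted up to the threshold
vals = np.array([0.0, 0.005, 0.5, 10.0])   # made-up values, including a zero
clipped = np.clip(vals, 0.01, None)        # same effect as vals.clip(0.01)
print(np.log(clipped))                     # the zero no longer produces -inf
```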
Percent Bias#
A measure of the mean tendency of simulated values to be greater or less than associated observed values, units of percent
def pbias(obs, sim) -> float:
    """
    percent bias -- a measure of the mean tendency of simulated values to be
    greater than or less than associated observed values.

    Returns:
        float: calculated percent bias / units = percent (i.e. 90 rather than 0.90)
    """
    return 100 * np.sum(sim - obs) / np.sum(obs)
Special Note on pbias, as it relates to hydroeval and other libraries: The result we compute here mimics the behavior of the hydroGOF R package, and is the result of the code provided in the model notebook mentioned above. This differs from the hydroeval Python package in an important way. hydroGOF (and this benchmark) returns:

\(100 × \frac{\displaystyle\sum_{i=1}^{n}(\hat{x}_{i} - x_{i})}{\displaystyle\sum_{i=1}^{n}x_{i}}\)

where \(x\) is ‘observed’ and \(\hat{x}\) is ‘modeled’. hydroeval, on the other hand, returns:

\(100 × \frac{\displaystyle\sum_{i=1}^{n}(x_{i} - \hat{x}_{i})}{\displaystyle\sum_{i=1}^{n}x_{i}}\)

Note that the numerator has switched the ordering of \(x\) and \(\hat{x}\). The end result is that these two libraries return values of different sign: hydroGOF returns a positive value if the ‘modeled’ tends to be higher than ‘observed’, while hydroeval will return a negative number in this case. The absolute values of these calculations are the same.

The developer for hydroeval points to this document as the source of the math used in that package. This code library uses the same ordering as hydroGOF, which is described in EQN A1 of Yilmaz et al. (2008).
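The sign convention is easy to demonstrate with made-up data in which the model runs 10% high everywhere:

```python
import numpy as np

obs = np.array([1.0, 2.0, 3.0])   # made-up observations
sim = obs * 1.1                   # modeled values 10% high everywhere

pb_hydroGOF = 100 * np.sum(sim - obs) / np.sum(obs)   # ordering used in this notebook
pb_hydroeval = 100 * np.sum(obs - sim) / np.sum(obs)  # hydroeval's ordering

print(pb_hydroGOF, pb_hydroeval)   # approximately +10 vs -10: same magnitude, opposite sign
```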
FDC - Flow Duration Curves#
| Metric | Reference |
| --- | --- |
| percent bias in midsegment slope of the flow-duration curve (FDC) between Q20-Q70 | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC low-segment volume (Q0-Q30) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC high-segment volume (Q98-Q100) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
pBiasFMS#
This is the percent bias of the slope of the FDC in the mid-segment part of the curve. See equation A2 of Yilmaz et al. (2008):
\(\%BiasFMS = 100 × \cfrac{ [log(QS_{m1}) - log(QS_{m2})] - [log(QO_{m1}) - log(QO_{m2})] }{ [log(QO_{m1}) - log(QO_{m2})] }\)
def pBiasFMS(obs, sim) -> float:
    """
    calculates percent bias of the slope of the mid-segment of the FDC.

    Returns:
        float: percent bias for values in exceedence probability range 0.2-0.7
    """
    # Exceedence = 1 - percentile ; percentile = 1 - exceedence
    # The mid-segment slope is defined using those observations with flow
    # exceedence probabilities between 20% and 70%. This leads to
    # percentiles/quantiles of 30% and 80% to establish the cut-offs.
    QO_m1, QO_m2 = np.quantile(obs, [0.30, 0.80])
    QS_m1, QS_m2 = np.quantile(sim, [0.30, 0.80])
    m = np.log(QS_m1) - np.log(QS_m2)
    o = np.log(QO_m1) - np.log(QO_m2)
    return 100 * (m - o) / o
pBiasFLV#
Percent bias in low-flow segment volume. Note that in the low-flow segment, a log transform is used to increase sensitivity to very low flows. See equation A4 of Yilmaz et al. (2008):
\(\%BiasFLV = -100 × \cfrac{ \displaystyle\sum_{l=1}^L[log(QS_l) - log(QS_L)] - \displaystyle\sum_{l=1}^L[log(QO_l) - log(QO_L)] }{ \displaystyle\sum_{l=1}^L[log(QO_l) - log(QO_L)] }\)
def pBiasFLV(obs, sim) -> float:
    """
    calculates percent bias over the low-flow segment volume.
    Note that for low-flow observations a log transform is done before the
    pbias calculation.

    Returns:
        float: percent bias for values in exceedence probability range 0.7-1.0
    """
    # Exceedence = 1 - percentile ; percentile = 1 - exceedence
    # Low-Volume is defined as those observations with flow exceedence
    # probabilities between 70% and 100%. This leads to percentiles/quantiles
    # of 0% and 30% to establish the cut-offs.
    _, QO_L = np.quantile(obs, [0.0, 0.30])
    _, QS_L = np.quantile(sim, [0.0, 0.30])
    idx = (obs <= QO_L)  # defines boolean selector index
    QS_l = sim[idx]
    QO_l = obs[idx]
    m = np.sum(np.log(QS_l) - np.log(QS_L))
    o = np.sum(np.log(QO_l) - np.log(QO_L))
    return -100 * ((m - o) / o)
pBiasFHV#
Percent bias in high-flow segment volume. See equation A3 of Yilmaz et al. (2008):
\(\%BiasFHV = 100 × \cfrac{ \displaystyle\sum_{h=1}^H(QS_h - QO_h) }{ \displaystyle\sum_{h=1}^H QO_h }\)
def pBiasFHV(obs, sim) -> float:
    """
    calculates percent bias over the high-flow segment volume.

    Returns:
        float: percent bias for values in exceedence probability range 0.0-0.02
    """
    # Exceedence = 1 - percentile ; percentile = 1 - exceedence
    # 'High-Volume' is defined as those observations with flow exceedence
    # probabilities between 0% and 2%. This leads to percentiles/quantiles
    # of 98% and 100% to establish the cut-offs.
    minval, maxval = np.quantile(obs, [0.98, 1.0])
    idx = (obs >= minval) & (obs <= maxval)
    QS_h = sim[idx]
    QO_h = obs[idx]
    # standard pbias over these observations
    return 100 * ((QS_h - QO_h).sum() / QO_h.sum())
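As a sanity check on the segment metrics, identical series should yield zero bias, and a uniformly inflated simulation a positive one. This standalone restatement of pBiasFHV uses synthetic gamma-distributed 'flows' (the distribution parameters are arbitrary):

```python
import numpy as np

# Standalone restatement of pBiasFHV above, for checking against synthetic data
def pBiasFHV(obs, sim):
    minval = np.quantile(obs, 0.98)   # top 2% of observed flows
    idx = obs >= minval
    return 100 * (sim[idx] - obs[idx]).sum() / obs[idx].sum()

rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 10.0, size=1000)   # synthetic right-skewed 'flows'
print(pBiasFHV(obs, obs))               # identical series -> 0.0
print(pBiasFHV(obs, obs * 1.2))         # 20% high everywhere -> approximately +20
```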
Flow Duration Curve Plotting#
We also allow for plots of the FDC, highlighting the segment-specific metric being calculated. See the usage document to see how this might be called.
def FDCplot(obs, sim, ax, segment='mid', fill=False):
    """
    Given an axes within a matplotlib figure, populate it with artists
    to show the FDC curves for this dataset (modeled and observed).
    Optionally fills the space between curves.

    Parameters
    ----------
    obs : observed values
    sim : simulated/modeled values
    ax : Axes
        The specific axes within a fig to draw upon
    segment : str, optional
        Which segment of the FDC to highlight? ['lo', 'mid', 'hi'], by default 'mid'
    fill : bool, optional
        Whether to 'fill_between' the observed and modeled curves, by default False

    Returns
    -------
    Axes
        The same axes we were passed in. Likely ignored.
    """
    obs_sorted = np.sort(obs)[::-1]
    mod_sorted = np.sort(sim)[::-1]
    exceedence = np.arange(1., len(obs_sorted) + 1) / len(obs_sorted)
    ax.set_xlabel("Exceedence [%]", fontsize=6)
    ax.set_ylabel("Flow Rate", fontsize=6)
    # NOTE: no units are specified in this label -- because we don't know enough about
    # the source data. If the user wants units in the y axis labels, they can call
    # ax.set_ylabel("Flow Rate $m^3 / s$") (or similar) afterwards to manipulate it.
    ax.set_yscale('log')

    if segment not in ['mid', 'lo', 'hi']:
        logging.debug("Invalid segment identifier '%s'. Options are ['mid', 'lo', 'hi']", segment)
        segment = 'mid'

    if segment == 'mid':
        pb = pBiasFMS(obs, sim)
        ax.axvline(x=20, color='b', lw=0.25)
        ax.axvline(x=70, color='b', lw=0.25)
        ax.axvspan(20, 70, facecolor='lightgrey', alpha=0.25)
        ax.text(0.45, 0.05, "Mid-Segment",
                verticalalignment='bottom', horizontalalignment='center',
                transform=ax.transAxes, color='b', fontsize=4)
        ax.text(0.45, 0.01, f"pBiasFMS= {pb:.4f}",
                verticalalignment='bottom', horizontalalignment='center',
                transform=ax.transAxes, color='b', fontsize=3)

    if segment == 'lo':
        pb = pBiasFLV(obs, sim)
        ax.axvline(x=70, color='b', lw=0.25)
        ax.axvline(x=100, color='b', lw=0.25)
        ax.axvspan(70, 100, facecolor='lightgrey', alpha=0.25)
        ax.text(0.99, 0.05, "Low-Flow",
                verticalalignment='bottom', horizontalalignment='right',
                transform=ax.transAxes, color='b', fontsize=4)
        ax.text(0.99, 0.01, f"pBiasFLV= {pb:.4f}",
                verticalalignment='bottom', horizontalalignment='right',
                transform=ax.transAxes, color='b', fontsize=3)

    if segment == 'hi':
        pb = pBiasFHV(obs, sim)
        ax.axvline(x=0, color='b', lw=0.25)
        ax.axvline(x=2, color='b', lw=0.25)
        ax.axvspan(0, 2, facecolor='lightgrey', alpha=0.25)
        ax.text(0.02, 0.05, "High-Flow",
                verticalalignment='bottom', horizontalalignment='left',
                transform=ax.transAxes, color='b', fontsize=4)
        ax.text(0.02, 0.01, f"pBiasFHV= {pb:.4f}",
                verticalalignment='bottom', horizontalalignment='left',
                transform=ax.transAxes, color='b', fontsize=3)

    ax.plot(exceedence * 100, mod_sorted, color='r', lw=0.33, label='Modeled')
    ax.plot(exceedence * 100, obs_sorted, color='k', lw=0.33, label='Observed')
    if fill:
        ax.fill_between(exceedence * 100, obs_sorted, mod_sorted, lw=0.33, color='r', hatch='//////', alpha=0.0625)
    ax.legend(loc='upper right', fontsize=6, handlelength=3)
    ax.grid(linewidth=0.05, dashes=[8, 12], color='k', which="both", axis='y')
    ax.grid(linewidth=0.1, dashes=[16, 8], which='major', axis='x')
    ax.set_xlim((0, 100))
    ax.tick_params(which='both', labelsize=4, width=0.25, length=2, pad=1)
    _ = [ax.spines[axis].set_linewidth(0.2) for axis in ['top', 'bottom', 'left', 'right']]
    return ax