Standard Suite (v1) Metrics#
These are custom-defined Python functions to calculate metrics against time-series data.
These statistics are adapted from the originals in USGS-python/hytest-evaluation-workflows.
The Metrics:#
This suite of metrics describes the NWM benchmark:
| Metric | Used on these variables | Reference |
| --- | --- | --- |
| Nash-Sutcliffe efficiency (NSE) | all | Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I—A discussion of principles. Journal of Hydrology, 10(3), 282-290. https://www.sciencedirect.com/science/article/pii/0022169470902556?via%3Dihub |
| Kling-Gupta efficiency (KGE) | all | Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology, 377(1-2), 80-91. https://www.sciencedirect.com/science/article/pii/S0022169409004843 |
| logNSE | all | Oudin, L., Andréassian, V., Mathevet, T., Perrin, C., & Michel, C. (2006). Dynamic averaging of rainfall‐runoff model simulations from complementary model parameterizations. Water Resources Research, 42(7). |
| percent bias | all | A measure of the mean tendency of simulated values to be greater or less than associated observed values; units of percent |
| ratio of standard deviation | all | Standard deviation of simulated values divided by the standard deviation of observed values |
| Pearson Correlation | all | K. Pearson (1896, 1900, 1920) |
| Spearman Correlation | all | Charles Spearman (1904, 1910) |
| percent bias in midsegment slope of the flow-duration curve (FDC) between Q20-Q70 | streamflow | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC low-segment volume (Q0-Q30) | streamflow | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC high-segment volume (Q98-Q100) | streamflow | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
This notebook will briefly describe each of the above metrics, and show some results using sample data. The specific code to implement each metric is included. This notebook can be sourced into analysis notebooks to get access to these functions natively.
import numpy as np
import pandas as pd
import logging
Metric Definitions#
Nash-Sutcliffe efficiency (NSE)#
Nash, J. E., & Sutcliffe, J. V. (1970). River flow forecasting through conceptual models part I—A discussion of principles. Journal of hydrology, 10(3), 282-290. https://www.sciencedirect.com/science/article/pii/0022169470902556?via%3Dihub
## Used by NSE and others...
def MSE(obs, sim) -> float:
    """
    Mean Square Error -- Compute MSE over all paired values obs (x) and sim (x_hat)

    Returns
    -------
    float
        Mean square error
    """
    err = obs - sim
    return np.mean(err**2)
def NSE(obs, sim) -> float:
    """
    Nash-Sutcliffe efficiency (NSE)

    Returns:
        float: calculated NSE
    """
    # See NOTE re: ddof
    return 1 - (MSE(obs, sim) / np.var(obs, ddof=0))
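As a quick sanity check, NSE is 1 for a perfect simulation and 0 for a model that always predicts the observed mean. This standalone sketch re-declares NSE (same ddof=0 convention as above) with arbitrary sample values:

```python
import numpy as np

# Standalone restatement of NSE above (ddof=0), so this snippet runs on its own
def NSE(obs, sim):
    return 1 - np.mean((obs - sim) ** 2) / np.var(obs, ddof=0)

obs = np.array([1.0, 2.0, 3.0, 4.0])   # arbitrary sample values
print(NSE(obs, obs))                                # perfect simulation -> 1.0
print(NSE(obs, np.full_like(obs, obs.mean())))      # mean predictor -> 0.0
```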
Special note on NSE & variance: A component within the calculation of NSE is the variance computed over the observed values. Different Python libraries calculate this in different ways, so some of the details matter. In particular, numpy assumes that ddof (Delta Degrees of Freedom) is zero, while pandas assumes a ddof of one (Bessel’s Correction). Without explicit instructions, these two common libraries will return different results for the ‘same’ calculation, so it is important not to inter-mix them. If you decide to build your own functions involving variance, it will matter how you calculate that value:
df['obs'].var() # using pandas
will yield a different result than
np.var(df['obs']) # using numpy
The key (in either case) is to explicitly define the ddof:
df['obs'].var(ddof=0)
# or
np.var(df['obs'], ddof=0)
The original codespec for this benchmark series used numpy, with its default DDOF of 0. We conform to that as canon, with explicit definition of DDOF to ensure compatibility with similar metrics.
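To make the difference concrete, here is a small standalone demonstration using arbitrary sample values:

```python
import numpy as np
import pandas as pd

obs = pd.Series([1.0, 2.0, 3.0, 4.0])   # arbitrary sample values

print(obs.var())          # pandas default ddof=1 (Bessel) -> 1.666...
print(np.var(obs))        # numpy default ddof=0 -> 1.25
print(obs.var(ddof=0))    # explicit ddof makes pandas match numpy -> 1.25
```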
Ratio of Standard Deviations#
def rSD(obs, sim) -> float:
    """
    ratio of standard deviation -- standard deviation of simulated/modeled
    values divided by the standard deviation of observed values

    Returns:
        float: calculated ratio
    """
    # NOTE: numpy division by zero yields inf/nan with a RuntimeWarning rather
    # than raising ZeroDivisionError, so test the denominator explicitly.
    sd_obs = np.std(obs)
    if sd_obs == 0:
        logging.warning("Standard deviation of observed values is zero; ratio undefined")
        return np.nan
    return np.std(sim) / sd_obs
Correlation Coefficients: Pearson and Spearman#
These standard measures are available as reliable, fast implementations in the SciPy library.
from scipy.stats import pearsonr, spearmanr
def pearson_r(obs, sim) -> float:
    """
    Pearson Correlation -- Pearson's R, calculated using the scipy library method

    Returns
    -------
    float
        Pearson's R
    """
    return pearsonr(obs, sim)[0]
def spearman_r(obs, sim) -> float:
    """
    Spearman Correlation -- Spearman's R, calculated using the scipy library method

    Returns
    -------
    float
        Calculated R
    """
    return spearmanr(obs, sim)[0]
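A brief usage sketch with made-up paired samples. Because Spearman's correlation works on ranks, any strictly monotone relationship scores exactly 1 even when Pearson's does not:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up observations
sim = np.array([1.1, 1.9, 3.3, 3.9, 5.4])   # made-up, monotone in obs but not linear

r_p = pearsonr(obs, sim)[0]
r_s = spearmanr(obs, sim)[0]
print(r_p, r_s)   # Pearson slightly below 1; Spearman exactly 1
```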
Kling-Gupta efficiency (KGE)#
Gupta, H. V., Kling, H., Yilmaz, K. K., & Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of hydrology, 377(1-2), 80-91. https://www.sciencedirect.com/science/article/pii/S0022169409004843
def KGE(obs, sim) -> float:
    """
    Kling-Gupta efficiency (KGE)

    Returns:
        float: Calculated KGE
    """
    r = pearsonr(obs, sim)[0]
    alpha = rSD(obs, sim)
    beta = np.sum(sim) / np.sum(obs)
    return 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)
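As a standalone check (re-declaring KGE with its alpha and beta terms inline so the snippet runs on its own), a perfect simulation scores 1, and a uniformly biased simulation scores lower:

```python
import numpy as np
from scipy.stats import pearsonr

# Standalone restatement of KGE above: alpha = ratio of std devs, beta = ratio of sums
def KGE(obs, sim):
    r = pearsonr(obs, sim)[0]
    alpha = np.std(sim) / np.std(obs)
    beta = np.sum(sim) / np.sum(obs)
    return 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # arbitrary sample values
print(KGE(obs, obs))          # perfect simulation -> 1.0
print(KGE(obs, obs * 1.5))    # 50% inflation hurts alpha and beta, KGE drops
```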
logNSE#
Oudin, L., Andréassian, V., Mathevet, T., Perrin, C., & Michel, C. (2006). Dynamic averaging of rainfall‐runoff model simulations from complementary model parameterizations. Water Resources Research, 42(7).
This is the NSE metric, run against log-transformed data. Because log() is undefined for zero values, the data must be sanitized to ensure only positive values are passed to np.log(). Of the various ways to treat zeros, we use a clipping function to ‘promote’ values below a small threshold up to that threshold value. By default, any value below 0.01 is treated as 0.01 for purposes of the log transform.

This data sanitization is handled differently within other libraries, notably hydroeval. That package uses a slightly more complex strategy to ensure that log() gets clean data to work on; the hydroeval developer references Pushpalatha et al. (2012) regarding their strategy. The details of that method are beyond our scope here. Just know that if you compare results with hydroeval, this metric may yield very slightly different results.
def logXform(a, **kwargs):
    ### Allowing for the possible future addition of other methods to treat
    ### zero values; 'clip' is the default (and currently the only) method.
    clip = kwargs.get('clip', 0.01)
    assert clip > 0
    return np.log(a.clip(clip))
def logNSE(obs, sim) -> float:
    """
    logNSE -- computes NSE using the log of the data (rather than the raw data)

    Returns:
        float: Calculated NSE of log(data)
    """
    return NSE(logXform(obs, clip=0.01), logXform(sim, clip=0.01))
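The effect of the clip can be seen in isolation. This minimal sketch applies the same thresholding via np.clip directly, with made-up values:

```python
import numpy as np

# Values below the 0.01 threshold are promoted up to the threshold
vals = np.array([0.0, 0.005, 0.5, 10.0])   # made-up values, including a zero
clipped = np.clip(vals, 0.01, None)        # same effect as vals.clip(0.01)
print(np.log(clipped))                     # the zero no longer produces -inf
```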
Percent Bias#
A measure of the mean tendency of simulated values to be greater or less than associated observed values, units of percent
def pbias(obs, sim) -> float:
    """
    percent bias -- a measure of the mean tendency of simulated values to be
    greater than or less than associated observed values.

    Returns:
        float: calculated percent bias / units = percent (i.e. 90 rather than 0.90)
    """
    return 100 * np.sum(sim - obs) / np.sum(obs)
Special Note on pbias, as it relates to hydroeval and other libraries: The result we compute here mimics the behavior of the hydroGOF R package, and is the result of the code provided in the model notebook mentioned above. This differs from the hydroeval Python package in an important way. hydroGOF (and this benchmark) returns:

\(100 × \frac{\displaystyle\sum_{i=1}^{n}(\hat{x}_{i} - x_{i})}{\displaystyle\sum_{i=1}^{n}x_{i}}\)

where \(x\) is ‘observed’ and \(\hat{x}\) is ‘modeled’. hydroeval, on the other hand, returns:

\(100 × \frac{\displaystyle\sum_{i=1}^{n}(x_{i} - \hat{x}_{i})}{\displaystyle\sum_{i=1}^{n}x_{i}}\)

Note that the numerator has switched the ordering of \(x\) and \(\hat{x}\). The end result is that these two libraries return values of different sign: hydroGOF returns a positive value if the ‘modeled’ tends to be higher than ‘observed’, while hydroeval will return a negative number in this case. The absolute values of these calculations are the same.

The developer for hydroeval points to this document as the source of the math used in that package. This code library uses the same ordering as hydroGOF, which is described in EQN A1 of Yilmaz et al. (2008).
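The sign convention is easy to demonstrate with made-up data in which the model runs 10% high everywhere:

```python
import numpy as np

obs = np.array([1.0, 2.0, 3.0])   # made-up observations
sim = obs * 1.1                   # modeled values 10% high everywhere

pb_hydroGOF = 100 * np.sum(sim - obs) / np.sum(obs)   # ordering used in this notebook
pb_hydroeval = 100 * np.sum(obs - sim) / np.sum(obs)  # hydroeval's ordering

print(pb_hydroGOF, pb_hydroeval)   # approximately +10 vs -10: same magnitude, opposite sign
```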
FDC - Flow Duration Curves#
| Metric | Reference |
| --- | --- |
| percent bias in midsegment slope of the flow-duration curve (FDC) between Q20-Q70 | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC low-segment volume (Q0-Q30) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
| percent bias in FDC high-segment volume (Q98-Q100) | Yilmaz, K. K., Gupta, H. V., & Wagener, T. (2008). A process‐based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resources Research, 44(9). |
pBiasFMS#
This is the percent bias of the slope of the FDC in the mid-segment part of the curve. See equation A2 of Yilmaz et al. (2008):
\(\%BiasFMS = 100 × \cfrac{ [log(QS_{m1}) - log(QS_{m2})] - [log(QO_{m1}) - log(QO_{m2})] }{ [log(QO_{m1}) - log(QO_{m2})] }\)
def pBiasFMS(obs, sim) -> float:
    """
    calculates percent bias of the slope of the mid-segment of the FDC.

    Returns:
        float: percent bias for values in exceedence probability range 0.2-0.7
    """
    # Exceedence = 1 - percentile ; percentile = 1 - exceedence
    # The mid-segment slope is defined using those observations with flow
    # exceedence probabilities between 20% and 70%. This leads to
    # percentiles/quantiles of 30% and 80% to establish the cut-offs.
    QO_m1, QO_m2 = np.quantile(obs, [0.30, 0.80])
    QS_m1, QS_m2 = np.quantile(sim, [0.30, 0.80])
    m = np.log(QS_m1) - np.log(QS_m2)
    o = np.log(QO_m1) - np.log(QO_m2)
    return 100 * (m - o) / o
pBiasFLV#
Percent bias in low-flow segment volume. Note that in the low-flow segment, a log transform is used to increase sensitivity to very low flows. See equation A4 of Yilmaz et al. (2008):
\(\%BiasFLV = -100 × \cfrac{ \displaystyle\sum_{l=1}^L[log(QS_l) - log(QS_L)] - \displaystyle\sum_{l=1}^L[log(QO_l) - log(QO_L)] }{ \displaystyle\sum_{l=1}^L[log(QO_l) - log(QO_L)] }\)
def pBiasFLV(obs, sim) -> float:
    """
    calculates percent bias over the low-flow segment volume.
    Note that for low-flow observations a log transform is done before the
    pbias calculation.

    Returns:
        float: percent bias for values in exceedence probability range 0.7-1.0
    """
    # Exceedence = 1 - percentile ; percentile = 1 - exceedence
    # Low-Volume is defined as those observations with flow exceedence
    # probabilities between 70% and 100%. This leads to percentiles/quantiles
    # of 0% and 30% to establish the cut-offs.
    _, QO_L = np.quantile(obs, [0.0, 0.30])
    _, QS_L = np.quantile(sim, [0.0, 0.30])
    idx = (obs <= QO_L)  # defines boolean selector index
    QS_l = sim[idx]
    QO_l = obs[idx]
    m = np.sum(np.log(QS_l) - np.log(QS_L))
    o = np.sum(np.log(QO_l) - np.log(QO_L))
    return -100 * ((m - o) / o)
pBiasFHV#
Percent bias in high-flow segment volume. See equation A3 of Yilmaz et al. (2008):
\(\%BiasFHV = 100 × \cfrac{ \displaystyle\sum_{h=1}^H(QS_h - QO_h) }{ \displaystyle\sum_{h=1}^H QO_h }\)
def pBiasFHV(obs, sim) -> float:
    """
    calculates percent bias over the high-flow segment volume.

    Returns:
        float: percent bias for values in exceedence probability range 0.0-0.02
    """
    # Exceedence = 1 - percentile ; percentile = 1 - exceedence
    # 'High-Volume' is defined as those observations with flow exceedence
    # probabilities between 0% and 2%. This leads to percentiles/quantiles
    # of 98% and 100% to establish the cut-offs.
    minval, maxval = np.quantile(obs, [0.98, 1.0])
    idx = (obs >= minval) & (obs <= maxval)
    QS_h = sim[idx]
    QO_h = obs[idx]
    # standard pbias over these observations
    return 100 * ((QS_h - QO_h).sum() / QO_h.sum())
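As a sanity check on the segment metrics, identical series should yield zero bias, and a uniformly inflated simulation a positive one. This standalone restatement of pBiasFHV uses synthetic gamma-distributed 'flows' (the distribution parameters are arbitrary):

```python
import numpy as np

# Standalone restatement of pBiasFHV above, for checking against synthetic data
def pBiasFHV(obs, sim):
    minval = np.quantile(obs, 0.98)   # top 2% of observed flows
    idx = obs >= minval
    return 100 * (sim[idx] - obs[idx]).sum() / obs[idx].sum()

rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 10.0, size=1000)   # synthetic right-skewed 'flows'
print(pBiasFHV(obs, obs))               # identical series -> 0.0
print(pBiasFHV(obs, obs * 1.2))         # 20% high everywhere -> approximately +20
```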
Flow Duration Curve Plotting#
We also allow for plots of the FDC, highlighting the segment-specific metric being calculated. See the usage document to see how this might be called.
def FDCplot(obs, sim, ax, segment='mid', fill=False):
    """
    Given an axes within a matplotlib figure, populate it with artists
    to show the FDC curves for this dataset (modeled and observed).
    Optionally fills the space between curves.

    Parameters
    ----------
    obs : observed values
    sim : simulated/modeled values
    ax : Axes
        The specific axes within a fig to draw upon
    segment : str, optional
        Which segment of the FDC to highlight? ['lo', 'mid', 'hi'], by default 'mid'
    fill : bool, optional
        Whether to 'fill_between' the observed and modeled curves, by default False

    Returns
    -------
    Axes
        The same axes we were passed in. Likely ignored.
    """
    obs_sorted = np.sort(obs)[::-1]
    mod_sorted = np.sort(sim)[::-1]
    exceedence = np.arange(1., len(obs_sorted) + 1) / len(obs_sorted)
    ax.set_xlabel("Exceedence [%]", fontsize=6)
    ax.set_ylabel("Flow Rate", fontsize=6)
    # NOTE: no units are specified in this label -- because we don't know enough about
    # the source data. If the user wants units in the y axis labels, they can call
    # ax.set_ylabel("Flow Rate $m^3 / s$") (or similar) afterwards to manipulate it.
    ax.set_yscale('log')

    if segment not in ['mid', 'lo', 'hi']:
        logging.debug("Invalid segment identifier '%s'. Options are ['mid', 'lo', 'hi']", segment)
        segment = 'mid'

    if segment == 'mid':
        pb = pBiasFMS(obs, sim)
        ax.axvline(x=20, color='b', lw=0.25)
        ax.axvline(x=70, color='b', lw=0.25)
        ax.axvspan(20, 70, facecolor='lightgrey', alpha=0.25)
        ax.text(0.45, 0.05, "Mid-Segment",
                verticalalignment='bottom', horizontalalignment='center',
                transform=ax.transAxes, color='b', fontsize=4)
        ax.text(0.45, 0.01, f"pBiasFMS= {pb:.4f}",
                verticalalignment='bottom', horizontalalignment='center',
                transform=ax.transAxes, color='b', fontsize=3)

    if segment == 'lo':
        pb = pBiasFLV(obs, sim)
        ax.axvline(x=70, color='b', lw=0.25)
        ax.axvline(x=100, color='b', lw=0.25)
        ax.axvspan(70, 100, facecolor='lightgrey', alpha=0.25)
        ax.text(0.99, 0.05, "Low-Flow",
                verticalalignment='bottom', horizontalalignment='right',
                transform=ax.transAxes, color='b', fontsize=4)
        ax.text(0.99, 0.01, f"pBiasFLV= {pb:.4f}",
                verticalalignment='bottom', horizontalalignment='right',
                transform=ax.transAxes, color='b', fontsize=3)

    if segment == 'hi':
        pb = pBiasFHV(obs, sim)
        ax.axvline(x=0, color='b', lw=0.25)
        ax.axvline(x=2, color='b', lw=0.25)
        ax.axvspan(0, 2, facecolor='lightgrey', alpha=0.25)
        ax.text(0.02, 0.05, "High-Flow",
                verticalalignment='bottom', horizontalalignment='left',
                transform=ax.transAxes, color='b', fontsize=4)
        ax.text(0.02, 0.01, f"pBiasFHV= {pb:.4f}",
                verticalalignment='bottom', horizontalalignment='left',
                transform=ax.transAxes, color='b', fontsize=3)

    ax.plot(exceedence * 100, mod_sorted, color='r', lw=0.33, label='Modeled')
    ax.plot(exceedence * 100, obs_sorted, color='k', lw=0.33, label='Observed')
    if fill:
        ax.fill_between(exceedence * 100, obs_sorted, mod_sorted, lw=0.33, color='r', hatch='//////', alpha=0.0625)
    ax.legend(loc='upper right', fontsize=6, handlelength=3)
    ax.grid(linewidth=0.05, dashes=[8, 12], color='k', which="both", axis='y')
    ax.grid(linewidth=0.1, dashes=[16, 8], which='major', axis='x')
    ax.set_xlim((0, 100))
    ax.tick_params(which='both', labelsize=4, width=0.25, length=2, pad=1)
    _ = [ax.spines[axis].set_linewidth(0.2) for axis in ['top', 'bottom', 'left', 'right']]
    return ax