D-Score Suite (v1) Benchmark#
This notebook implements the D-Score metrics from Hodson et al., 2021, which decompose mean squared error into orthogonal metrics like bias and variance. This notebook briefly describes each decomposition along with providing Python implementations. This notebook can be sourced into analysis notebooks to retrieve access to these functions. See D-Score Usage for an example of using these functions with a demonstration dataset.
Decomposition Name |
Metric |
Description |
---|---|---|
Top-level Error Metric |
mse |
Mean Squared Error; error metric that is decomposed into sets of decompositions below. |
Bias-Variance |
e_bias (Bias) |
Bias corresponds to error in the expected magnitude |
Bias-Variance |
e_variance (Variance) |
Variance corresponds to errors in distribution and timing |
Bias, Distribution, Sequence |
e_bias (Bias) |
Bias corresponds to error in magnitudes |
Bias, Distribution, Sequence |
e_dist (Distribution) |
Distribution corresponds to distributional error independent of timing |
Bias, Distribution, Sequence |
e_seq (Sequence) |
Sequence related to error in ordering of values, timing |
Trend, Seasonality, Residual Variability |
trend |
Error related to overall trend |
Trend, Seasonality, Residual Variability |
seasonality |
Error related to shifts in seasonality such as phase and amplitude variation |
Trend, Seasonality, Residual Variability |
residual |
Remainder error, not attributed to anything in particular but still exists |
Seasonal |
winter |
Northern Hemisphere Seasons: Dec, Jan, Feb |
Seasonal |
spring |
Northern Hemisphere Seasons: Mar, Apr, May |
Seasonal |
summer |
Northern Hemisphere Seasons: Jun, Jul, Aug |
Seasonal |
fall |
Northern Hemisphere Seasons: Sep, Oct, Nov |
Percentile |
low |
5th percentile and lower |
Percentile |
below_avg |
>25th percentile and <= 50th percentile |
Percentile |
above_avg |
>50th percentile and <= 75th percentile |
Percentile |
high |
>75th percentile |
Source: These statistics are adapted from the original functions found in thodson-usgs/dscore
Hodson et al., 2021, Mean Squared Error, Deconstructed. Journal of Advances in Modeling Earth Systems, 13(12), https://doi.org/10.1029/2021MS002681.
Import libraries#
import logging
import numpy as np
import pandas as pd
Mean Squared Error#
def mse(obs, sim) -> float:
"""
Mean Square Error -- Compute MSE over all paired values observed (x) and simulated/modeled (x_hat)
.. math::
\\sum_{i=1}^{n}(x_i - \\hat{x}_i)^2
:return: mean square error
:rtype: float
NOTE: this and all functions below rely upon the obs and sim datatypes implementing
certain math methods on themselves. That is, obs.sum() must be defined by
typeof(obs). Pandas Series and DataFrames do this, but other array_like
may not.
"""
e = obs - sim
return (e**2).mean()
Percent Bias#
def pbias(obs, sim) -> float:
"""
percent bias -- a measure of the mean tendency of simulated values to be
greater than or less than associated observed values.
Returns:
float: calculated percent bias / units = percent (i.e. 90 rather than 0.90)
"""
return 100 * (sim - obs).sum() / obs.sum()
Bias-Variance Decomposition#
From ‘Bias-Variance Trade-off’. See Geman et al., 1992 for more details.
Geman et al., 1992, Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 158. http://dx.doi.org/10.1162/neco.1992.4.1.1
def e_bias(obs, sim) -> float:
"""
Bias = square of mean error
Returns:
_type_: _description_
"""
e = sim - obs
return e.mean()**2
def e_variance(obs, sim) -> float:
"""
Variance of error
Args:
obs (pd.Series - like): data representing observed values
sim (pd.Series - like): data representing simulated/modeled values
Returns:
float: variance of the error
"""
e = sim - obs
return e.var(ddof=0) # use the maximum likelihood estimator for the population variance; 1/n * RSS
Bias-Distribution-Sequence Decomposition#
Hodson et al., 2021, Mean Squared Error, Deconstructed. Journal of Advances in Modeling Earth Systems, 13(12), https://doi.org/10.1029/2021MS002681.
def bias_distribution_sequence(obs, sim) -> float:
"""
Decomposition into bias, distribution, sequence
Args:
obs (pd.Series - like): data representing observed values
sim (pd.Series - like): data representing simulated/modeled values
Returns:
pd.Series: three metrics, indexed by name
"""
e = sim - obs
s = np.sort(sim) - np.sort(obs)
var_s = s.var(ddof=0) # use the maximum likelihood estimator for the population variance; 1/n * RSS
var_e = e.var(ddof=0) # use the maximum likelihood estimator for the population variance; 1/n * RSS
e_seq = var_e - var_s
e_dist = var_s
e_bias = e.mean()**2
return pd.Series(
[e_bias, e_dist, e_seq],
index=["e_bias", "e_dist", "e_seq"])
Trend, Seasonality, Residual Variability Decomposition#
The trend, seasonality, and residual variability components are not exactly orthogonal, thus the total percent error for this decomposition will sum to less than 100%. See Cleveland et al., 1990 and Hodson et al., 2021 for more information.
Cleveland et al., 1990, STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1), 3-73.
Hodson et al., 2021, Mean Squared Error, Deconstructed. Journal of Advances in Modeling Earth Systems, 13(12), https://doi.org/10.1029/2021MS002681.
try:
from statsmodels.tsa.seasonal import STL
_SEASONAL = True
except ImportError:
logging.debug("STL library not available.")
_SEASONAL = False
def stl(obs, sim):
"""
Decompose error using STL.
Seasonal and trend decomposition using Loess (STL).
Note that STL is not perfectly orthogonal.
References
----------
.. [1] Cleveland et al., 1990, STL: A seasonal-trend decomposition
procedure based on loess. Journal of Official Statistics, 6(1), 3-73.
"""
if not _SEASONAL:
logging.warning("STL statistics not available.")
return None
e = sim - obs
res = STL(e, period=365, seasonal=9).fit()
E = pd.DataFrame(
{
'trend': res.trend,
'seasonality': res.seasonal,
'residual': res.resid
}
)
return (E**2).mean()
Seasonal Decomposition#
Seasonal decomposition uses Northern Hemisphere seasons as a basis, this can be customized below in the function for different seasonal divisions.
def seasonal_mse(obs, sim):
"""
Decompose error by season.
Args:
obs (pd.Series - like): data representing observed values
sim (pd.Series - like): data representing simulated/modeled values
Both obs and sim should be time-indexed, such that we can pick out months
from the time value.
Returns:
pd.Series : mse for 4 major seasons
NOTE: 'season' is viewed from a northern-hemisphere perspective
"""
e = sim - obs
idx = (e.index.month == 12) | (e.index.month <= 2) #Dec, Jan, Feb
winter = ((e*idx)**2).mean()
idx = (e.index.month > 2) & (e.index.month <= 5) #Mar, Apr, May
spring = ((e*idx)**2).mean()
idx = (e.index.month > 5) & (e.index.month <= 8) #Jun, Jul, Aug
summer = ((e*idx)**2).mean()
idx = (e.index.month > 8) & (e.index.month <= 11) #Sep, Oct, Nov
fall = ((e*idx)**2).mean()
return pd.Series([winter, spring, summer, fall], index=['winter', 'spring', 'summer', 'fall'])
Quantile Decomposition#
MSE is decomposed by quantile range:
Q from |
Q to |
Label |
---|---|---|
0.00 |
0.25 |
Low |
0.25 |
0.50 |
Below-Average |
0.50 |
0.75 |
Above-Average |
0.75 |
1.00 |
High |
def quantile_mse(obs, sim):
"""
Decomposes MSE by quantile rangess 0.00-0.25; 0.25-0.5; 0.5-0.75; 0.75-1.00
Args:
obs (pd.Series - like ): series of observed values
sim (_type_): series of simulated/modeled values
Both share a common index
Returns:
pd.Series : decomposed MSE, one value per quantile range
"""
breaks=[0, 0.25, 0.5, 0.75, 1]
labels=['low', 'below_avg', 'above_avg', 'high']
e = sim - obs
scores = []
ranks = obs.rank(method='first')
quants = pd.qcut(ranks, q=breaks)
for i in range(len(breaks) - 1):
quant = e * (quants == quants.cat.categories[i]) # select quantile
mse_q = ((quant)**2).mean()
scores.append(mse_q)
return pd.Series(scores, index=labels)
Scoring Function - ILAMB Scoring#
Collier et al., 2018, The International Land Model Benchmarking (ILAMB) system: Design, theory, and implementation. Journal of Advances in Modeling Earth Systems, 10(11), http://dx.doi.org/10.1029/2018ms001354
def score(e, a=1.0):
"""
Scores an error
Exponential scoring function that maps MSE to the unit interval.
Parameters
----------
a : float
Positive tuning parameter.
References
----------
.. [1] Collier et al., 2018, The International Land Model Benchmarking
(ILAMB) system: Design, theory, and implementation. Journal of Advances
in Modeling Earth Systems, 10(11), http://dx.doi.org/10.1029/2018ms001354
"""
if a <= 0.0:
raise ValueError("Tuning parameter must be a positive float")
return np.exp(-1 * a * e)
ScoreCard Visualizations#
These routines help with formatting scorecards.
import matplotlib.pyplot as plt
def ilamb_card_II(df, ax=None, clim=(0,100), cmap='RdYlBu'):
axs = ax or plt.gca()
try:
axs.set_xlabel(df.name) # note that name is not a standard dataframe attribute.
except NameError:
pass # Don't set xaxis label name
axs.set_ylabel("Component")
axs.set_yticks(np.arange(len(df.index)), labels=df.index, fontsize=6)
axs.yaxis.set_tick_params(length=0, which='minor')
axs.set_xticks(np.arange(len(df.columns)), labels=df.columns, fontsize=6, rotation=90, ha='center', va='bottom')
axs.xaxis.tick_top()
axs.xaxis.set_tick_params(length=0)
axs.xaxis.set_label_position('top')
# Thick black lines between decompositions
axs.axhline(y=0.5, color='k') #betwen cells 0 and 1
axs.axhline(y=3.5, color='k') # between 3 and 4
axs.axhline(y=6.5, color='k') # between 6 and 7
axs.axhline(y=10.5, color='k') # between 9 and 10
im = axs.imshow(df, cmap=cmap, vmin=clim[0], vmax=clim[1], alpha=0.80)
cbar = axs.figure.colorbar(im, ax=axs, location='bottom', ticks=[0, 50, 100], ticklocation='bottom', pad=0.05, fraction=0.15, shrink=0.5)
cbar.ax.tick_params(labelsize=4, width=0.2, length=2, pad=1, labelbottom=True)
cbar.outline.set_linewidth(0.2)
## Annotates each cell...
txtattrs = dict(ha="center", va="center", fontsize=5)
i=0
for col in df.columns:
j=0
for row in df.index:
text = im.axes.text(i, j, df[col][row], **txtattrs)
j+=1
i+=1
## Offset minor gridlines.... paint them white.
axs.set_yticks(np.arange(len(df.index)+1)-.5, minor=True)
axs.set_xticks(np.arange(len(df.columns)+1)-.5, minor=True)
axs.grid(which="minor", color="w", linestyle='-', linewidth=0.75)
return axs