{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# D-Score Suite (v1) Benchmark -- Usage Examples\n", "\n", ":::{note}\n", "_This notebook is adapted from originals by Timothy Hodson and Rich Signell. See that upstream work at:_\n", "* https://github.com/thodson-usgs/dscore\n", "* https://github.com/USGS-python/hytest-evaluation-workflows/\n", ":::\n", "\n", "This notebook demonstrates how to call the functions defined in the D-Score Suite\n", "notebook, using a small demonstration dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt  # used below to draw the scorecards" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample Data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12145 Records\n" ] } ], "source": [ "# Load the demonstration dataset (indexed by date) and drop rows with missing values\n", "sampleData = pd.read_csv(r\"../nwm_streamflow/NWM_Benchmark_SampleData.csv\", index_col='date', parse_dates=True).dropna()\n", "print(len(sampleData.index), \"Records\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick look at the first few rows shows that this dataset contains daily streamflow time series:\n", "observed values ('obs') and simulated values from the National Water Model ('nwm') and the National Hydrologic Model ('nhm').\n", "This demonstration dataset is limited to a single gage (`site_no` = 1104200)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>site_no</th><th>obs</th><th>nwm</th><th>nhm</th></tr>\n",
"    <tr><th>date</th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1983-10-01</th><td>1104200</td><td>1.121347</td><td>6.175417</td><td>1.469472</td></tr>\n",
"    <tr><th>1983-10-02</th><td>1104200</td><td>1.214793</td><td>6.250417</td><td>1.848861</td></tr>\n",
"    <tr><th>1983-10-03</th><td>1104200</td><td>0.872159</td><td>6.215833</td><td>2.169456</td></tr>\n",
"    <tr><th>1983-10-04</th><td>1104200</td><td>0.419089</td><td>6.105000</td><td>2.200083</td></tr>\n",
"    <tr><th>1983-10-05</th><td>1104200</td><td>0.849505</td><td>5.952500</td><td>1.931588</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " site_no obs nwm nhm\n", "date \n", "1983-10-01 1104200 1.121347 6.175417 1.469472\n", "1983-10-02 1104200 1.214793 6.250417 1.848861\n", "1983-10-03 1104200 0.872159 6.215833 2.169456\n", "1983-10-04 1104200 0.419089 6.105000 2.200083\n", "1983-10-05 1104200 0.849505 5.952500 1.931588" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sampleData.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import Benchmark Functions\n", "The metric functions are defined and described in\n", "{doc}`/evaluation/Metrics_DScore_Suite_v1`.\n", "They are imported here by running that notebook from within the following cell:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "%run ../../Metrics_DScore_Suite_v1.ipynb\n", "# Running that notebook defines its functions in this notebook's namespace." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The functions are now available to run against our sample data:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "55.73589185136414" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Mean squared error (MSE) of the NWM simulation relative to the observations\n", "mse(sampleData['obs'], sampleData['nwm'])" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "winter 13.205368\n", "spring 11.135375\n", "summer 14.120221\n", "fall 17.274927\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# MSE decomposed by season; the four seasonal components sum to the total MSE above\n", "seasonal_mse(sampleData['obs'], sampleData['nwm'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Composite Benchmark\n", "It is useful to combine several of these metrics into a single benchmark routine that returns a pandas Series of the assembled metrics.\n", "\n", "This 'wrapper' composite benchmark also handles any transforms of the data before calling the metric functions. In this case, we log-transform the data." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def compute_benchmark(df):\n", "    \"\"\"\n", "    Run several D-Score metrics against the data table in 'df'.\n", "\n", "    NOTE: 'df' must contain the 'obs' and 'nwm' columns, and NaNs must already have been removed.\n", "    \"\"\"\n", "    # Clip to a small positive floor so the log is defined for zero or negative flows\n", "    obs = np.log(df['obs'].clip(lower=0.01))\n", "    sim = np.log(df['nwm'].clip(lower=0.01))\n", "\n", "    mse_ = pd.Series(\n", "        [mse(obs, sim)],\n", "        index=[\"mse\"],\n", "        dtype='float32'\n", "    )\n", "    return pd.concat([\n", "        mse_,\n", "        bias_distribution_sequence(obs, sim),\n", "        seasonal_mse(obs, sim),\n", "        quantile_mse(obs, sim),\n", "    ])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mse 0.874842\n", "e_bias 0.409683\n", "e_dist 0.224187\n", "e_seq 0.241010\n", "winter 0.057879\n", "spring 0.033822\n", "summer 0.396487\n", "fall 0.386654\n", "low 0.653889\n", "below_avg 0.127766\n", "above_avg 0.052214\n", "high 0.040973\n", "dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compute_benchmark(sampleData)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scorecards\n", "The D-Score functions include an ILAMB-style scorecard function, `ilamb_card_II()`, that produces a graphical scorecard from these metrics.\n", "Each component is scored as a percentage of the total MSE, so the components of each decomposition (bias/distribution/sequence, seasonal, and flow-quantile) sum to approximately 100%, up to rounding.\n", "A scorecard such as this is typically applied to a composite of D-Score metrics computed for many gages.\n", "Here we demonstrate the scorecard for a single gage **as if** it were the mean of all gages in an evaluation analysis." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>NWM</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>mse</th><td>100</td></tr>\n",
"    <tr><th>e_bias</th><td>47</td></tr>\n",
"    <tr><th>e_dist</th><td>26</td></tr>\n",
"    <tr><th>e_seq</th><td>28</td></tr>\n",
"    <tr><th>winter</th><td>7</td></tr>\n",
"    <tr><th>spring</th><td>4</td></tr>\n",
"    <tr><th>summer</th><td>45</td></tr>\n",
"    <tr><th>fall</th><td>44</td></tr>\n",
"    <tr><th>low</th><td>75</td></tr>\n",
"    <tr><th>below_avg</th><td>15</td></tr>\n",
"    <tr><th>above_avg</th><td>6</td></tr>\n",
"    <tr><th>high</th><td>5</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " NWM\n", "mse 100\n", "e_bias 47\n", "e_dist 26\n", "e_seq 28\n", "winter 7\n", "spring 4\n", "summer 45\n", "fall 44\n", "low 75\n", "below_avg 15\n", "above_avg 6\n", "high 5" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Compute the benchmark and 'score' each decomposition component as a percentage of the total MSE\n", "bm = compute_benchmark(sampleData)\n", "percentage_card = pd.DataFrame(data={\n", "    'NWM': ((bm / bm['mse']) * 100).round().astype(int)\n", "})\n", "# NOTE: `name` is not a standard DataFrame attribute. We use it to stash metadata\n", "# that ilamb_card_II() uses to label the scorecard.\n", "percentage_card.name = \"Percent\"\n", "percentage_card" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "n_cards = 1  # number of scorecard panels to draw\n", "# Figure width grows with the number of cards\n", "fig, ax = plt.subplots(1, n_cards, figsize=(0.5+(1.5*n_cards), 3.25), dpi=150)\n", "ax = ilamb_card_II(percentage_card, ax)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# If the scorecard has columns for multiple models, ilamb_card_II() plots them together.\n", "# Fictitious example: made-up scores for a hypothetical model 'XYZ', for illustration only\n", "percentage_card['XYZ'] = pd.Series([100, 20, 30, 20, 10, 50, 60, 70, 20, 10, 40, 50], index=percentage_card.index)\n", "fig, ax = plt.subplots(1, n_cards, figsize=(0.5+(1.5*n_cards), 3.25), dpi=150)\n", "ax = ilamb_card_II(percentage_card, ax)\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "vscode": { "interpreter": { "hash": "d7ebce313f85fb1ac8949e834c83f371584cb2422d845bf1570c1220fdedc716" } } }, "nbformat": 4, "nbformat_minor": 4 }