# CONUS404 Temporal Aggregation
Create daily averages from hourly data and write them to a Zarr dataset.


In [None]:
import fsspec
import xarray as xr
import hvplot.xarray
import intake
import os
import warnings
from dask.distributed import LocalCluster, Client
warnings.filterwarnings('ignore')

## Open dataset from Intake Catalog
* Select `on-prem` dataset from /caldera if running on prem (Denali/Tallgrass)
* Select `cloud`/`osn` object store data if running elsewhere

In [None]:
# open the hytest data intake catalog
hytest_cat = intake.open_catalog("https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml")
list(hytest_cat)

In [None]:
# open the conus404 sub-catalog
cat = hytest_cat['conus404-catalog']
list(cat)

In [None]:
## Select the dataset you want to read into your notebook and preview its metadata
dataset = 'conus404-hourly-osn' 
cat[dataset]

## Set Up AWS Credentials (Optional)

This notebook reads data from the OSN pod by default. OSN is object storage on a high-speed internet connection that is free to access from any environment. If you change this notebook to use one of the CONUS404 datasets stored on S3 (options ending in `-cloud`), you will be pulling data from a `requester-pays` S3 bucket, so you must set up your AWS credentials before the data can be loaded. Note that reading the `-cloud` data from S3 may incur charges if you read it from outside the us-west-2 region or run the notebook outside the cloud altogether. To access one of the `-cloud` options, uncomment and run the following code snippet to set up your AWS credentials. You can find more information about this AWS helper function [here](https://hytest-org.github.io/hytest/environment_set_up/Help_AWS_Credentials.html).

In [None]:
# uncomment the lines below to read in your AWS credentials if you want to access data from a requester-pays bucket (-cloud)
# os.environ['AWS_PROFILE'] = 'default'
# %run ../environment_set_up/Help_AWS_Credentials.ipynb

## Parallelize with Dask 
Some of the steps below can run in parallel on a clustered compute environment
using `dask`. We start a cluster now so that later steps can take advantage
of it.

This step is optional, but it speeds up data loading significantly, especially
when accessing data from the cloud.

We have documentation on how to start a Dask Cluster in different computing environments [here](../environment_set_up/clusters.md).
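
If none of the helper notebooks fits your environment, a local cluster is one possible fallback. The following is a minimal sketch and not one of the HyTEST helpers: the worker and thread counts are illustrative assumptions, `processes=False` keeps the workers as threads inside the current process, and `dashboard_address=None` turns off the diagnostics dashboard.

```python
from dask.distributed import Client, LocalCluster

# Assumption: a small thread-based cluster on the local machine.
# Adjust n_workers and threads_per_worker to match your hardware.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=None,
)
client = Client(cluster)

# Block until both workers have registered with the scheduler
client.wait_for_workers(2)
n_workers = len(client.scheduler_info()["workers"])
print(f"workers available: {n_workers}")
```

A cluster started this way is stopped with `client.close()` followed by `cluster.close()`.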

In [None]:
%run ../environment_set_up/Start_Dask_Cluster_Nebari.ipynb
## If this notebook is not being run on Nebari/ESIP, replace the above 
## path name with a helper appropriate to your compute environment.  Examples:
# %run ../environment_set_up/Start_Dask_Cluster_Denali.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_Tallgrass.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_Desktop.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_PangeoCHS.ipynb

## Explore the dataset

In [None]:
ds = cat[dataset].to_dask()

In [None]:
ds

In [None]:
ds.T2

## Daily averages
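Time averages of any type are easy to do with xarray. Here we compute 24-hour averages. The `offset="12h"` argument shifts the bin edges by 12 hours, so each averaging window runs from 12:00 to 12:00 the next day, and `label='right'` stamps each average with the time at the end of its window.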
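
To see how the `offset` and `label` arguments of `resample` affect the bin edges and time stamps, the sketch below resamples a small synthetic hourly series. This is not CONUS404 data, and the variable name `hour_index` is made up for illustration. It compares noon-to-noon bins with midnight-to-midnight bins whose time stamps are shifted to midday afterwards.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly series covering two full days, with values 0..47
time = pd.date_range("2017-01-02", periods=48, freq="h")
da = xr.DataArray(
    np.arange(48.0), coords={"time": time}, dims="time", name="hour_index"
)

# offset="12h" moves the bin edges to 12:00, and label="right"
# stamps each bin with its closing edge.
noon_bins = da.resample(time="24h", offset="12h", label="right").mean()

# Midnight-to-midnight bins (the default edges), then shift the time
# stamps by 12 hours so that each one sits in the middle of its day.
centered = da.resample(time="24h").mean()
centered = centered.assign_coords(time=centered.time + np.timedelta64(12, "h"))

print(noon_bins.to_series())
print(centered.to_series())
```

With this input, the noon-to-noon version yields three bins (the first and last are only half full), while the shifted version yields one value per calendar day, stamped at 12:00 of that day.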

Digital Earth Africa has a great [Working with Time in Xarray](https://docs.digitalearthafrica.org/fr/latest/sandbox/notebooks/Frequently_used_code/Working_with_time.html) tutorial.

In the example below, we use only twelve days and two variables as a quick demo.

In [None]:
%%time
ds_subset = ds[['T2','U10']].sel(time=slice('2017-01-02','2017-01-13'))

In [None]:
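# 24-hour bins with edges at 12:00; label='right' stamps each mean with the end of its bin
# (lowercase "24h" avoids the deprecation of the "H" alias in recent pandas releases)
ds_subset_daily = ds_subset.resample(time="24h", offset="12h", label='right').mean()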

In [None]:
ds_subset_daily

In [None]:
ds_subset_daily.hvplot.quadmesh(x='lon', y='lat', rasterize=True, 
                             geo=True, tiles='OSM', alpha=0.7, cmap='turbo')

### Write daily values as a Zarr dataset (to on-prem or cloud)
You will need to change the following cell from `raw` to `code` and update the file paths in order to save your data.

## Shut down the cluster

In [None]:
client.close(); cluster.shutdown()