# Explore CONUS404 Dataset
This dataset was created by extracting specified variables from a collection of wrf2d output files, rechunking the data to better support a variety of data-extraction use cases, and adding CF-convention metadata to make analysis, visualization, and data extraction easier with Xarray and HoloViz.

In [None]:
import os
os.environ['USE_PYGEOS'] = '0'

import fsspec
import xarray as xr
import hvplot.xarray
import intake
import metpy
import cartopy.crs as ccrs

## 1) Select the Dataset from HyTEST's Intake Catalog

In [None]:
# open the hytest data intake catalog
hytest_cat = intake.open_catalog("https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml")
list(hytest_cat)

In [None]:
# open the conus404 sub-catalog
cat = hytest_cat['conus404-catalog']
list(cat)

**Select a dataset**: If you are unsure which copy of a particular dataset to use (e.g. `conus404-hourly-?`), please review the [HyTEST JupyterBook](https://hytest-org.github.io/hytest/dataset_catalog/README.html#storage-locations).

In [None]:
## Select the dataset you want to read into your notebook and preview its metadata
dataset = 'conus404-hourly-osn' 
cat[dataset]

## 2) Set Up AWS Credentials (Optional)

This notebook reads data from the OSN pod, an object store on a high-speed internet connection that can be accessed for free from any computing environment. If you change this notebook to use one of the CONUS404 datasets stored on S3 (options ending in `-cloud`), you will be pulling data from a `requester-pays` S3 bucket, so you must set up your AWS credentials before you can load the data. Please note that reading the `-cloud` data from S3 may incur charges if you read it from outside AWS's `us-west-2` region or run the notebook outside of the cloud altogether. To access one of the `-cloud` options, uncomment and run the following code snippet to set up your AWS credentials. You can find more information about this AWS helper function [here](../environment_set_up/Help_AWS_Credentials.ipynb).

In [None]:
# uncomment the lines below to read in your AWS credentials if you want to access data from a requester-pays bucket (-cloud)
# os.environ['AWS_PROFILE'] = 'default'
# %run ../environment_set_up/Help_AWS_Credentials.ipynb
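
As a rough guard against forgetting this step, the naming rule described above (options ending in `-cloud` live in the requester-pays bucket) can be checked before loading. This is only a minimal sketch: the helper `needs_aws_credentials` is hypothetical, not part of the HyTEST catalog or of `intake`, and the dataset names are illustrative.

```python
def needs_aws_credentials(dataset_name):
    """Return True if the catalog entry name ends in '-cloud'.

    Assumption: the '-cloud' suffix is the only marker of a
    requester-pays S3 copy, as described in the text above.
    """
    return dataset_name.endswith("-cloud")


# Illustrative names following the `conus404-hourly-?` pattern
for name in ("conus404-hourly-osn", "conus404-hourly-cloud"):
    if needs_aws_credentials(name):
        print(f"{name}: set up AWS credentials before loading")
    else:
        print(f"{name}: no AWS credentials needed")
```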

## 3) Parallelize with Dask (Optional, but recommended)
Some of the steps below are aware of parallel, clustered compute environments
that use `dask`. We will start a cluster so that later steps can take advantage
of this ability.

This step is optional, but it speeds up data loading significantly, especially
when accessing data from the cloud.

We have documentation on how to start a Dask Cluster in different computing environments [here](../environment_set_up/clusters.md).

In [None]:
#%run ../environment_set_up/Start_Dask_Cluster_Nebari.ipynb
## If this notebook is not being run on Nebari/ESIP, replace the above 
## path name with a helper appropriate to your compute environment.  Examples:
# %run ../environment_set_up/Start_Dask_Cluster_Denali.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_Tallgrass.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_Desktop.ipynb
# %run ../environment_set_up/Start_Dask_Cluster_PangeoCHS.ipynb

## 4) Explore the dataset

In [None]:
# read in the dataset and use metpy to parse the crs information on the dataset
print(f"Reading {dataset} metadata...", end='')
ds = cat[dataset].to_dask().metpy.parse_cf()
ds

In [None]:
# Examine the grid data structure for SNOW: 
ds.SNOW

It looks like this dataset is organized along three coordinates (x, y, and time), and we have used the `metpy` package to parse the CRS information into the `metpy_crs` variable:

In [None]:
crs = ds['SNOW'].metpy.cartopy_crs
crs

## Example A: Load the entire spatial domain for a variable at a specific time step

In [None]:
%%time
da = ds.SNOW_ACC_NC.sel(time='2009-12-24 00:00').load()
### NOTE: the `load()` is dask-aware, so will operate in parallel if
### a cluster has been started. 
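
Before calling `load()`, it can help to estimate how much memory one time step of a 2-D variable will occupy. The sketch below is simple arithmetic under stated assumptions: the grid shape (1367 by 1015 cells) and the 4-byte float values are assumed here and are not read from the dataset. In the notebook itself, `da.nbytes` reports the actual in-memory size of a DataArray.

```python
def field_size_mb(nx, ny, bytes_per_value=4):
    """Approximate in-memory size (in MB) of one 2-D field."""
    return nx * ny * bytes_per_value / 1e6


# Assumed grid shape and dtype size; check `ds.sizes` and `ds.SNOW_ACC_NC.dtype`
size_mb = field_size_mb(nx=1367, ny=1015)
print(f"One time step is roughly {size_mb:.2f} MB")
```

Because the data are stored in chunks, the amount actually transferred can differ from this figure.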

In [None]:
da.hvplot.quadmesh(x='lon', y='lat', rasterize=True, geo=True, tiles='OSM', cmap='viridis').opts('Image', alpha=0.5)

## Example B: Load a time series for a variable at a specific grid cell for a specified time range

We will identify a point that we want to pull data for using lat/lon coordinates.

The CONUS404 data is in a Lambert Conformal Conic projection, so we need to re-project/transform using the
built-in `crs` we examined earlier.

In [None]:
lat,lon = 39.978322,-105.2772194    
x, y = crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())   
print(x,y) # these vals are in LCC
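
To make the transform above less of a black box, here is a minimal sketch of the forward Lambert Conformal Conic projection on a sphere, written with the standard library only. It is not the code path that `cartopy` uses. The projection parameters (origin at 39.1°N, 97.9°W, standard parallels at 30° and 50°, sphere radius 6,370,000 m) are assumptions typical of WRF output; compare them against the `crs` object printed earlier before relying on them.

```python
import math


def lcc_forward(lon, lat, lon0, lat0, lat1, lat2, radius=6370000.0):
    """Project lon/lat (degrees) to LCC x/y (metres) on a sphere.

    lon0, lat0: projection origin; lat1, lat2: standard parallels,
    which must differ for this two-parallel form of the formula.
    """
    phi, phi0, phi1, phi2 = (math.radians(v) for v in (lat, lat0, lat1, lat2))

    def t(p):
        return math.tan(math.pi / 4 + p / 2)

    # Cone constant and scaling factor
    n = math.log(math.cos(phi1) / math.cos(phi2)) / math.log(t(phi2) / t(phi1))
    f = math.cos(phi1) * t(phi1) ** n / n

    rho = radius * f / t(phi) ** n      # distance from the cone apex to the point
    rho0 = radius * f / t(phi0) ** n    # distance from the cone apex to the origin
    theta = n * math.radians(lon - lon0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


# Assumed projection parameters (not read from the dataset)
params = dict(lon0=-97.9, lat0=39.1, lat1=30.0, lat2=50.0)

x_demo, y_demo = lcc_forward(-105.2772194, 39.978322, **params)
print(x_demo, y_demo)  # x is negative because the point lies west of lon0
```

If the assumed parameters match the dataset's CRS, the result should be close to the values printed by `crs.transform_point`.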

In [None]:
%%time
# pull out a particular time slice at the specified coordinates
da = ds.PREC_ACC_NC.sel(x=x, y=y, method='nearest').sel(time=slice('2013-01-01 00:00','2013-12-31 00:00')).load()
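
Conceptually, `method='nearest'` picks, along each axis, the grid coordinate closest to the requested value. The sketch below illustrates the idea on a tiny, made-up grid using plain Python. The 4 km spacing and the coordinate values are assumptions for illustration, and xarray's own lookup is index-based rather than a linear scan like this one.

```python
def nearest_index(coords, target):
    """Return the index of the coordinate value closest to target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


# Hypothetical projected axes in metres with 4 km spacing
x_coords = [-8000.0 + 4000.0 * i for i in range(5)]  # -8000 ... 8000
y_coords = [-4000.0 + 4000.0 * j for j in range(3)]  # -4000 ... 4000

ix = nearest_index(x_coords, 5100.0)
iy = nearest_index(y_coords, -1900.0)
print(x_coords[ix], y_coords[iy])  # the grid cell centre that would be selected
```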

In [None]:
# plot your time series
da.hvplot(x='time', grid=True)
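
Label-based time slices include both endpoints, so the slice used in this example ends at the first hour of 31 December and leaves out the remaining 23 hours of that day. A quick sketch of the expected number of hourly steps, assuming the dataset has exactly one value per hour with no gaps:

```python
from datetime import datetime, timedelta


def count_hourly_steps(start, stop):
    """Count hourly timestamps from start to stop, both endpoints included."""
    return (stop - start) // timedelta(hours=1) + 1


n_steps = count_hourly_steps(datetime(2013, 1, 1, 0), datetime(2013, 12, 31, 0))
print(n_steps)  # 8737
```

To cover the full calendar year, the end of the slice could instead be `'2013-12-31 23:00'`.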

## Stop cluster
If you started a Dask cluster, uncomment the line below to shut it down.

In [None]:
#client.close(); cluster.shutdown()