Explore CONUS404 Dataset#

This dataset was created by extracting specified variables from a collection of wrf2d output files, rechunking the data to better support extraction for a variety of use cases, and adding CF conventions to allow easier analysis, visualization, and data extraction using Xarray and HoloViz.

import os
os.environ['USE_PYGEOS'] = '0'

import fsspec
import xarray as xr
import hvplot.xarray
import zarr
import pystac
from packaging.version import Version
import metpy
import cartopy.crs as ccrs

1) Select the Dataset from the WMA STAC Catalog#

def get_children(catalog, collection_id=None):
    """
    Retrieve a specified collection from a STAC catalog/collection and print key metadata
    for exploring/accessing the datasets contained within it.
    If no collection ID is provided, the collections in the top level of the catalog are printed.
    If a collection ID is provided, the collection with that ID is retrieved from the input
    catalog/collection. If the collection ID points to a dataset, the assets available for
    that dataset are printed. If the collection ID points to another collection, the child
    collections within that collection are listed.

    Args:
        catalog (pystac.Catalog | pystac.Collection): The STAC catalog/collection object.
        collection_id (str, optional): The ID of the collection or dataset to retrieve from the catalog.
    
    Returns:
        collection (pystac.Catalog | pystac.Collection): The collection object corresponding to the provided ID,
                                                         or the top-level catalog if no ID is provided.
    """
    dataset = False
    if collection_id:
        collection = catalog.get_child(collection_id)
        if collection.assets:
            dataset = True
            print(f"{collection_id} is a dataset. Please review the assets below and select one to open.")
        else:
            print(f"{collection_id} is a collection. Please review the child items and select one to open in the next cell.")
    else:
        collection = catalog
    if dataset:
        # list the assets available for the dataset
        for asset in collection.assets:
            print(f"Asset ID: {asset}")
            print(f"    Title: {collection.assets[asset].title}")
            print(f"    Description: {collection.assets[asset].description}")
    else:
        # list the child collections, with a citation link when one is available
        collections = list(collection.get_collections())
        print(f"Number of collections: {len(collections)}")
        print("Collection IDs:")
        for child_collection in collections:
            cite_as = "Not available"
            for link in child_collection.links:
                if link.rel == "cite-as":
                    cite_as = link.target
            print(f"- {child_collection.id}, Source: {cite_as}")
    return collection
# URL for the WMA STAC Catalog
catalog_url = "https://api.water.usgs.gov/gdp/pygeoapi/stac/stac-collection/"

# use pystac to read the catalog
catalog = pystac.Catalog.from_file(catalog_url)

# list the collections in the catalog
catalog = get_children(catalog)
Number of collections: 43
Collection IDs:
- AIEM_permafrost, Source: Not available
- CA-BCM-2014, Source: https://doi.org/10.21429/dye8-h568
- FLET, Source: Not available
- GMO, Source: https://doi.org/10.21429/v7ys-6n72
- GMO_New, Source: https://doi.org/10.21429/c6s4-ve70
- LOCA2, Source: Not available
- PRISM_v2, Source: https://prism.oregonstate.edu/
- PuertoRico, Source: Not available
- RedRiver, Source: Not available
- SPEI, Source: Not available
- TTU_2019, Source: Not available
- TopoWx2017, Source: Not available
- WUS_HSP, Source: Not available
- alaska_et_2020, Source: Not available
- bcca, Source: Not available
- bcsd_mon_vic, Source: http://gdo-dcp.ucllnl.org/downscaled_cmip_projections/
- bcsd_obs, Source: http://gdo-dcp.ucllnl.org/downscaled_cmip_projections/
- cmip5_bcsd, Source: https://doi.org/10.21429/sbxv-1n90
- conus404-pgw, Source: Not available
- conus404, Source: Not available
- cooper, Source: Not available
- cprep, Source: Not available
- dcp_compressed, Source: https://doi.org/10.21429/j9f1-b218
- gridMET, Source: https://www.climatologylab.org/gridmet.html
- hawaii_2018, Source: Not available
- iclus, Source: Not available
- loca, Source: Not available
- maca-vic, Source: Not available
- macav2, Source: Not available
- maurer, Source: https://doi.org/10.21429/m7y0-xy02
- mows, Source: Not available
- nc-casc-snow, Source: Not available
- nlcd, Source: Not available
- notaro_2018, Source: Not available
- pacis, Source: Not available
- puerto_rico, Source: Not available
- red_river_2018, Source: https://doi.org/10.21429/em59-hn43
- serap, Source: Not available
- slr2d, Source: https://doi.org/10.21429/66gt-dm26
- snodas, Source: https://doi.org/10.7265/N5TB14TC
- ssebopeta, Source: Not available
- stageiv_combined, Source: https://pubs.usgs.gov/fs/2013/3035/
- wicci, Source: https://doi.org/10.21429/dtp5-z505
# select a collection from the catalog, replace the collection ID with the one you want to use:
collection = get_children(catalog, collection_id="conus404")
conus404 is a collection. Please review the child items and select one to open in the next cell.
Number of collections: 4
Collection IDs:
- conus404_daily, Source: https://doi.org/10.5066/P9PHPK4F
- conus404_hourly, Source: https://doi.org/10.5066/P9PHPK4F
- conus404_monthly, Source: https://doi.org/10.5066/P9PHPK4F
- conus404_xtrm_daily, Source: https://doi.org/10.5066/P9PHPK4F
# select a collection from the catalog, replace the collection ID with the one you want to use:
collection = get_children(collection, collection_id="conus404_hourly")
conus404_hourly is a dataset. Please review the assets below and select one to open.
Asset ID: zarr-s3-osn
    Title: Free access to zarr via S3 API
    Description: Free, public access to zarr data store via the S3 API. This data is stored on an Open Storage Network Pod.
Asset ID: zarr-disk-hovenweep
    Title: USGS internal access to zarr via Hovenweep
    Description: USGS internal access to zarr data store via disk storage attached to the Hovenweep supercomputer.
Asset ID: default
    Title: None
    Description: None
# replace with the asset ID you want to use:
selected_asset_id = "zarr-s3-osn"

# read the asset metadata
asset = collection.assets[selected_asset_id]
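
If you want to see exactly what will be handed to xarray later in this notebook, a small optional sketch is to print the asset's href and its xarray-specific STAC fields:

# optional sketch: inspect the selected asset before opening it
# (these same fields are passed to xr.open_dataset further below)
print(f"href: {asset.href}")
print(f"storage options: {asset.extra_fields.get('xarray:storage_options')}")
print(f"open kwargs: {asset.extra_fields.get('xarray:open_kwargs')}")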

2) Set Up AWS Credentials (Optional)#

This notebook reads data from the OSN pod. The OSN pod is object storage on a high-speed internet connection with free access from any computing environment. If you change this notebook to use one of the CONUS404 datasets stored on S3 (options ending in -cloud), you will be pulling data from a requester-pays S3 bucket. This means you must set up your AWS credentials before you can load the data. Please note that reading the -cloud data from S3 may incur charges if you are reading data outside of AWS’s us-west-2 region or running the notebook outside of the cloud altogether. If you would like to access one of the -cloud options, uncomment and run the following code snippet to set up your AWS credentials. You can find more info about this AWS helper function in the Help_AWS_Credentials notebook.

# uncomment the lines below to read in your AWS credentials if you want to access data from a requester-pays bucket (-cloud)
# os.environ['AWS_PROFILE'] = 'default'
# %run ../environment_set_up/Help_AWS_Credentials.ipynb
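
If you would rather configure the filesystem object directly instead of relying on environment variables, a minimal sketch using fsspec/s3fs (with a placeholder profile name; adjust it to your own AWS configuration) would be:

# hypothetical sketch: build a requester-pays S3 filesystem with fsspec/s3fs
# 'default' is a placeholder profile name from your AWS credentials file
# fs = fsspec.filesystem('s3', profile='default', requester_pays=True)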

3) Explore the Dataset#

# read in the dataset and use metpy to parse the crs information on the dataset
if Version(zarr.__version__) < Version("3.0.0"):
    ds = xr.open_dataset(
        asset.href,
        storage_options=asset.extra_fields['xarray:storage_options'],
        **asset.extra_fields['xarray:open_kwargs']
    )
else:
    ds = xr.open_dataset(
        asset.href,
        storage_options=asset.extra_fields['xarray:storage_options'],
        **asset.extra_fields['xarray:open_kwargs'],
        zarr_format=2
    )
ds = ds.metpy.parse_cf()
ds
<xarray.Dataset> Size: 222TB
Dimensions:         (y: 1015, x: 1367, time: 376945, bottom_top_stag: 51,
                     bottom_top: 50, soil_layers_stag: 4, x_stag: 1368,
                     y_stag: 1016, snow_layers_stag: 3, snso_layers_stag: 7)
Coordinates:
    lat             (y, x) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
    lon             (y, x) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
  * time            (time) datetime64[ns] 3MB 1979-10-01 ... 2022-10-01
  * x               (x) float64 11kB -2.732e+06 -2.728e+06 ... 2.732e+06
  * y               (y) float64 8kB -2.028e+06 -2.024e+06 ... 2.028e+06
    metpy_crs       object 8B Projection: lambert_conformal_conic
    lat_u           (y, x_stag) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
    lon_u           (y, x_stag) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
    lat_v           (y_stag, x) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
    lon_v           (y_stag, x) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
Dimensions without coordinates: bottom_top_stag, bottom_top, soil_layers_stag,
                                x_stag, y_stag, snow_layers_stag,
                                snso_layers_stag
Data variables: (12/153)
    ACDEWC          (time, y, x) float32 2TB dask.array<chunksize=(144, 175, 175), meta=np.ndarray>
    ACDRIPR         (time, y, x) float32 2TB dask.array<chunksize=(144, 175, 175), meta=np.ndarray>
    ACDRIPS         (time, y, x) float32 2TB dask.array<chunksize=(144, 175, 175), meta=np.ndarray>
    ACECAN          (time, y, x) float32 2TB dask.array<chunksize=(144, 175, 175), meta=np.ndarray>
    ACEDIR          (time, y, x) float32 2TB dask.array<chunksize=(144, 175, 175), meta=np.ndarray>
    ACETLSM         (time, y, x) float32 2TB dask.array<chunksize=(144, 175, 175), meta=np.ndarray>
    ...              ...
    ZNU             (bottom_top) float32 200B dask.array<chunksize=(50,), meta=np.ndarray>
    ZNW             (bottom_top_stag) float32 204B dask.array<chunksize=(51,), meta=np.ndarray>
    ZS              (soil_layers_stag) float32 16B dask.array<chunksize=(4,), meta=np.ndarray>
    ZSNSO           (time, snso_layers_stag, y, x) float32 15TB dask.array<chunksize=(144, 7, 175, 175), meta=np.ndarray>
    ZWT             (time, y, x) float32 2TB dask.array<chunksize=(144, 175, 175), meta=np.ndarray>
    crs             int64 8B ...
Attributes: (12/148)
    AER_ANGEXP_OPT:                  1
    AER_ANGEXP_VAL:                  1.2999999523162842
    AER_AOD550_OPT:                  1
    AER_AOD550_VAL:                  0.11999999731779099
    AER_ASY_OPT:                     1
    AER_ASY_VAL:                     0.8999999761581421
    ...                              ...
    WEST-EAST_PATCH_START_STAG:      1
    WEST-EAST_PATCH_START_UNSTAG:    1
    W_DAMPING:                       1
    YSU_TOPDOWN_PBLMIX:              0
    history:                         Tue Mar 29 16:35:22 2022: ncrcat -A -vW ...
    history_of_appended_files:       Tue Mar 29 16:35:22 2022: Appended file ...
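With 153 data variables, it can be easier to search by attribute than to scan the repr. A small optional sketch (not part of the original workflow) finds variables whose description mentions snow:

# optional sketch: search the data variables by their 'description' attribute
snow_vars = [name for name, var in ds.data_vars.items()
             if 'SNOW' in var.attrs.get('description', '')]
print(snow_vars)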
# Examine the grid data structure for SNOW: 
ds.SNOW
<xarray.DataArray 'SNOW' (time: 376945, y: 1015, x: 1367)> Size: 2TB
dask.array<open_dataset-SNOW, shape=(376945, 1015, 1367), dtype=float32, chunksize=(144, 175, 175), chunktype=numpy.ndarray>
Coordinates:
    lat        (y, x) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
    lon        (y, x) float32 6MB dask.array<chunksize=(175, 175), meta=np.ndarray>
  * time       (time) datetime64[ns] 3MB 1979-10-01 ... 2022-10-01
  * x          (x) float64 11kB -2.732e+06 -2.728e+06 ... 2.728e+06 2.732e+06
  * y          (y) float64 8kB -2.028e+06 -2.024e+06 ... 2.024e+06 2.028e+06
    metpy_crs  object 8B Projection: lambert_conformal_conic
Attributes:
    description:   SNOW WATER EQUIVALENT
    grid_mapping:  crs
    long_name:     Snow water equivalent
    units:         kg m-2

It looks like this dataset is organized along three coordinates (x, y, and time), and we have used the metpy package to parse the CRS information into the metpy_crs coordinate:

crs = ds['SNOW'].metpy.cartopy_crs
crs
<cartopy.crs.LambertConformal object at 0x7f3725d0c910>

Example A: Load the entire spatial domain for a variable at a specific time step#

%%time
da = ds.SNOW_ACC_NC.sel(time='2009-12-24 00:00').load()
### NOTE: `load()` is dask-aware, so it will operate in parallel if
### a cluster has been started.
CPU times: user 4.6 s, sys: 460 ms, total: 5.06 s
Wall time: 22.8 s
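For context on that wall time: the data is chunked as (144, 175, 175) in (time, y, x), so a single full-domain time slice touches every spatial chunk. A rough back-of-the-envelope sketch:

# rough sketch: how many chunks does one full-domain time step touch?
import math
ny_chunks = math.ceil(1015 / 175)  # chunks along y
nx_chunks = math.ceil(1367 / 175)  # chunks along x
print(ny_chunks * nx_chunks)       # ~48 spatial chunk reads for one time slice

This is why load() benefits from a running dask cluster: those ~48 chunk reads can proceed in parallel.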
da.hvplot.quadmesh(x='lon', y='lat', rasterize=True, geo=True, tiles='OSM', cmap='viridis').opts('Image', alpha=0.5)

Example B: Load a time series for a variable at a specific grid cell for a specified time range#

We will identify a point that we want to pull data for using lat/lon coordinates.

The CONUS404 data is in a Lambert Conformal Conic projection, so we need to transform our lat/lon point into projected coordinates using the CRS we examined earlier.

lat, lon = 39.978322, -105.2772194
x, y = crs.transform_point(lon, lat, src_crs=ccrs.PlateCarree())
print(x, y)  # these values are in the Lambert Conformal Conic projection
-618215.7570892773 121899.89692719541
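As a quick sanity check, you can invert the transform and confirm you recover (approximately) the lat/lon point you started with:

# optional sketch: transform the projected point back to lat/lon
lon_check, lat_check = ccrs.PlateCarree().transform_point(x, y, src_crs=crs)
print(lat_check, lon_check)  # should match the lat/lon above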
%%time
# pull out a particular time slice at the specified coordinates
da = ds.PREC_ACC_NC.sel(x=x, y=y, method='nearest').sel(time=slice('2013-01-01 00:00','2013-12-31 00:00')).load()
CPU times: user 1.78 s, sys: 262 ms, total: 2.04 s
Wall time: 8.65 s
# plot your time series
da.hvplot(x='time', grid=True)
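
The hourly series can be noisy. If you would rather see daily totals, a short optional sketch (assuming PREC_ACC_NC is an accumulation over each hourly output interval) is:

# optional sketch: aggregate the hourly accumulated precipitation to daily totals
# (assumes PREC_ACC_NC accumulates over each hourly output interval)
daily = da.resample(time='1D').sum()
daily.hvplot(x='time', grid=True)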

Stop cluster#

If you started a dask cluster, uncomment the line below to shut it down.

#client.close(); cluster.shutdown()