Chunking Data

Chunking is a foundational step in preparing data for HyTEST workflows. It lets users subset massive datasets without downloading them in full, and it supports efficient parallelized workflows. How the data are organized and chunked can dramatically affect the performance of analyses.
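To make the performance point concrete, here is a minimal sketch that counts how many chunks a read has to touch under two different chunk layouts. The dataset shape, the chunk shapes, and the helper name `chunks_touched` are all hypothetical and chosen only for illustration; they are not taken from any HyTEST dataset or library.

```python
def chunks_touched(selection, chunk_shape):
    """Count the chunks overlapped by a rectangular selection.

    selection:   one (start, stop) index range per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # index of the first chunk hit
        last = (stop - 1) // size    # index of the last chunk hit
        count *= last - first + 1
    return count


# Hypothetical array with dimensions (time, y, x) = (8760, 1000, 1000),
# e.g. one year of hourly values on a 1000 x 1000 grid.
point_series = [(0, 8760), (500, 501), (500, 501)]  # full time series at one cell
single_map = [(0, 1), (0, 1000), (0, 1000)]         # whole grid at one time step

time_chunked = (1, 1000, 1000)   # assumed layout: one chunk per time step
space_chunked = (8760, 10, 10)   # assumed layout: full time series per 10 x 10 tile

print(chunks_touched(point_series, time_chunked))   # 8760
print(chunks_touched(point_series, space_chunked))  # 1
print(chunks_touched(single_map, time_chunked))     # 1
print(chunks_touched(single_map, space_chunked))    # 10000
```

Under these assumptions, the same time-series request reads 8760 chunks in one layout and a single chunk in the other, while the single-map request shows the opposite pattern. This is why the chunk layout should match the access pattern of the analysis.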

This section briefly steps through:

What is rechunking?
Why would we want to rechunk the data?
A real-world example
A bigger real-world example

This will give you an introduction to the data chunking process. We plan to update this section with a much more detailed tutorial in the future.