If so, then consider intake-xarray and intake-esm.
Defining and loading data-sets costs time and effort. The data scientist needs to know what data are available, and the characteristics of each data-set, before going to the effort of loading and beginning to analyze a specific data-set. Furthermore, they might need to learn the API of some Python package specific to the target format. The code for such data loading often makes up the first block of every notebook or script, propagated by copy and paste.
Intake has been designed as a simple layer over other Python libraries to:
Source and further reading: https://www.anaconda.com/intake-taking-the-pain-out-of-data-access/
intake-xarray combines intake with xarray. You can easily access data from various locations and filenames that you or someone else predefined in a YAML file.
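    import intake

    # Open the YAML catalog that describes the available observational data-sets
    cat = intake.open_catalog("/home/mpim/m300524/pymistral/intake/obs.yml")
    # Select one entry by name and load it lazily as a dask-backed xarray object
    ds = cat['HadCRUT3'].to_dask()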
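For orientation, a catalog file of the kind opened above might look roughly like the YAML text in the sketch below. This is only an illustration under assumptions: the entry layout (`sources`, `driver`, `args`, `urlpath`) follows the usual intake catalog format with the `netcdf` driver from intake-xarray, while the description and the data path are hypothetical placeholders, not the contents of the real `obs.yml`.

```python
import tempfile
from pathlib import Path

# Hypothetical catalog with a single entry; the urlpath is a placeholder
# and does not point to a real file.
catalog_text = """\
sources:
  HadCRUT3:
    description: Placeholder entry; the path below is hypothetical
    driver: netcdf
    args:
      urlpath: /path/to/HadCRUT3.nc
"""

# Write the catalog to a temporary directory so the sketch touches nothing else.
catalog_path = Path(tempfile.mkdtemp()) / "obs.yml"
catalog_path.write_text(catalog_text)

print(catalog_path.read_text())

# With intake and intake-xarray installed, such a file could then be opened
# with intake.open_catalog(str(catalog_path)).
```

Keeping paths and loading options in one such file means that notebooks only need the catalog location and the entry name.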
Clone https://gitlab.dkrz.de/m300524/pymistral and install the conda environment pymistral to try out the notebooks yourself.
intake-esm combines intake-xarray with pandas to make Earth System Model (ESM) output easily accessible. A builder creates a collection, which is a pandas.DataFrame, from a catalog, which is a json file. Luckily, a few collections are available for mistral. These collections can be searched with queries and directly load ESM output via dask into xarray. intake-esm is developed at NCAR.
This is also possible with other common experiment comparisons: choose from CMIP5, CMIP6, MiKlip or MPI GE; see /work/ik1017/Catalogs.
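    import intake

    # Open the CMIP6 collection available on mistral
    col_url = "/work/ik1017/Catalogs/mistral-cmip6.json"
    col = intake.open_esm_datastore(col_url)
    # Narrow the collection down with a query; a list matches any of its values
    query = dict(experiment_id='esm-piControl', table_id='Omon', variable_id='fgco2', grid_label=['gn', 'gr'])
    cat = col.search(**query)
    # Load the matching output lazily into a dict of xarray datasets
    dset_dict = cat.to_dataset_dict(cdf_kwargs={'chunks': {'time': 12*50}})
    ds = dset_dict['CMIP.CCCma.CanESM5.esm-piControl.Omon.gn']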
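To make the idea of searching a collection with a query more concrete, here is a minimal stdlib-only sketch. It is not intake-esm's implementation: a plain list of dicts stands in for the rows of the pandas.DataFrame, the rows themselves are hypothetical, and the `search` helper is written only for this illustration. It assumes that a query matches a row when every queried column agrees, and that a list of values means "any of these".

```python
# Hypothetical rows standing in for a collection; real collections have
# many more rows and columns (for example the file path of each asset).
rows = [
    {"experiment_id": "esm-piControl", "table_id": "Omon", "variable_id": "fgco2", "grid_label": "gn"},
    {"experiment_id": "esm-piControl", "table_id": "Omon", "variable_id": "fgco2", "grid_label": "gr"},
    {"experiment_id": "historical", "table_id": "Omon", "variable_id": "fgco2", "grid_label": "gn"},
    {"experiment_id": "esm-piControl", "table_id": "Amon", "variable_id": "tas", "grid_label": "gn"},
]


def search(rows, **query):
    """Return the rows for which every queried column has an allowed value."""
    def matches(row):
        for column, wanted in query.items():
            # A single value is treated like a one-element list.
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


query = dict(experiment_id="esm-piControl", table_id="Omon",
             variable_id="fgco2", grid_label=["gn", "gr"])
hits = search(rows, **query)
print(len(hits))
```

With these sample rows, only the two `esm-piControl` / `Omon` / `fgco2` entries remain, one per grid label.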
Do you like the capabilities of intake? Consider writing your own yaml files and sharing them with your peers.
A collection of ideas on how to use intake-esm for your own experiments:
- Take the json file mistral-MPI-GE.json from intake-esm-datastore and modify it to serve your needs.
- Write your own yaml file or extend mine.
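Adapting an existing json description for your own runs could be sketched as below. This is a hedged illustration, not the actual content of `mistral-MPI-GE.json`: the dictionary is a heavily trimmed, hypothetical stand-in, the key names (`esmcat_version`, `id`, `description`, `catalog_file`) are assumed from the ESM collection format, and all paths and the `customize` helper are made up for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, heavily trimmed stand-in for a collection description file.
template = {
    "esmcat_version": "0.1.0",
    "id": "mistral-MPI-GE",
    "description": "placeholder description",
    "catalog_file": "/path/to/original-catalog.csv.gz",
}


def customize(collection, new_id, new_catalog_file, description):
    """Return a copy of the description that points to your own catalog table."""
    updated = dict(collection)
    updated.update(id=new_id, catalog_file=new_catalog_file, description=description)
    return updated


workdir = Path(tempfile.mkdtemp())

# Pretend the template was copied from elsewhere, then read it back in.
src = workdir / "template.json"
src.write_text(json.dumps(template, indent=2))

mine = customize(
    json.loads(src.read_text()),
    new_id="my-experiments",
    new_catalog_file="/path/to/my-catalog.csv",  # hypothetical path
    description="My own runs",
)

out = workdir / "my-experiments.json"
out.write_text(json.dumps(mine, indent=2))
print(out.read_text())
```

A real description file contains further sections (for example how assets are aggregated) that would need to match the columns of your own catalog table.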