Retrieving data from ESGF with Intake-ESGF

This notebook explains how to use Intake-ESGF to get data from ESGF. You do not need to know anything about intake to run it. Intake-ESGF searches the ESGF index nodes, including Globus nodes, compares the results with your local database, and starts a thread pool to download the missing files in parallel. It is both efficient and simple to use.

On Levante, this notebook works well with the /work/bm1344/conda-envs/py_312 environment.

The workflow consists of three steps:

Configure Intake-ESGF with your local cache, the esg dataroot, and specific ESGF indexes.
Set up a request dictionary.
Start retrieving.

from ipywidgets import FloatProgress
import intake_esgf

Configure Intake-ESGF

For the parameter settings, you can use:

a /scratch path for your local_cache
the CMIP Data Pool trunk as the esg_dataroot
one of the high-priority ESGF index nodes, e.g. esgf.ceda.ac.uk, to be sure to find all data

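intake_esgf.conf.set(
    local_cache="/work/ik1017/Ingest/requests/US2",  # downloaded files are written here
    esg_dataroot="/work/ik1017/CMIP6/data",  # existing local ESGF data, checked before downloading
    # all_indices=True,  # uncomment to query all known index nodes
    indices={"esgf.ceda.ac.uk": True},  # enable the CEDA index node
)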

Set up a request

Make sure that your request does not grow too large. A total size of 1-10 TB should be fine. You can assume an average speed of about 50 MB/s, depending on the data node you retrieve from. This corresponds to roughly 4 TB/day.
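The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the helper name transfer_days is hypothetical, the 50 MB/s default is the average assumed in the text, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
def transfer_days(total_tb: float, speed_mb_per_s: float = 50.0) -> float:
    """Estimate the download duration in days for a request of total_tb terabytes."""
    seconds_per_day = 24 * 60 * 60
    # Decimal units are assumed: 1 TB = 1_000_000 MB.
    tb_per_day = speed_mb_per_s * seconds_per_day / 1_000_000
    return total_tb / tb_per_day


for size_tb in (1, 10):
    print(f"{size_tb} TB at 50 MB/s: about {transfer_days(size_tb):.1f} days")
```

The actual speed varies with the data node, so treat the result as an order-of-magnitude guide only.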

high23=dict(
    project="CMIP6",
    activity_id=["CFMIP","CMIP","DAMIP"],
    institution_id=["CMCC","NCAR","NOAA-GFDL"],
    source_id=["CESM2","CESM2-FV2","CESM2-WACCM","CESM2-WACCM-FV2","CMCC-CM2-SR5","GFDL-CM4"],
    experiment_id=["1pctCO2","abrupt-2xCO2","abrupt-4xCO2","esm-piControl","hist-GHG","hist-aer","hist-nat","historical","piControl"],
    variant_label=["r10i1p1f1","r11i1p1f1","r1i1p1f1","r1i2p2f1","r2i1p1f1","r3i1p1f1","r3i1p2f1","r4i1p1f1","r5i1p1f1","r7i1p1f1","r8i1p1f1","r9i1p1f1"],
    table_id=["Oday"],
    variable_id=["chlos","tossq"],
    grid_label=["gn","gr"]
)
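Before submitting a request, it can help to get a feeling for how broad it is. The following sketch multiplies the number of values per facet, which gives a rough upper bound on the facet combinations a request can match. The function count_facet_combinations and the small example_request are hypothetical and not part of Intake-ESGF; many combinations will not exist on ESGF, so the real number of datasets is usually much smaller.

```python
from math import prod


def count_facet_combinations(request: dict) -> int:
    """Upper bound on the number of facet combinations a request can match."""
    sizes = []
    for value in request.values():
        # A plain string counts as one value; a list or tuple counts its entries.
        sizes.append(1 if isinstance(value, str) else len(value))
    return prod(sizes)


# Hypothetical, much smaller request used only for illustration.
example_request = dict(
    project="CMIP6",
    source_id=["CESM2", "GFDL-CM4"],
    experiment_id=["historical", "piControl", "1pctCO2"],
    variable_id=["chlos", "tossq"],
)
print(count_facet_combinations(example_request))
```

The same function can be called with the high23 dictionary defined above.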

Start retrieving

With the following three commands, you start your retrieval:

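# Create a catalog that uses the configuration set above
cat = intake_esgf.catalog.ESGFCatalog()
# Search the configured index nodes for datasets matching the request
subset = cat.search(**high23)
# Download missing files and load the results; add_measures=False skips cell measures
dsdict = subset.to_dataset_dict(add_measures=False)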
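Once the retrieval has finished, you may want a quick overview of what was loaded. The sketch below assumes that dsdict maps string keys to xarray datasets, which expose their variables through data_vars. The helper summarize_datasets is hypothetical and not part of Intake-ESGF.

```python
def summarize_datasets(dsdict):
    """Return one line per dataset: its key and the names of its data variables."""
    lines = []
    for key in sorted(dsdict):
        variables = ", ".join(str(name) for name in dsdict[key].data_vars)
        lines.append(f"{key}: {variables}")
    return lines


# Usage after the retrieval has finished:
# for line in summarize_datasets(dsdict):
#     print(line)
```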

If you think the data you retrieved would be valuable for all DKRZ users, please contact supportATdkrz.de so that we can take over the data and add it to the CMIP Data Pool.
