Retrieving data from ESGF with Intake-ESGF#
This notebook explains how to use Intake-ESGF to retrieve data from ESGF. You do not need to know anything about Intake to run it. Intake-ESGF searches the ESGF index nodes, including Globus nodes, compares the results with your local database, and starts a thread pool to download files in parallel. It is both efficient and simple to use.
On Levante, this notebook works well with the /work/bm1344/conda-envs/py_312 environment.
Configure Intake-ESGF with your local cache, the esg dataroot and specific ESGF indices.
Set up a request dictionary
Start retrieving
from ipywidgets import FloatProgress
import intake_esgf
Configure Intake-ESGF#
For the parameter setting, you can use
a /scratch path for your local_cache
the CMIP Data Pool trunk as the esg_dataroot
one of the high-priority ESGF index nodes, e.g. esgf.ceda.ac.uk, to be sure to find all data
intake_esgf.conf.set(
    local_cache="/work/ik1017/Ingest/requests/US2",
    esg_dataroot="/work/ik1017/CMIP6/data",
    # all_indices=True,
    indices={"esgf.ceda.ac.uk": True},
)
Set up a request#
Make sure that your request does not grow too large. A total size of 1-10 TB should be fine. You can assume an average speed of about 50 MB/s, depending on the data node you retrieve from. This corresponds to about 4 TB/day.
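The estimate above is plain arithmetic and can be reproduced for other request sizes or speeds. The following is a minimal sketch, not part of Intake-ESGF: the function names `daily_volume_tb` and `estimate_transfer_days` are hypothetical, the 50 MB/s default is the average assumed above, and decimal units (1 TB = 10^6 MB) are an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # assumption: decimal units, 1 TB = 10**6 MB


def daily_volume_tb(speed_mb_per_s: float = 50.0) -> float:
    """Return the volume in TB transferred in one day at a constant speed."""
    return speed_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


def estimate_transfer_days(size_tb: float, speed_mb_per_s: float = 50.0) -> float:
    """Return the approximate number of days needed to transfer size_tb."""
    if speed_mb_per_s <= 0:
        raise ValueError("speed_mb_per_s must be positive")
    return size_tb / daily_volume_tb(speed_mb_per_s)


# Volume per day at the assumed average speed, then the duration
# for the lower and upper end of the recommended request size.
print(f"{daily_volume_tb():.2f} TB/day")
for size_tb in (1, 10):
    print(f"{size_tb} TB takes about {estimate_transfer_days(size_tb):.1f} days")
```

Real transfer rates vary between data nodes, so the result is only a rough planning figure.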
high23 = dict(
    project="CMIP6",
    activity_id=["CFMIP", "CMIP", "DAMIP"],
    institution_id=["CMCC", "NCAR", "NOAA-GFDL"],
    source_id=["CESM2", "CESM2-FV2", "CESM2-WACCM", "CESM2-WACCM-FV2", "CMCC-CM2-SR5", "GFDL-CM4"],
    experiment_id=["1pctCO2", "abrupt-2xCO2", "abrupt-4xCO2", "esm-piControl", "hist-GHG", "hist-aer", "hist-nat", "historical", "piControl"],
    variant_label=["r10i1p1f1", "r11i1p1f1", "r1i1p1f1", "r1i2p2f1", "r2i1p1f1", "r3i1p1f1", "r3i1p2f1", "r4i1p1f1", "r5i1p1f1", "r7i1p1f1", "r8i1p1f1", "r9i1p1f1"],
    table_id=["Oday"],
    variable_id=["chlos", "tossq"],
    grid_label=["gn", "gr"],
)
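To get a feeling for how quickly a request can grow, it may help to count how many facet combinations a dictionary like `high23` spans. The sketch below is only an illustration: `count_facet_combinations` is a hypothetical helper, not part of Intake-ESGF, and the number it returns is an upper bound, since most combinations do not exist on ESGF.

```python
from math import prod


def count_facet_combinations(request: dict) -> int:
    """Return an upper bound on the facet combinations a request spans.

    List-like values contribute their length, single values count as one.
    """
    return prod(
        len(value) if isinstance(value, (list, tuple, set)) else 1
        for value in request.values()
    )


# Small illustrative request (not the one used in this notebook).
example_request = dict(
    project="CMIP6",
    experiment_id=["historical", "piControl"],
    variable_id=["chlos", "tossq"],
    grid_label=["gn", "gr"],
)
print(count_facet_combinations(example_request))  # 1 * 2 * 2 * 2 = 8
```

Applied to `high23`, the same product is 3 * 3 * 6 * 9 * 12 * 1 * 2 * 2 = 23328. The number of datasets actually found will be far smaller, but each additional facet value multiplies the search space.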
Start retrieving#
With the following three commands, you start your retrieval:
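# Create a catalog that queries the configured index nodes
cat = intake_esgf.catalog.ESGFCatalog()
# Search for all datasets that match the request facets
subset = cat.search(**high23)
# Download the files and open them as a dictionary of datasets, without cell measures
dsdict = subset.to_dataset_dict(add_measures=False)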
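Once the retrieval has finished, `dsdict` can be inspected like any other dictionary. The following is a rough sketch of a size summary, assuming the values are xarray datasets and therefore expose an `nbytes` attribute. `summarise_datasets` is a hypothetical helper, and the `_FakeDataset` class only stands in for real datasets so that the example runs without ESGF access.

```python
def summarise_datasets(datasets: dict) -> dict:
    """Map each dataset key to its in-memory size in decimal gigabytes."""
    return {key: ds.nbytes / 1e9 for key, ds in datasets.items()}


class _FakeDataset:
    """Stand-in for a dataset; only the nbytes attribute is used here."""

    def __init__(self, nbytes: int):
        self.nbytes = nbytes


# Hypothetical keys and sizes, for illustration only.
demo = {
    "key_a": _FakeDataset(2_500_000_000),
    "key_b": _FakeDataset(500_000_000),
}
sizes = summarise_datasets(demo)
for key, size_gb in sorted(sizes.items()):
    print(f"{key}: {size_gb:.2f} GB")
print(f"total: {sum(sizes.values()):.2f} GB")
```

With real data, the call would be `summarise_datasets(dsdict)`. Note that `nbytes` reports the size of the uncompressed arrays, which can differ from the size of the files on disk.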
If you think your retrieved data would be valuable for all DKRZ users, please contact supportATdkrz.de so that we can take over the data and add it to the CMIP Data Pool.