{ "cells": [ { "cell_type": "markdown", "id": "60f30fd1-60c6-4f7e-9032-1fba56ef707d", "metadata": {}, "source": [ "## Retrieving data from ESGF with Intake-ESGF\n", "\n", "This notebook explains how to use Intake-ESGF to get data from ESGF. You do not need to know anything about Intake to run it. Intake-ESGF searches the ESGF index nodes, including Globus nodes, compares the results with your local database, and starts a thread pool to download files in parallel. It is both *efficient* and *simple* to use.\n", "\n", "On Levante, this notebook works well with the `/work/bm1344/conda-envs/py_312` environment." ] }, { "cell_type": "markdown", "id": "30ee4ec9-4eba-4e7d-bc27-abbabcbc3081", "metadata": {}, "source": [ "1. Configure Intake-ESGF with your *local cache*, the *esg dataroot*, and specific *ESGF indices*.\n", "1. Set up a request dictionary.\n", "1. Start retrieving." ] }, { "cell_type": "code", "execution_count": null, "id": "91ad819f-345c-46cf-8e0f-7c888d531271", "metadata": { "tags": [] }, "outputs": [], "source": [ "from ipywidgets import FloatProgress\n", "import intake_esgf" ] }, { "cell_type": "markdown", "id": "93de6037-4d99-4449-9b92-8a4ad8f54303", "metadata": {}, "source": [ "### Configure Intake-ESGF\n", "\n", "For the parameter settings, you can use:\n", "- a `/scratch` path for your `local_cache`\n", "- the CMIP Data Pool trunk as the `esg_dataroot`\n", "- one of the high-priority ESGF index nodes, e.g. 
`esgf.ceda.ac.uk`, to make sure that all data is found" ] }, { "cell_type": "code", "execution_count": null, "id": "caa27ccc-8f01-4a9a-a73b-02327ab2f0a4", "metadata": { "tags": [] }, "outputs": [], "source": [ "intake_esgf.conf.set(\n", " local_cache=\"/work/ik1017/Ingest/requests/US2\",\n", " esg_dataroot=\"/work/ik1017/CMIP6/data\",\n", " #all_indices=True,\n", " indices={\"esgf.ceda.ac.uk\":True}\n", ")" ] }, { "cell_type": "markdown", "id": "cda2bbda-33de-4817-bc8e-845e2872bf6f", "metadata": {}, "source": [ "### Set up a request\n", "\n", "Make sure that your request does not grow too large. A total size of 1-10 TB should be fine. You can assume an average speed of about 50 MB/s, depending on the data node you retrieve from, which amounts to roughly 4 TB per day." ] }, { "cell_type": "code", "execution_count": null, "id": "5291fbf8-e125-42da-b207-d979f56e39a4", "metadata": { "tags": [] }, "outputs": [], "source": [ "high23=dict(\n", " project=\"CMIP6\",\n", " activity_id=[\"CFMIP\",\"CMIP\",\"DAMIP\"],\n", " institution_id=[\"CMCC\",\"NCAR\",\"NOAA-GFDL\"],\n", " source_id=[\"CESM2\",\"CESM2-FV2\",\"CESM2-WACCM\",\"CESM2-WACCM-FV2\",\"CMCC-CM2-SR5\",\"GFDL-CM4\"],\n", " experiment_id=[\"1pctCO2\",\"abrupt-2xCO2\",\"abrupt-4xCO2\",\"esm-piControl\",\"hist-GHG\",\"hist-aer\",\"hist-nat\",\"historical\",\"piControl\"],\n", " variant_label=[\"r10i1p1f1\",\"r11i1p1f1\",\"r1i1p1f1\",\"r1i2p2f1\",\"r2i1p1f1\",\"r3i1p1f1\",\"r3i1p2f1\",\"r4i1p1f1\",\"r5i1p1f1\",\"r7i1p1f1\",\"r8i1p1f1\",\"r9i1p1f1\"],\n", " table_id=[\"Oday\"],\n", " variable_id=[\"chlos\",\"tossq\"],\n", " grid_label=[\"gn\",\"gr\"]\n", ")" ] }, { "cell_type": "markdown", "id": "b1d988f7-4450-4859-acb9-e6de6bf91b25", "metadata": {}, "source": [ "### Start retrieving\n", "\n", "With the following three commands, you start your retrieval:" ] }, { "cell_type": "code", "execution_count": null, "id": "11053e0d-1bb8-4056-832b-78daa13ea572", "metadata": { "tags": [] }, "outputs": [], "source": [ "cat = 
intake_esgf.catalog.ESGFCatalog()\n", "subset = cat.search(**high23)\n", "dsdict=subset.to_dataset_dict(add_measures=False)" ] }, { "cell_type": "markdown", "id": "0c42654f-26f3-4360-9796-cc2573276928", "metadata": {}, "source": [ "If you think your retrieved data would be valuable for all DKRZ users, please contact supportATdkrz.de so that we can take over the data and add it to the CMIP Data Pool." ] }, { "cell_type": "code", "execution_count": null, "id": "9fa933bb-ed40-4b8f-94a0-35ccb3e7a018", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "python3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }