The DKRZ CMIP Data Pool#

This is a beginner-level demonstration notebook which introduces you to the Data Pool at DKRZ. Based on the example of the recent phase 6 of the Coupled Model Intercomparison Project (CMIP6), you will learn

  • how you benefit from the CMIP Data Pool (CDP)

  • how to approach CMIP data

  • how to use the Python packages intake-esm, xarray and pandas to investigate the CMIP Data Pool

This notebook can be executed on DKRZ's Jupyterhub platform. For a detailed introduction to Jupyterhub and intake, we recommend the DKRZ tech talks.

Customizing the code inside, however, only requires basic Python knowledge.

Introduction#

The Scientific Steering Committee has thankfully granted 5 PB of disk space on Mistral's Lustre file system for the CMIP Data Pool in 2021. DKRZ has run and maintained this common storage place since 2016.

πŸ“’ The DKRZ CMIP data pool contains frequently needed flagship collections of climate model data. It is hosted as part of the DKRZ data infrastructure and supports scientists in high-volume climate data collection, access and processing.

The notebook sources for the doc pages are available in this gitlab-repo.

Important news and updates will be announced

⭐ Highlighted CDP climate model data collections:

  • CMIP6: As of May 2021, DKRZ provides Europe's largest data pool, holding about 4 PB for the recent phase of the Coupled Model Intercomparison Project

  • CORDEX: The data of the Coordinated Regional Climate Downscaling Experiment amounts to about 600 TB across different projects.

  • CMIP5: The fifth phase of CMIP.

An example of a project which is also in the data pool, but not included in the term CMIP6⁺:

from IPython.display import HTML, display, Markdown, IFrame
display(Markdown("Time series of three different data pool disk space measures. DKRZ has published about 1.5 PB, 2.5 PB are replicated data from other data nodes. An average CMIP6 dataset contains about 5 files and covers 4GB."))
IFrame(src="https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/Pool-Statistics/pool-timeseries-hvplot.html",width="900",height="550",frameborder="0")

Time series of three different data pool disk space measures. DKRZ has published about 1.5 PB, 2.5 PB are replicated data from other data nodes. An average CMIP6 dataset contains about 5 files and covers 4 GB.

Ongoing Activities - Analysis support#

One part of the data pool support is to make computational resources available to a broader, EU-wide user community. Two IS-ENES services, both free of charge but limited by the available resources, enable users to join computational projects that are equipped with sufficient resources for CMIP analysis for a limited period of time.

  1. Phase 3 of IS-ENES provides the Analysis Platforms service

    • Regular proposal mechanism with a review procedure.

    • Successful proposals are granted exclusive resources for server-side data analyses.

  2. The ENES Climate Analytics Service (ECAS)

    • Minimal application procedure, most likely with a positive outcome

    • Shared resources, limited to one month

display(Markdown("We develop, prepare and provide [jupyter notebook demonstrations](https://gitlab.dkrz.de/data-infrastructure-services/tutorials-and-use-cases) <br> " 
                 "- as tutorials for software packages and applications *starting from scratch* </br>"
                 "- for more frequent use cases like the plot of `tas` of one member of two experiments and simulated by the German ESMs."))
IFrame(src="https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/plots/globalmean-yearlymean-tas.html",width="1000",height="650",frameborder="0")

We develop, prepare and provide jupyter notebook demonstrations
- as tutorials for software packages and applications starting from scratch
- for more frequent use cases like the plot of tas of one member of two experiments and simulated by the German ESMs.

Why do we host the CDP? πŸ€”#

πŸ‘‰ The key benefit of the data pool is that the data is available on Lustre (/work), so that all DKRZ users with a current account have access. There is less need for local copies or data downloads. πŸ‘ˆ

Where can I find the data pool? πŸ•#

The Data pool can be accessed from different portals.

  • Server-side on the file system, e.g. under /pool/data/CMIP6/data

    • All Mistral/Levante users with a current account have permission to do that.

    • Fastest way to work with the data

%%bash
#Browsing with linux commands
ls /pool/data/CMIP6/data/ -x
echo ""
#For which MIPs did MPI-ESM1-2-XR produce data?
find /pool/data/CMIP6/data/ -maxdepth 3 -name MPI-ESM1-2-XR -type d
CMIP6  CMIP6_corrupted

%%bash
#Using the FreVA CMIP-ESMVal tool
#module load cmip6-dicad/1.0               
#freva --databrowser --help

Understanding CMIP6 data#

πŸ§‘β€πŸ« The goal of CMIP6

In order to evaluate and compare climate models, a globally organized intercomparison project is periodically conducted. CMIP6 tackles three major questions:

  • How does the Earth system respond to forcing? πŸš‚

  • What are the origins and consequences of systematic model biases? 🐞

  • How can we assess future climate changes given internal climate variability, predictability, and uncertainties in scenarios? 🌑

From Eyring et al., 2016: Schematic of the CMIP/CMIP6 experiment design.

The CMIP6 framework allows smaller model intercomparison projects (MIPs) with a specific focus to be endorsed by CMIP6. That means that each model that runs the standard CMIP experiments can participate in CMIP6 and in further MIPs.

Metadata: Required Attributes and Controlled Vocabularies#

CDP data is self-descriptive as it contains extensive and controlled metadata. This metadata is prepared in the search facets of the data portals and catalogs.


Besides the technical requirements, the CMIP data standard defines required attributes in so-called Controlled Vocabularies (CVs). While some values are predefined, models and institutions have to be registered to become valid values of the corresponding attributes. For many attributes, both a short form ending in _id and a longer description exist.

Important required attributes:

  • activity_id: A CMIP6-endorsed MIP that investigates a specific research question. It defines experiments and requests data for it.

  • source_id: An ID for the Earth System Model used to produce the data.

  • experiment_id: The experiment which was conducted by the source_id.

  • member_id: The ensemble simulation member of the experiment_id. All members of an ensemble should be statistically indistinguishable.
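For illustration, a minimal sketch (with a hypothetical file path and version) of how these required attributes can be read as global attributes of any CMIP6 file with xarray; note that member_id appears inside files as variant_label:

import xarray as xr

# Hypothetical path following the CMIP6 directory structure on the pool
path = ("/pool/data/CMIP6/data/CMIP/MPI-M/MPI-ESM1-2-HR/historical/"
        "r1i1p1f1/Amon/tas/gn/v20190710/"
        "tas_Amon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_185001-185412.nc")
ds = xr.open_dataset(path)

# The required attributes are stored as global attributes in each file;
# 'variant_label' is the in-file counterpart of member_id
for attr in ["activity_id", "source_id", "experiment_id", "variant_label"]:
    print(attr, ":", ds.attrs.get(attr))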

Investigating the CMIP6 data pool with intake-esm β›΅#

Features

  • display catalogs as clearly structured tables inside Jupyter notebooks for easy investigation

import intake
cloudpath="https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/intake-esm/dkrz_data-pool_cloudcatalog.yaml"
poolpath="/pool/data/Catalogs/dkrz_cmip6_disk_netcdf.json"
cdp = intake.open_catalog(cloudpath)
col = cdp.dkrz_cmip6_disk_netcdf_fromcloud
col.df.head()
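If the cloud catalog cannot be reached, a minimal fallback sketch (assuming you work on a DKRZ node where /pool/data is mounted and intake-esm is installed; the catalog path may change over time) is to open the collection file on disk directly:

# Fallback: open the ESM collection JSON from the pool file system
col = intake.open_esm_datastore(poolpath)
col.df.head()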

Features

  • browse through the catalog and select your data without being on the pool file system

⇨ A pythonic, reproducible alternative to complex find commands or GUI searches. No need to deal with file systems and file names.

tas = col.search(experiment_id="historical", source_id="MPI-ESM1-2-HR", variable_id="tas", table_id="Amon", member_id="r1i1p1f1")
tas

Features

  • open climate data as an analysis-ready dictionary of xarray datasets

Forget about temporary merging and reformatting steps!

tas.to_dataset_dict()

Intake best practices#

  • Intake can make your scripts reusable.

    • Instead of working with local copies or edited versions of files, always start from a globally defined catalog which everyone can access

    • Save the subset of the catalog that you work on as a new catalog instead of as a subset of files (see the sketch after this list)

  • Check for new ingests by just repeating your script - it will open the most recent catalog.

  • Only load datasets into xarray with to_dataset_dict if they do not exceed your memory limits
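A minimal sketch of the second point, assuming a loaded catalog col and the intake import from above; intake-esm's serialize method writes a search result out as a new catalog (the name and directory are placeholders):

# Persist the subset as a new, shareable catalog instead of copying files
cat = col.search(experiment_id="historical", variable_id="tas")
cat.serialize(name="my_tas_subset", directory=".", catalog_type="file")

# Later or elsewhere: reopen the saved subset
cat = intake.open_esm_datastore("./my_tas_subset.json")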

Let’s get an overview of the CMIP6 data pool by

  • finding the number of unique values of attributes

  • grouping and plotting the names and sizes of different entries

The resulting statistics show the percentages of file numbers.

unique_activities=col.unique("activity_id")
print(list(unique_activities["activity_id"].values()))
def pieplot(gbyelem) :
    #groupby, sort and select the ten largest
    size = col.df.groupby([gbyelem]).size().sort_values(ascending=False)
    size10 = size.nlargest(10)
    #Sum all others as 10th entry
    size10[9] = sum(size[9:])
    size10.rename(index={size10.index.values[9]:'all other'},inplace=True)
    #return a pie plot
    return size10.plot.pie(figsize=(18,8),ylabel='',autopct='%.2f', fontsize=16)
pieplot("activity_id")
unique_sources=col.unique("source_id")
print("Number of unique Earth System Models in the CMIP6 data pool: "+str(list(unique_sources["source_id"].values())[0]))
pieplot("source_id")
unique_members=col.unique("member_id")
list(unique_members["member_id"].values())[1][0:3]

Data Reference Syntax#

An atomic dataset contains all files which cover the entire time span of a single variable of a single simulation; it can consist of multiple files.

The Data Reference Syntax (DRS) is a set of required attributes which uniquely identify and describe a dataset. The DRS usually includes all attributes used in the path templates, so both terms are used synonymously. The DRS elements are arranged into a hierarchical path template for CMIP6:

CMIP6: mip_era/activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable_id/grid_label/version
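A short sketch (with a hypothetical path and version) showing how the DRS makes file locations self-describing:

# Hypothetical CMIP6 path on the pool; the DRS template decodes it
path = ("/pool/data/CMIP6/data/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/"
        "ssp585/r1i1p1f1/Amon/tas/gn/v20190710")
template = ("mip_era/activity_id/institution_id/source_id/experiment_id/"
            "member_id/table_id/variable_id/grid_label/version")

prefix = "/pool/data/CMIP6/data/"    # pool prefix already covers mip_era
keys = template.split("/")[1:]       # remaining DRS elements
drs = dict(zip(keys, path[len(prefix):].split("/")))
print(drs["experiment_id"], drs["member_id"], drs["variable_id"])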

Be careful when browsing through the CMIP6 data tree!

Unique in CMIP6 data hierarchy:

  • experiment_id (only in one activity_id)

  • variable_id in table_id: Both combined represent the CMIP Variable

  • Only one version for one dataset should be published

# Searching for the MIP which defines the experiment 'historical':

cat = col.search(experiment_id="historical")
cat.unique("activity_id")
# Searching for all tables which contain the variable 'tas':

cat = col.search(variable_id="tas")
cat.unique("table_id")

Not Unique in CMIP6 data hierarchy:

  • institution_id for both source_id + experiment_id (+ member_id)

There are no requirements for member_id.

# Searching for all institution_ids which use the model 'MPI-ESM1-2-HR' to produce 'ssp585' results:

cat = col.search(source_id="MPI-ESM1-2-HR", experiment_id="ssp585")
cat.unique("institution_id")
# Searching for all experiment_ids produced with ESM 'EC-Earth3' and as ensemble member 'r1i1p1f1':

cat = col.search(source_id="EC-Earth3", member_id="r1i1p1f1")
cat.unique("experiment_id")
# Searching for all valid ensemble member_ids produced with ESM 'EC-Earth3' for experiment 'abrupt-4xCO2'

cat = col.search(source_id="EC-Earth3", experiment_id="abrupt-4xCO2")
cat.unique("member_id")

⇨ Do not search for institution_id, table_id or member_id unless you are sure about what you are doing. Instead, begin by searching for experiment_id, source_id and variable_id.

How can I find the variables I need? πŸ”Ž#

  1. Search for the matching standard_name

Most of the data in the data pool complies with the Climate and Forecast (CF) Conventions. These define standard_names, which have to be assigned to variables as a variable attribute inside the data. As a reliable description of the variable, the standard_name is the bridge to the shorter variable identifier in the data, the so-called short_name. This short name is saved in the data catalogs, which can be searched.

  2. Search for corresponding short_names in the CMIP6 data request

E.g., you get many results for air_temperature: multiple definitions for this one 'physical' variable exist in CMIP, mostly as specific diagnostics of it, like tasmin and tasmax. Sometimes, output for a specific level is given as its own variable, e.g. ta500. This can be the case if not all levels are requested for a specific frequency.

Best practice in ESGF

  3. Search for the fitting mip_table

Each mip_table is a combination of requirements for an output variable_id including

  • frequency

  • time cell methods (average or instantaneous)

  • vertical level (e.g. interpolated on pressure levels)

  • grid

  • realm (e.g. atmosphere model output or ocean model output)

These requirements are set according to the interests of the MIPs. Variables with similar requirements are collected in one MIP table, which can be identified by table_id. The sketch below walks through such a variable search.
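A hedged sketch of this workflow, assuming a loaded intake-esm catalog col as above; the CF standard_name is read from a file itself, while the catalog is searched via the short name:

import xarray as xr

# 1.+2. The short name 'tas' corresponds to the CF standard_name
#       'air_temperature'; verify it inside an actual file:
cat = col.search(variable_id="tas", table_id="Amon")
ds = xr.open_dataset(cat.df["path"].iloc[0])
print(ds["tas"].attrs.get("standard_name"))  # expected: 'air_temperature'

# 3. Compare the MIP tables that offer 'tas' to pick frequency and realm:
print(col.search(variable_id="tas").unique("table_id"))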

The data infrastructure for the DKRZ CDP#

In order to tackle the challenges of data provision and dissemination for a 4 PB repository, a state-of-the-art data infrastructure has been developed around that pool. In the following, we highlight three aspects of the data workflow.

You benefit from the DKRZ CDP because

  • its data is standardized and quality controlled πŸ›‚

  • it is a curated, updated, published and cataloged data repository πŸ‘©β€πŸ­

  • it prevents data duplication and downloading into local workspaces, which is inefficient, expensive and a waste of storage resources πŸ—‘

Data quality#

CMIP6 data is only available in a common and reliable data format

  • No adaptations needed for the output of specific models

  • Makes data interoperable πŸ“ , enabling evaluation software products such as ESMValTool

πŸ… CMIP6 data was quality controlled before published with PrePARE

CMIP6 data is transparent about occurring errors

  • Search the errata database for the origins of suspicious analysis results ⚠

If you find an error, please inform the modeling group, either via the contact in the citation or, if available, via the contact attribute in the file.

Data publication#

  • Extended documentation of the conducted simulations is provided in the ES-Doc database

  • Persistent Identifiers (PIDs) ensure long-term web access to dataset information

  • Citation information and DOIs for all published datasets are easily retrievable

One method to retrieve a citation for the data is via the attribute further_info_url:

import xarray
random_file=xarray.open_dataset(cat.df["path"][0])
random_file.attrs["further_info_url"]
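A further entry point, sketched here under the assumption that the cell above succeeded, is the PID: every CMIP6 file carries a tracking_id global attribute holding a handle that resolves via the Handle System:

# The PID is stored as 'hdl:21.14100/<uuid>' in the 'tracking_id' attribute
pid = random_file.attrs.get("tracking_id", "")
if pid.startswith("hdl:"):
    print("https://hdl.handle.net/" + pid[len("hdl:"):])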

When using data provided in the framework of the DKRZ CMIP Data Pool as basis for a publication, we ask you to add the following text to the Acknowledgements-Section:

β€œWe acknowledge the CMIP6 community for providing the climate model data, retained and globally distributed in the framework of the ESGF. The CMIP6 data of this study were replicated and made available for this study by the DKRZ.”

Upcoming primary publications#

⭐ In May 2021, we joyfully expect to fill the remaining 600 TB of the 5 PB CDP with primary publications of

  • data for the activity_ids DCPP (hindcasts), DAMIP (Detection and Attribution), VolMIP, FAFMIP and PMIP

  • Ensembles for specific experiments and settings (e.g. emission-driven simulations)

  • ICON-ESM data

Contacts#

This notebook is a collaborative effort by the DM Data Infrastructure team.

πŸ™‚ Thank you for your attention!