WARNING: This is ongoing work.

What data do you offer?

We offer data from numerical experiments that contribute to internationally coordinated model intercomparison projects, the so-called MIPs. The most prominent global MIP activity is CMIP, the Coupled Model Intercomparison Project. CMIP is currently in its 6th phase, and the data produced under its protocols form the scientific basis for the climate projection analyses in the IPCC’s AR6 (Sixth Assessment Report). The data pool holds model data from the CMIP3, CMIP5 and CMIP6 activities.

What about observations?

Not (yet) available in the pool, but have a look at ‘What is a reanalysis?’ and the ERA5 data. Maybe that serves your purpose.

What kind of models produce MIP data?

Dynamical models: Earth System Models (ESMs), Global Climate Models (GCMs), Regional Climate Models (RCMs) and weather models (the ECMWF model for reanalysis).

How and where can I access the data on DKRZ’s High Performance Computer?

We link model data repositories and catalogs in the file system under /pool/data. If you are not a DKRZ user yet, you may want to apply for the transnational access service of IS-ENES3 or the ENES Climate Analytics (ECAS) service. In that case, you need to request to become a member of project bk1088.

Basic data management terminology

What is a dataset?

Datasets in CMIP are defined to be all files required to cover the entire time series of a single variable of a single simulation of a single experiment of a single model. That can be multiple files, but they are all in ONE directory uniquely defined by the Data Reference Syntax (DRS).

How is the path to the data constructed? What is the path template? What is the DRS?

The Data Reference Syntax is a set of required attributes which uniquely identify and describe a dataset. Since only DRS elements are used to construct the directory structure and file names of a dataset, you can use the DRS to find data on the filesystem. The path template for CMIP6 is:
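As a hedged illustration of how DRS elements map to a directory, the sketch below joins them in the order used by the CMIP6 directory structure as we understand it from the CMIP6 DRS specification (mip_era, activity_id, institution_id, source_id, experiment_id, member_id, table_id, variable_id, grid_label, version). The element order is an assumption taken from that specification, not from this page. The function names, the layout directly below /pool/data and the example values (in particular the version string) are hypothetical.

```python
from pathlib import PurePosixPath

# Order of DRS elements in the CMIP6 directory structure.
# Assumption: taken from the CMIP6 DRS specification, not from this FAQ.
CMIP6_DIR_ELEMENTS = (
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id",
    "grid_label", "version",
)


def build_dataset_dir(root, **drs):
    """Join the DRS elements, in template order, below a root directory."""
    missing = [name for name in CMIP6_DIR_ELEMENTS if name not in drs]
    if missing:
        raise KeyError(f"missing DRS elements: {missing}")
    return PurePosixPath(root, *(drs[name] for name in CMIP6_DIR_ELEMENTS))


def parse_dataset_dir(path):
    """Map the last ten path components back to DRS element names."""
    parts = PurePosixPath(path).parts[-len(CMIP6_DIR_ELEMENTS):]
    return dict(zip(CMIP6_DIR_ELEMENTS, parts))


# Hypothetical example values; the version string is made up.
example = build_dataset_dir(
    "/pool/data",
    mip_era="CMIP6", activity_id="CMIP", institution_id="MPI-M",
    source_id="MPI-ESM1-2-HR", experiment_id="historical",
    member_id="r1i1p1f1", table_id="Amon", variable_id="tas",
    grid_label="gn", version="v20190710",
)
print(example)
print(parse_dataset_dir(example)["experiment_id"])
```

Because the directory consists only of DRS elements, the same tuple of names serves both for building a path and for reading the elements back from an existing one.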











Output of Earth System and Climate Models

Why do climate scientists use Models?

While physicists can usually conduct experiments in laboratories, climate scientists do not have this option: testing a hypothesis on the effects of climate change with our one and only planet is morally highly questionable (unless it is about emitting a lot of CO2, obviously…). The only way to approach such questions is through a digital representation of the earth in models. Using these models, hypotheses about causal relationships between earth system changes can be tested, e.g. by implementing corresponding boundary or initial conditions in a simulation.





product type: “model-output” is the only allowed value in CMIP6; in CMIP5 it is “output”.

How is earth described by models?

Dynamical models are based on physical and biogeochemical equations that have been proven in laboratory experiments. However, these equations cannot be solved or applied for every possible location and point in time, both because of the complexity and nonlinearity of the earth system and because of limited computational resources. This is why scientists use numerical methods to find approximate solutions:

These equations are ‘discretized’ for specific points in space and time, resulting in a spatial grid covering the earth and a calculation time step or output frequency. The value of a variable at a grid point is valid for the entire area that this grid point covers. Assuming a resolution of 1° in space, which is a state-of-the-art resolution for global earth system experiments, a grid cell covers about 100 km x 100 km.

Note that results have to be interpreted accordingly: if a town is located inside a grid cell, it is wrong to say that the value for that grid cell is a prediction or a precise measurement for that town. The value is an average over roughly 10,000 km². The effect of orography makes this clear: if a grid cell covers a mountain and a valley, one value describes the conditions for both.
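The area represented by one grid cell can be checked with a short calculation. The sketch below is only an illustration: it assumes a spherical earth with a mean radius of 6371 km and a regular lon-lat grid, and the function name cell_area_km2 is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius, spherical earth


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area of a lon-lat cell on a sphere, bounded by two latitudes (degrees)."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * band


# Area of a 1° x 1° cell at different latitudes
for lat in (0, 30, 60):
    area = cell_area_km2(lat, lat + 1, 1.0)
    print(f"cell starting at {lat:2d}°N: {area:8.0f} km²")
```

Under these assumptions a 1° x 1° cell covers a bit more than 12,000 km² at the equator and about half of that at 60°N, because the meridians converge towards the poles. A single grid value therefore stands for a differently sized area depending on latitude.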





grid: Can be used to describe the horizontal grid and regridding procedure. Example: “data regridded to a CMIP6 standard 1x1 degree lonxlat grid from the native T63 grid using an area-average preserving method.”

grid_label: Allows distinction when the variable is reported on more than one grid. Examples: “gn” for native grid, “gr” for regridded data reported on the data provider’s preferred target grid.

nominal_resolution: Provides an indication of approximate output grid resolution. Examples: “50 km”, “100 km”, “250 km”.

What are the definitions of Earth System, Earth System Model and Climate Model?

The Earth System is the combination of all physical, chemical, biological and social components, processes and feedbacks that influence the state and the change of planet earth (see https://de.wikipedia.org/wiki/Erdsystemwissenschaft#cite_note-Leemans-1).

An Earth System Model integrates and couples submodels, each specialized for one component of the earth system, which in combination describe the earth system as a whole. This means that, besides an atmospheric model that represents the atmospheric state, other models are implemented to calculate the physics and dynamics of ocean, ice and land as well as biogeochemical processes. Are you a biologist? Then the output of the land model may be of interest to you: you can find simulations of vegetation and land cover types, CO2 budgets and many more variables for many experiments.

Note that this can be a difference to weather forecast models: Because their focus is on a short time period, it is sufficient to use an atmospheric model to only calculate the atmospheric conditions. Climate related questions on the other hand cover larger time scales as well as all parts of the earth system and therefore demand a more extended earth representation. In addition, the non-atmospheric processes such as the oceanic circulation have a considerable influence on the earth system on such time scales. Therefore, their precise description in ESMs has become a requirement for their results being useful.

A climate model is not necessarily an ESM; however, an ESM can most certainly be used for climate simulations. If only the atmospheric part of the ESM is used for an experiment, the term climate model may still be used. As for weather forecasts, whether a full representation of the earth system is needed at all depends on the focus of the experiment and its underlying scientific question.




source_id, model_id: Model identifier, the short form of “source”. Values must not contain forbidden characters such as spaces; only values registered in the controlled vocabulary (CV) are allowed. Examples: “GFDL-CM2-1”, “MPI-ESM1-2-HR”.

source, model: Used to fully identify the model and version. It must include the year (i.e., model vintage) when this model version was first used in a scientific application. It should also include information concerning the component models. Example: “CCSM2 (2002): atmos: CAM2 (cam2_0_brnchT_itea_2, T42L26); ocean: POP (pop2_0_ver_1.4.3, 3x2L15); seaIce: CSIM4; land: CLM2.0”.

source_type: Experiments define which components of an ESM are required. source_type lists all components of the ESM that were switched on for the experiment. “AOGCM”, which stands for atmosphere-ocean global climate model, means that a coupled simulation with atmosphere-ocean interaction was conducted. “AGCM”, on the other hand, means that only the atmospheric model was switched on.


How finely can Climate Models resolve the earth?

The grid spacings of climate models range from the kilometre scale up to about 250 km. While a state-of-the-art global ESM has a grid spacing on the order of 100 km x 100 km, regional climate models can simulate on a 10 km x 10 km scale. At these resolutions, processes like turbulence and convection are normally parameterized. At even finer resolutions, the models have to solve an additional set of physical equations for such processes explicitly, which costs a lot of additional computational resources.

A regional climate model (RCM) simulates only a part of the earth, often a continent. Limiting the model area frees computational resources and allows simulations at finer resolution. RCMs therefore provide information on much smaller scales, supporting more detailed impact and adaptation assessment and planning.





grid_label: Allows distinction when the variable is reported on more than one grid. Example: “gm”: global mean output is reported, so data are not gridded.

nominal_resolution: Characterizes the resolution of the grid used to report model output fields. The respective measure dmax is the average of the maximum distance of cell vertices weighted by the grid cell’s area. For lonxlat grid cells, for example, dmax would be the diagonal distance. Example: “100 km” if 62 km < dmax < 160 km.

Why does the ESM science community need so much data?

Assuming a resolution of 1° in space, which is a state-of-the-art resolution for global earth system experiments, 90 levels and monthly output for 100 years, one variable of one experiment comprises 360*180*90*12*100 data points, i.e. about 7 billion values. If one value uses 4 bytes of disk space, this is equivalent to about 28 GB (uncompressed). Further multiplication factors for a project are the numbers of variables, experiments and participating models. For the CMIP6 project, those are about 100 experiments and 2000 variables.
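The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the numbers from the paragraph; the variable names are illustrative, and 4 bytes per value corresponds to single-precision floats.

```python
# Grid and output assumptions taken from the paragraph above
n_lon, n_lat = 360, 180   # 1° global lon-lat grid
n_levels = 90             # vertical levels
n_steps = 12 * 100        # monthly output for 100 years
bytes_per_value = 4       # single-precision float

n_values = n_lon * n_lat * n_levels * n_steps
size_bytes = n_values * bytes_per_value

print(f"{n_values:,} values")
print(f"{size_bytes / 1e9:.1f} GB (decimal), {size_bytes / 2**30:.1f} GiB (binary)")
```

The product comes to roughly 28 GB in decimal units, or about 26 GiB, for a single variable of a single experiment, which is the order of magnitude quoted above. Multiplying by the number of variables, experiments and models quickly leads to the data volumes that MIPs are known for.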

Model Intercomparison Projects, Downscaling and Reanalysis

What is a reanalysis? What is ERA5?

A reanalysis combines observations and the dynamics of a weather model to find the most realistic description of the earth system state at a specific time. The integration of observations into the weather model is a dedicated field of science called data assimilation. The reanalysis process includes back and forth calculation in time: from a specific starting point, the model simulates up to the next observation time, where the observations are compared with the simulation. The simulation is fitted to the observations and the model calculates backward in time. One of the best weather models is the Integrated Forecasting System (IFS) of the ECMWF. The most recent reanalysis product of the ECMWF is ERA5, which is available in DKRZ’s climate data pool.

The path to the data is /pool/data/ERA5

How can Climate models be compared? How good are Climate models? What is CMIP?

In order to evaluate and compare climate models, a globally organized intercomparison project is conducted periodically. The Coupled Model Intercomparison Project (CMIP) is in its 6th phase and provides the database for reports of the Intergovernmental Panel on Climate Change (IPCC). Many international institutions participate in this project with their models. ‘Coupled’ means that the atmosphere and the ocean model interact with each other.

CMIP defines a range of standard experiments required to evaluate the basic features of climate models. Those include piControl, historical, abrupt-4xCO2, 1pctCO2 and amip. The pre-industrial control simulation (piControl) is the reference for the other experiments and shows whether the climate model is able to simulate a stable climate for over 500 years. The historical experiment covers the years 1850 to 2014, in which the climate models are forced with time series of aerosol and land use fields according to observations. The model’s ability to simulate a realistic evolution of the historical climate can therefore be evaluated with a statistical analysis of the output of that experiment. The abrupt-4xCO2 and 1pctCO2 experiments address the feedback to CO2 forcing: abrupt-4xCO2 represents an abrupt quadrupling of CO2, and 1pctCO2 an increase of the atmospheric CO2 concentration by 1% each year for 140 years. These two are essential to evaluate how plausible the future scenario output of a model is. The amip experiment is an atmosphere-only experiment in which the sea surface temperature is prescribed according to observations in order to better analyse the atmospheric part of the model.
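The two CO2 experiments are linked by simple compound growth, which the following sketch works out. It only uses the 1% per year and 140 years stated above; the function name co2_factor is hypothetical.

```python
import math


def co2_factor(years, rate=0.01):
    """Concentration relative to the start after compounding `rate` per year."""
    return (1.0 + rate) ** years


# Years needed to reach a doubling and a quadrupling at 1% per year
years_to_double = math.log(2) / math.log(1.01)
years_to_quadruple = math.log(4) / math.log(1.01)

print(f"after 140 years: x{co2_factor(140):.2f}")
print(f"doubling after {years_to_double:.1f} years, "
      f"quadrupling after {years_to_quadruple:.1f} years")
```

With 1% growth per year the concentration doubles after about 70 years and reaches roughly four times its initial value after 140 years, so the end of 1pctCO2 arrives gradually at the CO2 level that abrupt-4xCO2 imposes at once.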

If you want to provide a baseline for your analysis, refer to the results of the CMIP standard experiment evaluation.







How does `Regional Downscaling <https://cordex.org/about/what-is-regional-downscaling/>`__ work? What is CORDEX?

Regional climate model simulations are driven by data obtained from global simulations, i.e. they use them as initial and boundary conditions. These data can stem from either experiments or reanalyses. Since the focus is on small scales, regional climate models often combine only atmosphere and land submodels, without ocean and biogeochemistry. The integration of all submodels is part of ongoing research efforts.

Are you looking for robust information for a localised domain? Then have a look at `CORDEX <https://cordex.org/>`__ (Coordinated Regional Climate Downscaling Experiment). Under the CORDEX protocol, RCM results are made comparable and can be evaluated. We provide CORDEX data in the climate data pool.

What is an endorsed MIP? Why are there other MIPs inside CMIP6? How is CMIP structured?

Phase 6 of CMIP is designed as a framework that allows smaller model intercomparison projects (MIPs) with a specific focus to be endorsed by CMIP6. That means each model that runs the standard CMIP experiments (see ‘How good are Climate models?’) can participate in CMIP6 and in further MIPs. The most frequently used activities are CMIP, which contains the standard experiments, and ScenarioMIP, which contains the future scenarios.

While these endorsed MIPs pursue largely independent science, modelers and data users benefit from a shared data infrastructure including a condensed data request, a data pool and a data portal for all MIPs. At DKRZ, we provide infrastructure tools that simplify navigating the CMIP6 requirements (e.g. https://c6dreq.dkrz.de).

What scientific questions are addressed by endorsed MIPs of CMIP6?

The Coupled Model Intercomparison Project Phase 6 was designed (doi:10.5194/gmd-9-1937-2016) to tackle three main questions and the Grand Science Challenges defined by the World Climate Research Programme (Grand Challenges Overview). Each of those can be associated with one or more endorsed MIPs (see ‘What is an endorsed MIP?’ and the upcoming table).

The three questions are:

  • How does the Earth system respond to forcing?

  • What are the origins and consequences of systematic model biases?

  • How can we assess future climate changes given internal climate variability, predictability, and uncertainties and scenarios?

The Grand Science Challenges relate to

  1. advancing understanding of the role of clouds in the general atmospheric circulation and climate sensitivity

  2. assessing the response of the cryosphere to a warming climate and its global consequences

  3. understanding the factors that control water availability over land

  4. assessing climate extremes, what controls them, how they’ve changed in the past and how they might change in the future

  5. understanding and predicting regional sea level change and its coastal impacts

  6. improving near-term climate predictions

  7. determining how biogeochemical cycles and feedbacks control greenhouse gas concentrations and climate change.

What does the definition of a numerical experiment include? What are the assumptions? How are experiments for Climate Models constructed?