CMIP6 storage

with panel, pandas and hvplot

The primary publication of national Earth System Model data at DKRZ takes up the largest part of the CMIP Data Pool (CDP). Most of the data have been produced within the national CMIP project DICAD and in the compute project RZ988.

DKRZ supports modeling groups in all steps of the data workflow, from preparation to publication. To track and display the effort for this workflow, we run automated scripts (cron jobs) that capture the extent of the final product, the disk space usage of these groups in the data pool, and update it daily. The resulting statistics are uploaded to a public, freely available swift storage.
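As a rough illustration of what such a disk usage capture could look like, the sketch below sums file sizes and file counts per first-level subdirectory of a pool directory. It is not the script DKRZ runs: the function names `disk_usage` and `usage_by_group`, the directory layout (one subdirectory per group) and the conversion to terabytes with a factor of 1e12 are all assumptions made for this example.

```python
from pathlib import Path


def disk_usage(root):
    """Return (total bytes, number of files) for all regular files below `root`."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return sum(p.stat().st_size for p in files), len(files)


def usage_by_group(pool):
    """Map each first-level subdirectory of `pool` to its size in TB and file count.

    Assumption: every group owns exactly one subdirectory of the pool.
    """
    stats = {}
    for group_dir in sorted(Path(pool).iterdir()):
        if group_dir.is_dir():
            nbytes, nfiles = disk_usage(group_dir)
            stats[group_dir.name] = {
                "size [TB]": nbytes / 1e12,  # assumed decimal terabytes
                "filenumber": nfiles,
            }
    return stats
```

Calling `usage_by_group` on a pool directory returns one entry per group, which could then be written to a CSV file and uploaded by a daily cron job.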

In the following, we create responsive bar plots with panel, pandas and hvplot for statistical Key Performance Indicators (KPIs) of the CDP.

German contribution and publication

Here we present statistics of DICAD contributions to the CDP. Datasets which were

  • created as part of DICAD and

  • have been primarily published at the DKRZ ESGF Node

are considered.

The statistics are computed by grouping the measures by:

  • source_id: Earth System Models (ESMs) which have contributed to the CDP.

  • institution_id: Institutions which have conducted and submitted model simulations to the CDP.

  • publication type: How much data has been published and how much has been replicated at the DKRZ ESGF node.
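The grouping described above can be sketched with pandas. The following is a minimal illustration on invented toy records, not the script that produces the published statistics. The helper `summarize`, the measure columns `size` and `filenumber`, the values `original` and `replica`, and all numbers are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical per-dataset records; all values are invented for illustration.
records = pd.DataFrame(
    {
        "source_id": ["MPI-ESM1-2-HR", "MPI-ESM1-2-HR", "AWI-CM-1-1-MR", "MPI-ESM1-2-LR"],
        "institution_id": ["MPI-M", "DKRZ", "AWI", "MPI-M"],
        "publicationType": ["original", "original", "original", "replica"],
        "size": [4.0, 2.5, 3.0, 1.5],  # assumed unit: TB
        "filenumber": [400, 250, 300, 150],
    }
)


def summarize(df, key):
    """Sum the measures per value of `key`, largest size first."""
    return (
        df.groupby(key)[["size", "filenumber"]]
        .sum()
        .sort_values("size", ascending=False)
        .reset_index()
    )


by_source = summarize(records, "source_id")
by_institution = summarize(records, "institution_id")
by_publication = summarize(records, "publicationType")
print(by_source)
```

Each resulting table has one row per group value, which matches the layout of one CSV file per grouping key.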

import warnings

import pandas as pd
import panel as pn

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")
pn.extension("tabulator")

# Key Performance Indicators shown in the plots
kpis = ["size [TB]", "filenumber", "datasets"]

# Daily statistics from the public swift storage, largest allocation first
sourcesumdf = pd.read_csv("https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/Pool-Statistics/mistral-cmip6-allocation-by-source.csv.gz").sort_values("size", ascending=False)
allinstdf = pd.read_csv("https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/Pool-Statistics/mistral-cmip6-allocation-by-dicad-institutes.csv.gz").sort_values("size", ascending=False)
allreplicadf = pd.read_csv("https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/Pool-Statistics/mistral-cmip6-allocation-by-publicationType.csv.gz").sort_values("size", ascending=False)
import intake
from pathlib import Path
import hvplot.pandas
from bokeh.models import NumeralTickFormatter
import pandas as pd
sourcesumdf["Group"]="By source_id"
sourcesumdf["Key"]="source_id"
sourcesumdf["Legend"]=sourcesumdf["source_id"]
allinstdf["Group"]="By institution_id"
allinstdf["Key"]="institution_id"
allinstdf["Legend"]=allinstdf["institution_id"]
allreplicadf["Group"]="By Publication Status"
allreplicadf["Key"]="publicationType"
allreplicadf["Legend"]=allreplicadf["publicationType"]

sourcesumdf=sourcesumdf.set_index("Group")
allinstdf=allinstdf.set_index("Group")
allreplicadf=allreplicadf.set_index("Group")
# Combine the three labelled tables into a single frame for plotting
plotdf=pd.concat([sourcesumdf,allinstdf,allreplicadf])
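To make the shape of `plotdf` concrete, here is a small sketch of the same labelling and concatenation steps on invented toy tables. The helper `label`, the toy frames and all values are assumptions for illustration. Only the column names `Group`, `Key` and `Legend` are taken from the cell above.

```python
import pandas as pd


def label(df, key, group):
    """Tag a grouped table with Group/Key/Legend columns and index it by Group."""
    out = df.copy()
    out["Group"] = group
    out["Key"] = key
    out["Legend"] = out[key]
    return out.set_index("Group")


# Invented toy tables standing in for the downloaded statistics
toy_source = pd.DataFrame({"source_id": ["model-a", "model-b"], "size": [3.0, 1.0]})
toy_inst = pd.DataFrame({"institution_id": ["inst-x"], "size": [4.0]})

toy_plotdf = pd.concat(
    [
        label(toy_source, "source_id", "By source_id"),
        label(toy_inst, "institution_id", "By institution_id"),
    ]
)

# Rows of one group can now be selected through the shared index
print(toy_plotdf.loc["By source_id", ["Legend", "size"]])
```

Because the tables have different key columns, the concatenated frame contains missing values in the key columns that do not apply to a row. The shared `Legend` column gives every row a label regardless of which key it came from.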