Replication and Synchronization

In addition to primary publication, the data pool is filled by replicating datasets from other ESGF nodes. This section documents the tools we use for replication and explains which datasets are given higher priority.

The tools for replication have the following attributes:

Operational
The synchronization process is designed to run “eternally”, because publication at the other nodes is ongoing. It must therefore run both as a cron job and as a service that can restart itself.
Synchronization
Continuously updating an entire repository requires that data from other nodes be checked for compatibility. This includes checks for replication errors and for design errors that occurred at the publishing ESGF node.
Prioritization
Because the full CMIP6 data repository is so large, data that the AR6 working groups have identified as important is given higher priority.
Load management
The synchronization distributes downloads across nodes so that new datasets are not all downloaded from a single ESGF node.
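The prioritization, load-management and compatibility-check attributes above can be illustrated with a minimal Python sketch. This is not the tooling used at DKRZ. The `Replica` record, the numeric priority scale (lower value means more important, e.g. 0 for data requested by the AR6 working groups), the node names and the SHA-256 comparison are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical download job: one dataset to fetch from one ESGF node."""
    dataset_id: str
    node: str
    priority: int  # assumption: lower value = more important (0 for AR6-requested data)


def schedule(replicas):
    """Order downloads by priority tier, then rotate across source nodes within
    each tier so that consecutive downloads do not all hit the same node."""
    # Group jobs first by priority, then by source node.
    by_priority = defaultdict(lambda: defaultdict(deque))
    for r in replicas:
        by_priority[r.priority][r.node].append(r)

    order = []
    for prio in sorted(by_priority):
        queues = by_priority[prio]
        nodes = deque(sorted(queues))
        # Round-robin: take one job from each node in turn until all queues are empty.
        while nodes:
            node = nodes.popleft()
            order.append(queues[node].popleft())
            if queues[node]:
                nodes.append(node)
    return order


def checksum_ok(data: bytes, expected_sha256: str) -> bool:
    """Compatibility check (assumption): compare the downloaded bytes with the
    checksum published by the source node."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


if __name__ == "__main__":
    # Hypothetical jobs; node names are placeholders, not real ESGF hosts.
    jobs = [
        Replica("dataset-a1", "node-a", 0),
        Replica("dataset-a2", "node-a", 0),
        Replica("dataset-b1", "node-b", 0),
        Replica("dataset-c1", "node-c", 1),
    ]
    for job in schedule(jobs):
        print(job.priority, job.node, job.dataset_id)
```

Round-robin within each priority tier is one simple way to spread load; a real service would also need to handle failed or slow nodes and to re-run the schedule as new datasets are published.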

If ESGF CMIP(6) data that is available at other ESGF nodes is missing from the data pool, you can request replication by contacting esgf-replication@dkrz.de. If you have requirements regarding the storage of derived data products or the inclusion of data sets that are not accessible via ESGF, please contact data-pool@dkrz.de.