## CEDA CF Checker

The [CF Checker](https://github.com/cedadev/cf-checker) software tool is provided by [CEDA](https://www.ceda.ac.uk/) (Centre for Environmental Data Analysis) to verify that netCDF files comply with the [CF conventions](https://cfconventions.org/).

> The CF conventions have been adopted by a number of projects and groups as a primary standard. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. 

### 1. Installation and Preparation

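We recommend installing the checker with `pip` (`pip install cfchecker`). This installs the `cfchecks` command-line script next to the Python interpreter, so the following cell prepends the interpreter's directory to `PATH` to make `cfchecks` callable from a shell.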

In [None]:
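import os
import sys

# Prepend the directory of the running Python interpreter to PATH so that
# scripts installed alongside it (such as cfchecks) are found by the shell
python_bin_dir = os.path.dirname(sys.executable)
os.environ["PATH"] = f"{python_bin_dir}{os.pathsep}{os.environ['PATH']}"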
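To confirm that the `PATH` adjustment worked, you can ask Python where `cfchecks` resolves. The sketch below uses the standard library's `shutil.which`; the helper name `find_cfchecks` is hypothetical, and the fallback assumes pip placed the script in the interpreter's directory, which is typical for virtual and conda environments but not guaranteed.

```python
import os
import shutil
import sys


def find_cfchecks():
    """Return the full path of the cfchecks executable, or None if it cannot be found."""
    # First look on the current PATH
    found = shutil.which("cfchecks")
    if found is not None:
        return found
    # Fall back to the interpreter's directory, where pip usually installs scripts
    return shutil.which("cfchecks", path=os.path.dirname(sys.executable))


print(find_cfchecks())
```

A result of `None` means the checker is not installed in this environment or was installed somewhere else.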

### 2. Settings

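Specify the file or files to be tested in `testfile`. A wildcard pattern such as the trailing `*` below selects all files of a dataset directory; the shell expands it when the checker runs.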

The CF Checker validates against three CF tables: the standard name table, the area type table and the standardized region list. Each table has its own version number. You can set the versions directly in the `table_dict` dictionary, or set `update_versions=True` to parse the most recent versions from the tables' web pages. If you set `download_tables=True`, the tables are also downloaded to the `CF` subdirectory of `working_dir`.

In [None]:
testfile = "/work/ik1017/CMIP6//data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp370/r1i1p1f1/Amon/tas/gn/v20190710/*"
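Before running the checker it can help to preview which files the pattern matches. This is a small sketch using Python's `glob` module, whose wildcard syntax (`*`, `?`, `[...]`) is similar to, though not identical with, the shell's; the helper `list_test_files` is hypothetical.

```python
import glob
import os


def list_test_files(pattern):
    """Expand a wildcard pattern into a sorted list of matching regular files."""
    # Directories that match the pattern are skipped; only files are returned
    return sorted(path for path in glob.glob(pattern) if os.path.isfile(path))


# Example: list_test_files(testfile) shows the files cfchecks would receive
```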

In [None]:
update_versions = False
download_tables = False

In [None]:
table_dict = {
 "cf-standard-name-table": {
 "version": 76,
 "page": "http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html",
 },
 "area-type-table": {
 "version": 9,
 "page": "http://cfconventions.org/Data/area-type-table/current/build/area-type-table.html",
 },
 "standardized-region-list": {
 "version": 4,
 "page": "http://cfconventions.org/Data/standardized-region-list/standardized-region-list.current.html",
 },
}
working_dir = "./"

### 3. Initialization

If `update_versions` is True, we download each table's web page with the `requests` package, parse it with `BeautifulSoup` and read the current version number from it. We then build download URLs for the chosen table versions and, if `download_tables` is True, save the tables to the working directory.

In [None]:
import requests
from bs4 import BeautifulSoup

In [None]:
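def get_recent_versions(page):
    """Return the version number announced on a CF table web page ("Version <n>, ...")."""
    response = requests.get(page, timeout=60)
    response.raise_for_status()
    # An explicit parser avoids BeautifulSoup's "no parser specified" warning
    parsed_html = BeautifulSoup(response.content, "html.parser")
    # The text between the first "Version" and the next comma is the table version
    return int(str(parsed_html).split("Version")[1].split(",")[0])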
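The string splitting above raises an `IndexError` or `ValueError` if a page does not contain `Version <n>,`. As an alternative, here is a regular-expression sketch (the helper `parse_version` is hypothetical) that works on the page text and returns `None` instead of failing. It assumes, like the original cell, that the first occurrence of "Version" on the page is followed by the table version number.

```python
import re


def parse_version(html_text):
    """Return the first integer following the word 'Version', or None if absent."""
    match = re.search(r"Version\s*(\d+)", html_text)
    return int(match.group(1)) if match else None


# Illustrative input, not copied from the real table pages
print(parse_version("<h2>Version 76, example heading</h2>"))  # 76
```

Inside `get_recent_versions`, `parse_version(str(parsed_html))` could replace the `split` chain.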

In [None]:
if update_versions:
    # Overwrite the configured versions with those found on the tables' web pages
    for key, table in table_dict.items():
        table["version"] = get_recent_versions(table["page"])

In [None]:
table_dict

In [None]:
table_dict["cf-standard-name-table"][
 "url"
] = "http://cfconventions.org/Data/cf-standard-names/{0}/src/cf-standard-name-table.xml".format(
 table_dict["cf-standard-name-table"]["version"]
)
table_dict["area-type-table"][
 "url"
] = "http://cfconventions.org/Data/area-type-table/{0}/src/area-type-table.xml".format(
 table_dict["area-type-table"]["version"]
)
table_dict["standardized-region-list"][
 "url"
] = "http://cfconventions.org/Data/standardized-region-list/standardized-region-list.{0}.xml".format(
 table_dict["standardized-region-list"]["version"]
)
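The three assignments above follow one pattern: a per-table URL template with the version inserted. A compact sketch of the same idea keeps the templates (copied from the cell above) in one dictionary; the names `URL_TEMPLATES` and `build_table_url` are hypothetical.

```python
# URL templates copied from the cell above; {version} is filled in per table
URL_TEMPLATES = {
    "cf-standard-name-table": "http://cfconventions.org/Data/cf-standard-names/{version}/src/cf-standard-name-table.xml",
    "area-type-table": "http://cfconventions.org/Data/area-type-table/{version}/src/area-type-table.xml",
    "standardized-region-list": "http://cfconventions.org/Data/standardized-region-list/standardized-region-list.{version}.xml",
}


def build_table_url(tablename, version):
    """Return the download URL of the given version of a CF table."""
    return URL_TEMPLATES[tablename].format(version=version)


# Usage with table_dict:
# for name, table in table_dict.items():
#     table["url"] = build_table_url(name, table["version"])
```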

In [None]:
import os

for tablename, table in table_dict.items():
    # Local file name encodes table name and version, e.g. CF/area-type-table-9.xml
    table["local_path"] = os.path.join(working_dir, "CF", f"{tablename}-{table['version']}.xml")
    if download_tables:
        os.makedirs(os.path.dirname(table["local_path"]), exist_ok=True)
        response = requests.get(table["url"], timeout=60)
        response.raise_for_status()
        with open(table["local_path"], "wb") as file:
            file.write(response.content)
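The checker call below passes the table URLs, so the downloaded copies are not used by it. If you want to check against the local files, a sketch like the hypothetical `table_location` helper can choose the local copy when it exists. This assumes that `cfchecks` accepts a local file path in place of a URL for `-a`, `-r` and `-s`; check `cfchecks -h` for your installed version.

```python
import os


def table_location(table):
    """Return the local copy of a CF table if it exists, otherwise its download URL."""
    # `table` is one entry of table_dict, with "url" and optionally "local_path" keys
    local_path = table.get("local_path")
    if local_path and os.path.isfile(local_path):
        return local_path
    return table["url"]
```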

In [None]:
table_dict

### 4. Application

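We run the CF Checker with `subprocess` in a shell and capture all output. The shell (`shell=True`) expands the wildcard in `testfile`, so every matching file is passed to `cfchecks`. The options `-a`, `-r` and `-s` point to the area type table, the standardized region list and the standard name table, respectively.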

In [None]:
import subprocess

a = subprocess.run(
 "cfchecks -a {0} -r {1} -s {2} {3}".format(
 table_dict["area-type-table"]["url"],
 table_dict["standardized-region-list"]["url"],
 table_dict["cf-standard-name-table"]["url"],
 testfile,
 ),
 capture_output=True,
 shell=True,
)
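A shell is not strictly needed: the wildcard can be expanded in Python and the command passed to `subprocess.run` as an argument list, which avoids shell quoting problems with unusual file names. The sketch below uses only the `-a`, `-r` and `-s` options from the cell above; `build_cfchecks_command` is a hypothetical helper.

```python
def build_cfchecks_command(area_table, region_list, standard_name_table, files):
    """Assemble the cfchecks call as an argument list for subprocess.run (no shell)."""
    # -a: area type table, -r: standardized region list, -s: standard name table
    return ["cfchecks", "-a", area_table, "-r", region_list, "-s", standard_name_table, *files]


# Usage:
# import glob, subprocess
# cmd = build_cfchecks_command(
#     table_dict["area-type-table"]["url"],
#     table_dict["standardized-region-list"]["url"],
#     table_dict["cf-standard-name-table"]["url"],
#     sorted(glob.glob(testfile)),
# )
# a = subprocess.run(cmd, capture_output=True)
```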

### 5. Results

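We search the checker's `stdout` for three patterns to build a **summary** of the CF Checker results: the names of the checked files, the number of warnings and the number of errors. The full `stdout` is also written to a file in the `cf-checker-results` subdirectory of `working_dir`.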

In [None]:
# Decode the captured output once and search it line by line
stdout_lines = a.stdout.decode("utf-8").splitlines()
# Split only at the first colon so that values containing ":" stay intact
files = [
    line.split(":", 1)[1].strip() for line in stdout_lines if "CHECKING NetCDF FILE" in line
]
warnings = [
    line.split(":", 1)[1].strip() for line in stdout_lines if "WARNINGS given" in line
]
errors = [
    line.split(":", 1)[1].strip() for line in stdout_lines if "ERRORS detected" in line
]
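The three lists above are matched up later by position, which silently misaligns if one pattern is missing for a file. A sketch of an alternative parser attaches each count to the most recent `CHECKING NetCDF FILE` line instead. It assumes the report lines look like those matched above (`CHECKING NetCDF FILE: <path>`, `ERRORS detected: <n>`, `WARNINGS given: <n>`); `summarize_cfchecks` is a hypothetical helper.

```python
def summarize_cfchecks(stdout_text):
    """Map each checked file to its warning and error counts in cfchecks output."""
    summary = {}
    current = None
    for line in stdout_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon carry no summary information
        value = value.strip()
        if "CHECKING NetCDF FILE" in key:
            current = value
            summary[current] = {"warnings": None, "errors": None}
        elif current is not None and "ERRORS detected" in key:
            summary[current]["errors"] = int(value)
        elif current is not None and "WARNINGS given" in key:
            summary[current]["warnings"] = int(value)
    return summary


# Usage: summarize_cfchecks(a.stdout.decode("utf-8"))
```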

In [None]:
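import os
import shutil
# Recreate an empty results directory inside working_dir
results_dir = os.path.join(working_dir, "cf-checker-results")
shutil.rmtree(results_dir, ignore_errors=True)
os.makedirs(results_dir)
with open(os.path.join(results_dir, os.path.basename(files[0].strip())), "w") as file:
    file.write(a.stdout.decode("utf-8"))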

In [None]:
# Pair each checked file with its warning and error counts
result_dict = {}
for file, n_warnings, n_errors in zip(files, warnings, errors):
    result_dict[file] = {"warnings": n_warnings, "errors": n_errors}
print(result_dict)
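For larger datasets it is often more useful to list only the files that failed. A minimal sketch, assuming the counts in `result_dict` are strings such as `"0"` or `" 0"` (as produced above) or integers; `files_with_errors` is a hypothetical helper.

```python
def files_with_errors(result_dict):
    """Return the sorted names of files whose CF Checker report contains errors."""
    # int() accepts both ints and numeric strings with surrounding whitespace
    return sorted(name for name, counts in result_dict.items() if int(counts["errors"]) > 0)


# Usage: print(files_with_errors(result_dict))
```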