{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## CEDA CF Checker\n", "\n", "The [CF Checker](https://github.com/cedadev/cf-checker) software tool is provided by [CEDA](https://www.ceda.ac.uk/) (Centre for Environmental Data Analysis) to verify that netCDF files comply with the [CF conventions](https://cfconventions.org/).\n", "\n", "> The CF conventions have been adopted by a number of projects and groups as a primary standard. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Installation and Preparation\n", "\n", "We recommend installing the CF Checker with `pip`. The following cell prepends the directory of the running Python executable to `PATH` so that command-line tools installed in that environment, such as `cfchecks`, can be found." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import os\n", "\n", "# Prepend the directory of the running Python executable to PATH\n", "newpath = f\"{os.path.dirname(sys.executable)}{os.pathsep}{os.environ['PATH']}\"\n", "os.environ[\"PATH\"] = newpath" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Settings\n", "\n", "Specify the file or dataset to be tested in `testfile`.\n", "\n", "The CF Checker uses the standard name tables as input. They are downloaded to the working directory `working_dir` if you set the switch `download_tables=True`. Three tables are required, and each has its own version number. You can specify the versions directly in the `table_dict` dictionary or set the switch `update_versions=True` so that the most recent versions are read from the CF conventions website."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "testfile = \"/work/ik1017/CMIP6//data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp370/r1i1p1f1/Amon/tas/gn/v20190710/*\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "update_versions = False\n", "download_tables = False" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "table_dict = {\n", "    \"cf-standard-name-table\": {\n", "        \"version\": 76,\n", "        \"page\": \"http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html\",\n", "    },\n", "    \"area-type-table\": {\n", "        \"version\": 9,\n", "        \"page\": \"http://cfconventions.org/Data/area-type-table/current/build/area-type-table.html\",\n", "    },\n", "    \"standardized-region-list\": {\n", "        \"version\": 4,\n", "        \"page\": \"http://cfconventions.org/Data/standardized-region-list/standardized-region-list.current.html\",\n", "    },\n", "}\n", "working_dir = \"./\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Initialization\n", "\n", "If `update_versions` is `True`, we download the page of each table with the `requests` package and parse it with `BeautifulSoup` to read the current version number. We then build the download `url`s with the matching version numbers and, if `download_tables` is `True`, download the tables to the working directory."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "from bs4 import BeautifulSoup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_recent_versions(page):\n", "    # Fetch the table's HTML page and extract the number that follows \"Version\"\n", "    response = requests.get(page)\n", "    parsed_html = BeautifulSoup(response.content, \"html.parser\")\n", "    return int(str(parsed_html).split(\"Version\")[1].split(\",\")[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if update_versions:\n", "    for key in table_dict:\n", "        table_dict[key][\"version\"] = get_recent_versions(table_dict[key][\"page\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "table_dict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "table_dict[\"cf-standard-name-table\"][\n", "    \"url\"\n", "] = \"http://cfconventions.org/Data/cf-standard-names/{0}/src/cf-standard-name-table.xml\".format(\n", "    table_dict[\"cf-standard-name-table\"][\"version\"]\n", ")\n", "table_dict[\"area-type-table\"][\n", "    \"url\"\n", "] = \"http://cfconventions.org/Data/area-type-table/{0}/src/area-type-table.xml\".format(\n", "    table_dict[\"area-type-table\"][\"version\"]\n", ")\n", "table_dict[\"standardized-region-list\"][\n", "    \"url\"\n", "] = \"http://cfconventions.org/Data/standardized-region-list/standardized-region-list.{0}.xml\".format(\n", "    table_dict[\"standardized-region-list\"][\"version\"]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for tablename in table_dict.keys():\n", "    table_dict[tablename][\"local_path\"] = \"{0}/CF/{1}-{2}.xml\".format(\n", "        working_dir, tablename, table_dict[tablename][\"version\"]\n", "    )\n", "    if download_tables:\n", "        # Create the target directory if it does not exist yet\n", "        os.makedirs(os.path.dirname(table_dict[tablename][\"local_path\"]), exist_ok=True)\n", "        response = requests.get(table_dict[tablename][\"url\"])\n", "        with open(\n", "            table_dict[tablename][\"local_path\"],\n", "            \"wb\",\n", "        ) as file:\n", "            file.write(response.content)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "table_dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Application\n", "\n", "We run the CF Checker with `subprocess` in a shell and capture all output." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "\n", "# -a: area type table, -r: standardized region list, -s: standard name table\n", "a = subprocess.run(\n", "    \"cfchecks -a {0} -r {1} -s {2} {3}\".format(\n", "        table_dict[\"area-type-table\"][\"url\"],\n", "        table_dict[\"standardized-region-list\"][\"url\"],\n", "        table_dict[\"cf-standard-name-table\"][\"url\"],\n", "        testfile,\n", "    ),\n", "    capture_output=True,\n", "    shell=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Results\n", "\n", "We write the captured `stdout` to a file in the `working_dir`. Additionally, we search the `stdout` for three patterns to create a **summary** of the CF Checker results."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Decode the captured output once and split it into lines\n", "stdout_lines = a.stdout.decode(\"utf-8\").split(\"\\n\")\n", "files = [\n", "    fileline.split(\":\", 1)[1].strip()\n", "    for fileline in stdout_lines\n", "    if \"CHECKING NetCDF FILE\" in fileline\n", "]\n", "warnings = [\n", "    warningline.split(\":\", 1)[1].strip()\n", "    for warningline in stdout_lines\n", "    if \"WARNINGS given\" in warningline\n", "]\n", "errors = [\n", "    errorline.split(\":\", 1)[1].strip()\n", "    for errorline in stdout_lines\n", "    if \"ERRORS detected\" in errorline\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!rm -rf cf-checker-results\n", "!mkdir -p cf-checker-results\n", "with open(working_dir + \"cf-checker-results/\" + files[0].split(\"/\")[-1], \"w\") as file:\n", "    file.write(a.stdout.decode(\"utf-8\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result_dict = {}\n", "for idx, file in enumerate(files):\n", "    result_dict[file] = {\"warnings\": warnings[idx], \"errors\": errors[idx]}\n", "print(result_dict)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "python3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }