CarbonBench

Benchmark dataset for Eulerian atmospheric transport models

CarbonBench Logo

The CarbonBench dataset is a benchmark for machine learning emulators of atmospheric CO₂ tracer transport. It was introduced in
Benson et al. (2024): Atmospheric Transport Modeling of CO₂ with Neural Networks.


GitHub Repository

Official implementation and tools for the CarbonBench dataset - a benchmark for machine learning emulators of atmospheric CO₂ tracer transport.

Stars Forks Python

HuggingFace Repository

Hosted dataset on Hugging Face with CarbonBench Zarr data and metadata for machine learning transport emulators.

Likes Downloads Dataset

Dataset Contents

The dataset contains 3D CO₂ concentrations together with meteorological drivers and flux fields.
All variables are provided as Zarr arrays chunked per timestep.

Variable Description Units
co2massmix CO₂ mass mixing ratio 10⁻⁶ kgCO₂ / kgDryAir
airmass Dry air mass Pg dry air
gph_bottom Geopotential height at lower level km
gph_top Geopotential height at upper level km
p_bottom Pressure at lower level hPa
p_top Pressure at upper level hPa
q Specific humidity kg/kg
t Air temperature K
u Zonal (eastward) wind speed m/s
v Meridional (northward) wind speed m/s
co2flux_anthro Anthropogenic surface CO₂ flux kg CO₂ m⁻² s⁻¹
co2flux_land Land biosphere surface CO₂ flux kg CO₂ m⁻² s⁻¹
co2flux_ocean Ocean surface CO₂ flux kg CO₂ m⁻² s⁻¹
blh Planetary boundary layer thickness m
tisr Incoming solar radiation J m⁻²
cell_area Surface grid-cell area
orography Surface geopotential m²/s²

Output variable:

  • co2massmix at the next timestep (same unit as input)

Baseline Model Results

Performance of the baseline emulators trained on CarbonBench:

Model Parameters (M) Decorrelation Time RMSE
UNet 9.6 >90 0.98 0.52
GraphCast 5.2 >90 0.96 0.86
SFNO 35.7 >90 0.98 0.58
SwinTransformer 37.9 >90 0.99 0.34

Downloading & Preparing the Dataset

You can build the dataset locally using the CarbonBench tools:

# Clone the repo
git clone https://github.com/vitusbenson/carbonbench.git
git clone https://github.com/vitusbenson/neural_transport.git
cd carbonbench

# Install requirements (including neural_transport)
pip install -e ../neural_transport

# Create CarbonTracker-based dataset (example LowRes)
python data/create_carbontracker_dataset.py \
  --save_dir /path/to/data \
  --gridname "latlon5.625" \
  --vertical_levels "l10" \
  --freq "6h"

# Create ObsPack evaluation dataset
python data/create_obspack_dataset.py \
  --save_dir /path/to/data \
  --freq "3h"

Citation

@article{benson2024neuraltransport,
  title = {Atmospheric Transport Modeling of CO2 with Neural Networks},
  author = {Benson, Vitus and Bastos, Ana and Reimers, Christian and Winkler, Alexander J. and Yang, Fanny and Reichstein, Markus},
  year = {2025},
  pages = {e2024MS004655},
  journal = {Journal of Advances in Modeling Earth Systems},
  volume = {17},
  number = {2},
  publisher = {Wiley Online Library},
  url = {https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2024MS004655},
}