Outputs
Summary
This page describes the ocean and sea-ice outputs made available from the Near-Present-Day simulations at 5-day, monthly and annual frequency, and explains how to access them using the JASMIN Object Store.
Primary Outputs
Primary outputs of the Near-Present-Day simulations are those variables which are calculated online at runtime and are written to netCDF files according to where they are defined on the eORCA grid.
Example: eORCA1
The conservative temperature thetao_con averaged at monthly intervals will be stored in the eORCA1_1m_YYYYMM_grid_T.nc file.
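The naming pattern in this example can be sketched in Python. This is a minimal illustration only: the helper name primary_output_filename is hypothetical, and it assumes that other frequencies and grid points follow the same pattern as the single eORCA1 example above.

```python
from datetime import date


def primary_output_filename(config: str, freq: str, when: date, grid: str) -> str:
    """Build a file name following the pattern <config>_<freq>_YYYYMM_grid_<grid>.nc.

    Assumption: the pattern is inferred from the eORCA1 monthly T-grid example.
    """
    return f"{config}_{freq}_{when:%Y%m}_grid_{grid}.nc"


# Monthly-mean T-grid file for January 1976 in the eORCA1 configuration:
print(primary_output_filename("eORCA1", "1m", date(1976, 1, 1), "T"))
# eORCA1_1m_197601_grid_T.nc
```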
Below we include a table of the available ocean and sea-ice variables output by each Near-Present-Day simulation:
Available Ocean & Sea-Ice Outputs:
- ERA-5 Adjusted (1976-present)
- JRA55-do (1976-2023)
Secondary Outputs
Secondary outputs of the Near-Present-Day simulations include those diagnostics which are calculated offline using the primary output variables. In many cases, these outputs will be produced during the analysis of the Near-Present-Day simulations; however, a collection of especially popular diagnostics will be made available to users.
Atlantic Meridional Overturning Circulation Diagnostics
The Atlantic Meridional Overturning Circulation (AMOC) is a fundamental component of the global climate system owing to its role in the redistribution of heat, nutrients and freshwater. On account of its wider societal significance, a number of continuous ocean observing systems have been deployed throughout the Atlantic Ocean to monitor the state and variability of the AMOC.
The METRIC Python package allows users to calculate meridional overturning and heat transport diagnostics in numerical models which are equivalent (and hence comparable) to existing observations at the RAPID (26.5\(^{\circ}\)N), MOVE (16\(^{\circ}\)N) and SAMBA (34.5\(^{\circ}\)S) arrays (see Danabasoglu et al., 2021).
Diagnostics including meridional overturning stream functions and the meridional fluxes of heat and freshwater will be made available as secondary output variables via the JASMIN Object Store.
Accessing Near-Present-Day Data via the JASMIN Object Store
To improve the accessibility of the large volumes of data generated by the Near-Present-Day simulations, primary and secondary output variables will be made available via the JASMIN Object Store. For those who are unfamiliar with object storage, we suggest reading the primer below before getting started accessing the available outputs.
What is Object Storage?
Object storage is a fairly modern data storage solution that provides an efficient, scalable and collaborative way to store and manage large volumes of scientific data.
Although most of us are accustomed to working with traditional hierarchical file systems (think folders and file paths), in object storage data is stored as objects in "buckets" rather than folders. Each object consists of:
- Data: For example, a sea surface temperature dataset.
- Metadata: Descriptive information about the data.
- Unique Identifier: Used to retrieve the object.
Object storage systems have two especially valuable properties for ocean-climate applications:
- Scalability: Since object stores employ a flat storage architecture, they can easily handle petabytes to exabytes of data.
- Accessibility: Since objects are accessed via Application Programming Interfaces (APIs), users can easily retrieve and analyse data from anywhere over HTTP, with authentication using HTTP headers.
Object stores are generally considered an efficient and cost-effective way to store and access data from the cloud, and are available from all the major cloud service providers (e.g., Amazon Web Services, Azure).
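The flat architecture described above can be pictured with a small in-memory sketch. This is a conceptual toy, not how the JASMIN object store is implemented: the StoredObject class, the keys and the metadata fields are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                                    # the payload itself
    metadata: dict = field(default_factory=dict)   # descriptive information


# A bucket is a flat mapping from unique identifier (key) to object.
# The "/" in a key is just a character; there are no real folders.
bucket = {
    "T1m/tos_con": StoredObject(b"\x00", {"description": "hypothetical entry"}),
    "T1m/so_abs": StoredObject(b"\x00"),
    "U1m/uo": StoredObject(b"\x00"),
}


def list_keys(bucket: dict, prefix: str) -> list:
    """Return the keys that start with a prefix, mimicking a 'folder' listing."""
    return sorted(key for key in bucket if key.startswith(prefix))


print(list_keys(bucket, "T1m/"))
```

Listing by prefix is what gives a flat key space the appearance of a folder hierarchy.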
Introduction to the JASMIN Object Store
JASMIN is the UK's data analysis facility for environmental science, providing storage and compute facilities to enable data-intensive ocean-climate research. One such storage facility is the JASMIN object store. The JASMIN object store is S3 compatible for those already familiar with AWS.
The JASMIN object store is organised into tenancies (equivalent to Group Workspaces for those already familiar with JASMIN). Outputs from the Near-Present-Day simulations are stored in the noc-msm-o tenancy using the following structure:
NPD JRA55-do v1
Outputs available for the Near-Present-Day simulations using the JRA55-do atmospheric forcing dataset (1976-2023) are stored in Zarr stores (see below) in the npd-eorca1-jra55v1 and npd-eorca025-jra55v1 buckets, which correspond to the eORCA1 and eORCA025 model configurations, respectively. Each Zarr store is accessible (read-only) over HTTP using the URL prefix https://noc-msm-o.s3-ext.jc.rl.ac.uk.
---
title: JRA55-do v1 Near-Present-Day Outputs available via JASMIN Object Storage
config:
layout: elk
look: handDrawn
theme: neutral
---
graph LR
subgraph npd-eorca1-jra55v1 [eORCA1]
eORCA1.T1m[T1m]
eORCA1.U1m[U1m]
eORCA1.V1m[V1m]
eORCA1.T1y[T1y]
eORCA1.U1y[U1y]
eORCA1.V1y[V1y]
end
subgraph npd-eorca025-jra55v1 [eORCA025]
eORCA025.T5d[T5d]
eORCA025.U5d[U5d]
eORCA025.V5d[V5d]
eORCA025.T1m[T1m]
eORCA025.U1m[U1m]
eORCA025.V1m[V1m]
eORCA025.T1y[T1y]
eORCA025.U1y[U1y]
eORCA025.V1y[V1y]
end
subgraph npd-eorca12-jra55v1 [eORCA12]
eORCA12.T5d[T5d]
eORCA12.U5d[U5d]
eORCA12.V5d[V5d]
eORCA12.T1m[T1m]
eORCA12.U1m[U1m]
eORCA12.V1m[V1m]
eORCA12.T1y[T1y]
eORCA12.U1y[U1y]
eORCA12.V1y[V1y]
end
A[noc-msm-o.s3-ext.jc.rl.ac.uk] --> B[npd-eorca1-jra55v1]
B --> npd-eorca1-jra55v1
A --> C[npd-eorca025-jra55v1]
C --> npd-eorca025-jra55v1
A --> D[npd-eorca12-jra55v1]
D --> npd-eorca12-jra55v1
Note
The URL prefix provided above is for users seeking to access Near-Present-Day outputs from the JASMIN External Cloud and locations external to JASMIN.
From inside JASMIN, including LOTUS compute nodes and Scientific Analysis servers, the URL prefix https://noc-msm-o.s3.jc.rl.ac.uk can be used.
Within a given bucket, output data available depends on the chosen model configuration.
For the coarse resolution eORCA1 configuration, output variables, available as monthly and annual means, are stored in 'sub-buckets' (note that, in reality, these are simply prefixes used to identify each variable) determined according to the grid point where the variable is defined. For example, the npd-eorca1-jra55v1/T1m 'sub-bucket' contains all the monthly-mean output variables stored at T-grid points.
For the finer resolution eORCA025 and eORCA12 configurations, output variables are available as 5-day, monthly and annual means. The monthly and annual mean output variables are stored analogously to the eORCA1 configuration described above. 5-day mean output variables are stored in 'sub-buckets' according to their year and their location on the NEMO model grid. For example, the npd-eorca025-jra55v1/T5d/1976 'sub-bucket' contains all of the 5-day mean output variables stored at T-grid points during 1976.
For more information on how variables are defined on the eORCA grid, users are referred to the NEMO documentation.
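The bucket and 'sub-bucket' layout described above can be turned into a small URL builder. This is a sketch under stated assumptions: the helper name npd_store_url is hypothetical, and placing the variable name directly beneath the 'sub-bucket' (and beneath the year for 5-day means) is inferred from the examples on this page rather than confirmed for every variable.

```python
EXTERNAL_PREFIX = "https://noc-msm-o.s3-ext.jc.rl.ac.uk"  # outside JASMIN
INTERNAL_PREFIX = "https://noc-msm-o.s3.jc.rl.ac.uk"      # inside JASMIN


def npd_store_url(bucket, grid, freq, variable, year=None, inside_jasmin=False):
    """Assemble <prefix>/<bucket>/<grid><freq>[/<year>]/<variable>."""
    prefix = INTERNAL_PREFIX if inside_jasmin else EXTERNAL_PREFIX
    parts = [prefix, bucket, f"{grid}{freq}"]
    if freq == "5d":
        # 5-day means are grouped by year, e.g. T5d/1976.
        if year is None:
            raise ValueError("5-day mean outputs are stored per year; pass year=...")
        parts.append(str(year))
    parts.append(variable)
    return "/".join(parts)


# Annual-mean T-grid variable in the eORCA1 JRA55-do bucket:
print(npd_store_url("npd-eorca1-jra55v1", "T", "1y", "tos_con"))
# 5-day mean T-grid variable for 1976 in the eORCA025 JRA55-do bucket:
print(npd_store_url("npd-eorca025-jra55v1", "T", "5d", "thetao_con", year=1976))
```

Passing inside_jasmin=True switches to the internal prefix for use on LOTUS compute nodes and Scientific Analysis servers.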
---
title: Example eORCA025 5-day mean outputs available via JASMIN Object Storage
config:
layout: elk
look: handDrawn
theme: neutral
---
graph TB
subgraph 1976_T5d [1976]
1976_T5d.thetao_con[thetao_con]
1976_T5d.so_abs[so_abs]
end
subgraph 1976_U5d [1976]
1976_U5d.uo[uo]
1976_U5d.uos[uos]
end
subgraph 1976_V5d [1976]
1976_V5d.vo[vo]
1976_V5d.vos[vos]
end
subgraph 1977_T5d [1977]
1977_T5d.thetao_con[thetao_con]
1977_T5d.so_abs[so_abs]
end
subgraph 1977_U5d [1977]
1977_U5d.uo[uo]
1977_U5d.uos[uos]
end
subgraph 1977_V5d [1977]
1977_V5d.vo[vo]
1977_V5d.vos[vos]
end
A[npd-eorca025-jra55v1] --> T5d
A[npd-eorca025-jra55v1] --> U5d
A[npd-eorca025-jra55v1] --> V5d
T5d --> 1976_T5d
T5d --> 1977_T5d
U5d --> 1976_U5d
U5d --> 1977_U5d
V5d --> 1976_V5d
V5d --> 1977_V5d
NPD ERA-5 v1
Outputs available for the Near-Present-Day simulations using a climatologically adjusted version of the ERA-5 atmospheric forcing dataset (1976-present) are stored in Icechunk repositories (see below) in the npd-eorca1-era5v1, npd-eorca025-era5v1 and npd-eorca12-era5v1 buckets, which correspond to the eORCA1, eORCA025 and eORCA12 model configurations, respectively.
---
title: ERA-5 v1 Near-Present-Day Outputs available via JASMIN Object Storage
config:
layout: elk
look: handDrawn
theme: neutral
---
graph LR
subgraph npd-eorca1-era5v1 [eORCA1]
eORCA1.T1m[T1m]
eORCA1.U1m[U1m]
eORCA1.V1m[V1m]
eORCA1.W1m[W1m]
eORCA1.I1m[I1m]
eORCA1.S1m[S1m]
eORCA1.T1y[T1y]
eORCA1.U1y[U1y]
eORCA1.V1y[V1y]
eORCA1.W1y[W1y]
eORCA1.I1y[I1y]
eORCA1.S1y[S1y]
end
subgraph npd-eorca025-era5v1 [eORCA025]
eORCA025.T5d_3d[T5d_3d]
eORCA025.U5d_3d[U5d_3d]
eORCA025.V5d_3d[V5d_3d]
eORCA025.T5d_4d[T5d_4d]
eORCA025.U5d_4d[U5d_4d]
eORCA025.V5d_4d[V5d_4d]
eORCA025.W5d_4d[W5d_4d]
eORCA025.I5d_3d[I5d_3d]
eORCA025.S5d_1d[S5d_1d]
eORCA025.T1m_3d[T1m_3d]
eORCA025.U1m_3d[U1m_3d]
eORCA025.V1m_3d[V1m_3d]
eORCA025.T1m_4d[T1m_4d]
eORCA025.U1m_4d[U1m_4d]
eORCA025.V1m_4d[V1m_4d]
eORCA025.W1m_4d[W1m_4d]
eORCA025.I1m_3d[I1m_3d]
eORCA025.S1m_1d[S1m_1d]
eORCA025.T1y_3d[T1y_3d]
eORCA025.U1y_3d[U1y_3d]
eORCA025.V1y_3d[V1y_3d]
eORCA025.T1y_4d[T1y_4d]
eORCA025.U1y_4d[U1y_4d]
eORCA025.V1y_4d[V1y_4d]
eORCA025.W1y_4d[W1y_4d]
eORCA025.I1y_3d[I1y_3d]
eORCA025.S1y_1d[S1y_1d]
end
subgraph npd-eorca12-era5v1 [eORCA12]
eORCA12.T5d_3d[T5d_3d]
eORCA12.U5d_3d[U5d_3d]
eORCA12.V5d_3d[V5d_3d]
eORCA12.T5d_4d[T5d_4d]
eORCA12.U5d_4d[U5d_4d]
eORCA12.V5d_4d[V5d_4d]
eORCA12.W5d_4d[W5d_4d]
eORCA12.I5d_3d[I5d_3d]
eORCA12.S5d_1d[S5d_1d]
eORCA12.T1m_3d[T1m_3d]
eORCA12.U1m_3d[U1m_3d]
eORCA12.V1m_3d[V1m_3d]
eORCA12.T1m_4d[T1m_4d]
eORCA12.U1m_4d[U1m_4d]
eORCA12.V1m_4d[V1m_4d]
eORCA12.W1m_4d[W1m_4d]
eORCA12.I1m_3d[I1m_3d]
eORCA12.S1m_1d[S1m_1d]
eORCA12.T1y_3d[T1y_3d]
eORCA12.U1y_3d[U1y_3d]
eORCA12.V1y_3d[V1y_3d]
eORCA12.T1y_4d[T1y_4d]
eORCA12.U1y_4d[U1y_4d]
eORCA12.V1y_4d[V1y_4d]
eORCA12.W1y_4d[W1y_4d]
eORCA12.I1y_3d[I1y_3d]
eORCA12.S1y_1d[S1y_1d]
end
A[noc-msm-o.s3-ext.jc.rl.ac.uk] --> B[npd-eorca1-era5v1]
B --> npd-eorca1-era5v1
A --> C[npd-eorca025-era5v1]
C --> npd-eorca025-era5v1
A --> D[npd-eorca12-era5v1]
D --> npd-eorca12-era5v1
Using the JASMIN Object Store
Now that we have seen how the outputs of the Near-Present-Day simulations are structured within the JASMIN object store, our next step is accessing this data from a local or remote machine.
Although many users will be more familiar with analysing ocean-climate data via netCDF files, output variables generated by the Near-Present-Day simulations are stored in Analysis-Ready Cloud-Optimised (ARCO) Zarr stores and Icechunk repositories.
A Brief Introduction to Zarr
Zarr is an open source, flexible and efficient storage format designed for chunked, compressed, N-dimensional arrays. At its simplest, Zarr can be considered a cloud-native alternative to netCDF files since it consists of binary data files (chunks) accompanied by external metadata files.
One important difference between archival file formats (e.g., netCDF) and Zarr is that there is no single Zarr file. Instead, a Zarr store (typically given the suffix .zarr - although this is not a requirement) is a directory containing chunks of data stored in compressed binary files and JSON metadata files containing the array configuration and compression used.
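To make the idea of a chunked store concrete, the sketch below computes how many chunk files a single array would be split into and what they would be called. It is an illustration only: the array and chunk shapes are hypothetical, and the key naming follows the Zarr v2 default convention (indices joined by dots); Zarr v3 stores use a different key encoding.

```python
from itertools import product
from math import ceil


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension of the array."""
    return tuple(ceil(n / c) for n, c in zip(shape, chunks))


def chunk_keys(shape, chunks):
    """Zarr v2-style chunk keys ('0.0.0', '0.0.1', ...), one per chunk."""
    grid = chunk_grid(shape, chunks)
    return [".".join(map(str, idx)) for idx in product(*(range(n) for n in grid))]


# Hypothetical (time, y, x) array split into one time step per chunk
# and four spatial tiles per time step.
shape, chunks = (12, 100, 200), (1, 50, 100)
keys = chunk_keys(shape, chunks)
print(chunk_grid(shape, chunks), len(keys), keys[0], keys[-1])
```

Because each chunk is a separate object, a reader only needs to fetch the chunks that overlap the requested region.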
Zarr works especially well in combination with cloud storage, such as the JASMIN object store, given that users can access data concurrently from multiple threads or processes using Python or a number of other programming languages.
Click here for more information on the Zarr specification.
A Brief Introduction to Icechunk
Icechunk is an open-source, cloud-native transactional tensor storage engine designed for N-dimensional data in cloud object storage. At its simplest, Icechunk can be considered a "transactional storage engine for Zarr", meaning that Icechunk manages all of the I/O for reading, writing and updating metadata and chunk data, and keeps track of changes (referred to as transactions) to the store in the form of snapshots.
In place of a Zarr store, users create an Icechunk repository, which functions as both a self-contained Zarr store and a database of the snapshots resulting from transactions (e.g., updating values or writing new values in the store).
This allows Icechunk repositories to support data version control, since users can time-travel to previous snapshots of a repository.
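The snapshot idea can be illustrated with a toy class. This is a conceptual stand-in only and is not the Icechunk API: the ToyRepository class, its methods, the keys and the values are all hypothetical.

```python
class ToyRepository:
    """Conceptual versioned key-value store; NOT the Icechunk API."""

    def __init__(self):
        self._snapshots = []   # committed store states, oldest first
        self._working = {}     # uncommitted changes

    def set(self, key, value):
        """Stage a change; it is not visible to readers until committed."""
        self._working[key] = value

    def commit(self, message):
        """Record staged changes as a new snapshot and return its ID."""
        state = dict(self._snapshots[-1][1]) if self._snapshots else {}
        state.update(self._working)
        self._snapshots.append((message, state))
        self._working = {}
        return len(self._snapshots) - 1

    def read(self, key, snapshot=None):
        """Read from the latest snapshot, or 'time-travel' to an earlier one."""
        idx = len(self._snapshots) - 1 if snapshot is None else snapshot
        return self._snapshots[idx][1][key]


repo = ToyRepository()
repo.set("example/value", 1.0)          # made-up key and value
first = repo.commit("initial write")
repo.set("example/value", 2.0)
second = repo.commit("updated value")

print(repo.read("example/value"))                  # latest snapshot
print(repo.read("example/value", snapshot=first))  # earlier snapshot
```

Earlier snapshots remain readable after later commits, which is the property that makes version control of the data possible.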
Click here for an overview of Icechunk.
Method 1: Accessing Icechunk Repositories using the OceanDataStore library
The simplest way to access ERA-5 Near-Present-Day simulation outputs is to use the OceanDataStore Python library designed to streamline accessing ocean model outputs stored in cloud object storage. To learn more about OceanDataStore click here.
Here, we will provide an example of using the OceanDataCatalog API to access outputs from the eORCA1-ERA5v1 Near-Present-Day configuration:
# Create and activate a new Python virtual environment:
python -m venv /path/to/my/venv
source /path/to/my/venv/bin/activate
# Install OceanDataStore from GitHub:
pip install git+https://github.com/NOC-MSM/OceanDataStore.git
Now, in a Python script or Jupyter / Marimo Notebook, we will access the annual mean sea surface temperature tos_con (1976-present) data:
# Import required Python packages:
from OceanDataStore import OceanDataCatalog
# Create an instance of OceanDataCatalog to access the National Oceanography Centre SpatioTemporal Asset Catalog (STAC):
catalog = OceanDataCatalog(catalog_name="noc-model-stac")
# Search for sea surface conservative temperature (SST) outputs:
catalog.search(collection='noc-npd', variable='tos_con')
# Open the first search result, corresponding to the eORCA1 ERA-5v1 simulation, as an xarray Dataset:
ds = catalog.open_dataset(
    id=catalog.Items[0].id,
    variables=['tos_con'],
)
Method 2a: Accessing Zarr Stores Directly via URL
The simplest way to access JRA55-do v1 Near-Present-Day simulation outputs is to use the URLs included in the Available Ocean & Sea-Ice Outputs in combination with xarray - a Python package for working with labelled multi-dimensional arrays. Here, we will provide an example of accessing the annual mean sea surface temperature tos_con dataset (1976-2023) output by the eORCA1-JRA55v1 Near-Present-Day configuration:
# Import required Python packages:
import xarray as xr
# Define path to sea surface temperature dataset in the JASMIN object store:
sst_url = "https://noc-msm-o.s3-ext.jc.rl.ac.uk/npd-eorca1-jra55v1/T1y/tos_con"
# Open sea surface conservative temperature (C) dataset with xarray:
tos_con = xr.open_zarr(sst_url, consolidated=True, chunks={})
Here, consolidated=True means open the store using Zarr's consolidated metadata capability, and chunks={} means load the data with dask using the engine's preferred chunks. See the xarray documentation for more details.