Outputs

Summary

This page describes the ocean and sea-ice outputs made available from the Near-Present-Day simulations at 5-day, monthly and annual frequency, and explains how to access them using the JASMIN Object Store.

Primary Outputs

Primary outputs of the Near-Present-Day simulations are those variables which are calculated online at runtime and are written to netCDF files according to where they are defined on the eORCA grid.

Example: eORCA1

The conservative temperature thetao_con averaged at monthly intervals will be stored in the eORCA1_1m_YYYYMM_grid_T.nc file.
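The naming pattern in this example can be sketched as a small helper. This is an illustration only: the function name and its arguments are hypothetical, and the pattern is inferred from the single monthly file name quoted above, so other output frequencies may use a different date stamp.

```python
def monthly_output_filename(config: str, year: int, month: int, grid: str) -> str:
    """Build a monthly-mean file name following the pattern quoted above.

    Assumption: the date stamp is YYYYMM and the grid letter follows 'grid_'.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"{config}_1m_{year:04d}{month:02d}_grid_{grid}.nc"


# Monthly-mean T-grid file for January 1976 of the eORCA1 configuration:
print(monthly_output_filename("eORCA1", 1976, 1, "T"))
# eORCA1_1m_197601_grid_T.nc
```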

Below we include a table of the available ocean and sea-ice variables output by each Near-Present-Day simulation:

Available Ocean & Sea-Ice Outputs:

ERA-5 Adjusted (1976-present)

JRA55-do (1976-2023)

Secondary Outputs

Secondary outputs of the Near-Present-Day simulations are diagnostics calculated offline from the primary output variables. In many cases, these outputs will be produced during the analysis of the Near-Present-Day simulations; however, a collection of especially popular diagnostics will be made available to users.

Atlantic Meridional Overturning Circulation Diagnostics

The Atlantic Meridional Overturning Circulation (AMOC) is a fundamental component of the global climate system owing to its role in the redistribution of heat, nutrients and freshwater. On account of its wider societal significance, a number of continuous ocean observing systems have been deployed throughout the Atlantic Ocean to monitor the state and variability of the AMOC.

The METRIC Python package allows users to calculate meridional overturning and heat transport diagnostics in numerical models which are equivalent (and hence comparable) to existing observations at the RAPID (26.5\(^{\circ}\)N), MOVE (16\(^{\circ}\)N) and SAMBA (34.5\(^{\circ}\)S) arrays (see Danabasoglu et al., 2021).

Diagnostics including meridional overturning stream functions and the meridional fluxes of heat and freshwater will be made available as secondary output variables via the JASMIN Object Store.

Accessing Near-Present-Day Data via the JASMIN Object Store

To improve the accessibility of the large volumes of data generated by the Near-Present-Day simulations, primary and secondary output variables will be made available via the JASMIN Object Store. For those who are unfamiliar with object storage, we suggest reading the primer below before getting started accessing the available outputs.

What is Object Storage?

Object storage is a fairly modern data storage solution that provides an efficient, scalable and collaborative way to store and manage large volumes of scientific data.

Although most of us are accustomed to working with traditional hierarchical file systems (think folders and file paths), in object storage data is stored as objects in "buckets" rather than folders. Each object consists of:

Data: For example, a sea surface temperature dataset.
Metadata: Descriptive information about the data.
Unique Identifier: Used to retrieve the object.
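The three parts of an object can be illustrated with a toy, in-memory model. This is only a sketch of the concept, not how the JASMIN object store is implemented: the class names, keys, byte strings and metadata values below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                       # unique identifier
    data: bytes                                    # the payload itself
    metadata: dict = field(default_factory=dict)   # descriptive information


class Bucket:
    """A flat collection of objects, looked up by key rather than by folder."""

    def __init__(self, name: str):
        self.name = name
        self._objects = {}

    def put(self, obj: StoredObject) -> None:
        self._objects[obj.key] = obj

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list:
        # "Folders" are emulated by filtering keys on a shared prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = Bucket("example-bucket")
bucket.put(StoredObject("T1m/tos_con", b"\x00\x01", {"long_name": "sea surface temperature"}))
bucket.put(StoredObject("T1m/so_abs", b"\x02\x03", {"long_name": "absolute salinity"}))
bucket.put(StoredObject("U1m/uo", b"\x04\x05", {"long_name": "zonal velocity"}))

print(bucket.list_keys("T1m/"))
# ['T1m/so_abs', 'T1m/tos_con']
```

Because the namespace is flat, a slash in a key has no special meaning to the store; it is simply a convention that makes prefix searches behave like directory listings.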

Object storage systems have two especially valuable properties for ocean-climate applications:

Scalability: Since object stores employ a flat storage architecture, they can easily handle petabytes to exabytes of data.
Accessibility: Since objects are accessed via Application Programming Interfaces (APIs), users can easily retrieve and analyse data from anywhere over HTTP, with authentication using HTTP headers.

Object stores are generally considered an efficient and cost-effective way to store and access data from the cloud, and are available from all the major cloud service providers (e.g., Amazon Web Services, Azure).

Introduction to the JASMIN Object Store

JASMIN is the UK's data analysis facility for environmental science, providing storage and compute facilities to enable data-intensive ocean-climate research. One such storage facility is the JASMIN object store, which is S3 compatible and so will feel familiar to those who have already used AWS.

The JASMIN object store is organised into tenancies (equivalent to Group Workspaces for those already familiar with JASMIN). Outputs from the Near-Present-Day simulations are stored in the noc-msm-o tenancy using the following structure:

---
title: Near-Present-Day Outputs available via JASMIN Object Storage
config:
  layout: elk
  look: handDrawn
  theme: neutral
---
graph LR
  subgraph npd-eorca1-jra55v1 [eORCA1]
      eORCA1.T1m[T1m]
      eORCA1.U1m[U1m]
      eORCA1.V1m[V1m]
      eORCA1.T1y[T1y]
      eORCA1.U1y[U1y]
      eORCA1.V1y[V1y]
     end
  subgraph npd-eorca025-jra55v1 [eORCA025]
      eORCA025.T5d[T5d]
      eORCA025.U5d[U5d]
      eORCA025.V5d[V5d]
      eORCA025.T1m[T1m]
      eORCA025.U1m[U1m]
      eORCA025.V1m[V1m]
      eORCA025.T1y[T1y]
      eORCA025.U1y[U1y]
      eORCA025.V1y[V1y]
     end
  subgraph npd-eorca12-jra55v1 [eORCA12]
      eORCA12.T5d[T5d]
      eORCA12.U5d[U5d]
      eORCA12.V5d[V5d]
      eORCA12.T1m[T1m]
      eORCA12.U1m[U1m]
      eORCA12.V1m[V1m]
      eORCA12.T1y[T1y]
      eORCA12.U1y[U1y]
      eORCA12.V1y[V1y]
     end

    A[noc-msm-o.s3-ext.jc.rl.ac.uk] --> B[npd-eorca1-jra55v1]
    B --> npd-eorca1-jra55v1
    A --> C[npd-eorca025-jra55v1]
    C --> npd-eorca025-jra55v1
    A --> D[npd-eorca12-jra55v1]
    D --> npd-eorca12-jra55v1

In the diagram above, the outputs available for each Near-Present-Day simulation are stored in a separate bucket (e.g., npd-eorca1-jra55v1), which identifies the model configuration (eORCA1), the atmospheric forcing (JRA55-do) and the version of the simulation (v1.0). All of the data stored in Near-Present-Day simulation buckets is accessible (read-only) over HTTP using the URL prefix https://noc-msm-o.s3-ext.jc.rl.ac.uk.

Note

The URL prefix provided above is for users seeking to access Near-Present-Day outputs from the JASMIN External Cloud and locations external to JASMIN.

From inside JASMIN, including LOTUS compute nodes and Scientific Analysis servers, the URL prefix https://noc-msm-o.s3.jc.rl.ac.uk should be used.

Within a given bucket, output data available depends on the chosen model configuration.

For the coarse resolution eORCA1 configuration, output variables, available as monthly and annual means, are stored in 'sub-buckets' (note that, in reality, these are simply prefixes used to identify each variable) determined according to the grid point where the variable is defined. For example, the npd-eorca1-jra55v1/T1m 'sub-bucket' contains all the monthly-mean output variables stored at T-grid points.

For the finer resolution eORCA025 and eORCA12 configurations, output variables are available as 5-day, monthly and annual means. The monthly and annual mean output variables are stored analogously to the eORCA1 configuration described above. 5-day mean output variables are stored in 'sub-buckets' according to their year and their location on the NEMO model grid. For example, the npd-eorca025-jra55v1/T5d/1976 'sub-bucket' contains all of the 5-day mean output variables stored at T-grid points during 1976.
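The layout described above can be sketched as a helper that assembles the URL of an output variable. The function name and arguments are hypothetical. The two URL prefixes and the monthly and annual path layout come from this page; placing the variable name after the year for 5-day means is an assumption based on the 'sub-bucket' example above.

```python
EXTERNAL_PREFIX = "https://noc-msm-o.s3-ext.jc.rl.ac.uk"  # outside JASMIN
INTERNAL_PREFIX = "https://noc-msm-o.s3.jc.rl.ac.uk"      # inside JASMIN


def npd_store_url(bucket, grid, freq, variable, year=None, inside_jasmin=False):
    """Assemble the URL of one output variable in a Near-Present-Day bucket."""
    prefix = INTERNAL_PREFIX if inside_jasmin else EXTERNAL_PREFIX
    parts = [prefix, bucket, f"{grid}{freq}"]
    if freq == "5d":
        # 5-day means are grouped by year (assumed path: .../T5d/1976/<variable>).
        if year is None:
            raise ValueError("5-day mean outputs are grouped by year; pass year=...")
        parts.append(str(year))
    parts.append(variable)
    return "/".join(parts)


# Annual-mean sea surface temperature from eORCA1, accessed from outside JASMIN:
print(npd_store_url("npd-eorca1-jra55v1", "T", "1y", "tos_con"))
# https://noc-msm-o.s3-ext.jc.rl.ac.uk/npd-eorca1-jra55v1/T1y/tos_con
```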

For more information on how variables are defined on the eORCA grid, users are referred to the NEMO documentation.

---
title: Example eORCA025 5-day mean outputs available via JASMIN Object Storage
config:
  layout: elk
  look: handDrawn
  theme: neutral
---
graph TB
  subgraph 1976_T5d [1976]
      1976_T5d.thetao_con[thetao_con]
      1976_T5d.so_abs[so_abs]
     end
  subgraph 1976_U5d [1976]
      1976_U5d.uo[uo]
      1976_U5d.uos[uos]
     end
  subgraph 1976_V5d [1976]
      1976_V5d.vo[vo]
      1976_V5d.vos[vos]
     end
  subgraph 1977_T5d [1977]
      1977_T5d.thetao_con[thetao_con]
      1977_T5d.so_abs[so_abs]
     end
  subgraph 1977_U5d [1977]
      1977_U5d.uo[uo]
      1977_U5d.uos[uos]
     end
  subgraph 1977_V5d [1977]
      1977_V5d.vo[vo]
      1977_V5d.vos[vos]
     end

  A[npd-eorca025-jra55v1] --> T5d
  A[npd-eorca025-jra55v1] --> U5d
  A[npd-eorca025-jra55v1] --> V5d
  T5d --> 1976_T5d
  T5d --> 1977_T5d 
  U5d --> 1976_U5d
  U5d --> 1977_U5d
  V5d --> 1976_V5d
  V5d --> 1977_V5d

Using the JASMIN Object Store

Now that we have seen how the outputs of the Near-Present-Day simulations are structured within the JASMIN object store, our next step is to access this data from a local or remote machine.

Although many users will be more familiar with analysing ocean-climate data via netCDF files, output variables generated by the Near-Present-Day simulations are stored in the cloud-native Zarr file format.

A Brief Introduction to Zarr

Zarr is an open source, flexible and efficient storage format designed for chunked, compressed, N-dimensional arrays. At its simplest, Zarr can be considered a cloud-native alternative to netCDF files since it consists of binary data files (chunks) accompanied by external metadata files.

One important difference between archival file formats (e.g., netCDF) and Zarr is that there is no single Zarr file. Instead, a Zarr store (typically given the suffix .zarr - although this is not a requirement) is a directory containing chunks of data stored in compressed binary files and JSON metadata files containing the array configuration and compression used.
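To make the idea of a chunked store concrete, the sketch below enumerates the chunk objects that would make up a single array. It assumes the Zarr version 2 convention of naming chunks by their dot-separated grid indices (for example 0.0.0); the array shape and chunk shape are hypothetical and are not those of the Near-Present-Day stores.

```python
import itertools
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))


def chunk_keys(shape, chunks, separator="."):
    """List the chunk keys of one array, in row-major order."""
    grid = chunk_grid(shape, chunks)
    indices = itertools.product(*(range(n) for n in grid))
    return [separator.join(str(i) for i in index) for index in indices]


# Hypothetical (time, y, x) array split into 12 x 100 x 100 chunks:
shape, chunks = (48, 100, 250), (12, 100, 100)

print(chunk_grid(shape, chunks))       # (4, 1, 3)
keys = chunk_keys(shape, chunks)
print(len(keys), keys[0], keys[-1])    # 12 0.0.0 3.0.2
```

Each key corresponds to one compressed binary object, so a reader that needs only a few time steps fetches only the chunks overlapping that selection.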

Zarr works especially well in combination with cloud storage, such as the JASMIN object store, given that users can access data concurrently from multiple threads or processes using Python or a number of other programming languages.

For more information, see the Zarr specification.

Method 1: Accessing Data Directly via URL

The simplest way to access Near-Present-Day simulation outputs is to use the URLs included in the Available Ocean & Sea-Ice Outputs tables in combination with xarray, a Python package for working with labelled multi-dimensional arrays. Here, we provide an example of accessing the annual mean sea surface temperature tos_con dataset (1976-2023) output by the eORCA1-JRA55v1 Near-Present-Day configuration:

Example: Accessing eORCA1-JRA55v1 Sea Surface Temperature via URL

# Import required Python packages:
import xarray as xr

# Define path to sea surface temperature dataset in the JASMIN object store:
sst_url = "https://noc-msm-o.s3-ext.jc.rl.ac.uk/npd-eorca1-jra55v1/T1y/tos_con"

# Open sea surface conservative temperature (C) dataset with xarray:
tos_con = xr.open_zarr(sst_url, consolidated=True, chunks={})

Here, consolidated=True opens the store using Zarr's consolidated metadata capability, and chunks={} loads the data lazily with dask using the engine's preferred chunks. See the xarray documentation for more details.

Method 2: Accessing Data using the NPD Data Catalogs

Alternatively, users can explore the metadata & output variables from the Near-Present-Day simulations on their local machine using one of the NPD Data Catalogs. Below, we show how to access the same eORCA1-JRA55v1 annual mean sea surface temperature dataset using the NPD Data Catalog available via GitHub, which contains all of the NPD JRA55-do datasets available on the JASMIN Object Store:

Example: Accessing eORCA1-JRA55v1 Sea Surface Temperature using the NPD Data Catalog

# Import required Python packages:
import pandas as pd
import xarray as xr

# Defining the NPD JRA55-do Data Catalog URL:
catalog_url = "https://raw.githubusercontent.com/NOC-MSM/NOC_Near_Present_Day/main/jasmin_os/catalogs/npd_jra55_v1_catalog.csv"

# Read the data catalog using pandas:
catalog = pd.read_csv(catalog_url)

# Query the NPD JRA55-do data catalog:
sst_url = catalog.query("variable == 'tos_con' & freq == '1y' & model == 'eORCA1'")["url"].iloc[0]

# Open eORCA1-JRA55v1 annual mean SST data:
tos_con = xr.open_zarr(sst_url, consolidated=True, chunks={})

Searching the NPD Data Catalog

To explore the available Near-Present-Day outputs stored in the JASMIN Object Store, users can access the Data Catalogs (shown in Available Ocean & Sea-Ice Outputs) as a DataFrame on their local machine.

Once a chosen output variable is identified, users can specify the NEMO model grid type (grid), temporal frequency (freq) of the output, and the variable name (variable) to access the data as an xarray Dataset. The options for the grid & freq parameters are listed below:

grid = ["T", "U", "V", "W", "I", "S", "M"]

freq = ["5d", "1m", "1y"]

For users seeking to access Secondary outputs of the Near-Present-Day simulations (including AMOC diagnostics), the following options should be chosen: grid="M" and freq="1m".
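The options above can be combined into a query string for the pandas DataFrame.query method used in Method 2. The helper below is a sketch with a hypothetical name. The variable, freq and model column names are taken from the Method 2 example, while the existence of a grid column in the catalog is an assumption.

```python
GRIDS = ("T", "U", "V", "W", "I", "S", "M")
FREQS = ("5d", "1m", "1y")


def build_catalog_query(variable, freq, grid=None, model=None):
    """Build a query string, validating grid and freq against the listed options."""
    if freq not in FREQS:
        raise ValueError(f"freq must be one of {FREQS}, got {freq!r}")
    if grid is not None and grid not in GRIDS:
        raise ValueError(f"grid must be one of {GRIDS}, got {grid!r}")

    conditions = {"variable": variable, "freq": freq}
    if grid is not None:
        conditions["grid"] = grid  # assumed column name
    if model is not None:
        conditions["model"] = model
    return " & ".join(f"{column} == {value!r}" for column, value in conditions.items())


print(build_catalog_query("tos_con", "1y", model="eORCA1"))
# variable == 'tos_con' & freq == '1y' & model == 'eORCA1'
```

With the catalog loaded as in Method 2, the matching rows could then be selected with catalog.query(build_catalog_query("tos_con", "1y", model="eORCA1")).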