SST Validation Workflows with ValidOcean¶
Description:¶
This tutorial notebook introduces a typical sea surface temperature validation workflow in ValidOcean using the ModelValidator class.
To demonstrate this workflow, we will use outputs from the National Oceanography Centre's Near-Present-Day global 1-degree (eORCA1-NPD-ERA5v1) ocean sea-ice simulation introduced in Example 1, which covers the historical period from 1976 to the present.
For more details on this model configuration and the available outputs, users should explore the Near-Present-Day documentation here.
Contact:¶
Ollie Tooth (oliver.tooth@noc.ac.uk)
# -- Import required libraries -- #
import xarray as xr
from ValidOcean import ModelValidator
Accessing & Preparing Example Ocean Model Data¶
We will begin by using the xarray Python library to load an example dataset from the eORCA1-NPD-ERA5v1 outputs available on the JASMIN Object Store.
As in Example 1, we will first load the NEMO ocean model domain variables and then the monthly mean sea surface temperatures for the global domain.
Before creating a ModelValidator()
object, we need to rename our ocean model coordinate variables to conform to the standard names used by ValidOcean & add a 2-D land/ocean mask to our Dataset for use in regridding.
# Define url to domain store:
url_domain = "https://noc-msm-o.s3-ext.jc.rl.ac.uk/npd-eorca1-era5v1/domain/"
ds_domain = xr.open_zarr(url_domain, consolidated=True)
# Define url to sea surface temperature (tos_con) store:
url_tos_con = "https://noc-msm-o.s3-ext.jc.rl.ac.uk/npd-eorca1-era5v1/T1m/tos_con/"
ds_tos_con = xr.open_zarr(url_tos_con, consolidated=True)
# Updating model coords:
ds_tos_con = ds_tos_con.rename({'nav_lat': 'lat', 'nav_lon': 'lon', 'time_counter': 'time'})
# Adding a 2-D land/ocean mask to our Dataset:
ds_tos_con['mask'] = ds_domain['tmaskutil'].squeeze()
ds_tos_con
<xarray.Dataset> Size: 281MB
Dimensions:        (y: 331, x: 360, time: 585)
Coordinates:
    lat            (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    lon            (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    time_centered  (time) datetime64[ns] 5kB dask.array<chunksize=(1,), meta=np.ndarray>
  * time           (time) datetime64[ns] 5kB 1976-01-16T12:00:00 ... 2024-09-16
    time_counter   float32 4B 0.0
Dimensions without coordinates: y, x
Data variables:
    tos_con        (time, y, x) float32 279MB dask.array<chunksize=(1, 331, 360), meta=np.ndarray>
    mask           (y, x) int8 119kB dask.array<chunksize=(331, 360), meta=np.ndarray>
Creating A ModelValidator¶
Now let's create a new ModelValidator()
object using our monthly sea surface temperature data.
# Creating a new ModelValidator object:
mv = ModelValidator(mdl_data=ds_tos_con)
mv
<ModelValidator>
-- Model Data --
<xarray.Dataset> Size: 281MB
Dimensions:        (y: 331, x: 360, time: 585)
Coordinates:
    lat            (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    lon            (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    time_centered  (time) datetime64[ns] 5kB dask.array<chunksize=(1,), meta=np.ndarray>
  * time           (time) datetime64[ns] 5kB 1976-01-16T12:00:00 ... 2024-09-16
    time_counter   float32 4B 0.0
Dimensions without coordinates: y, x
Data variables:
    tos_con        (time, y, x) float32 279MB dask.array<chunksize=(1, 331, 360), meta=np.ndarray>
    mask           (y, x) int8 119kB dask.array<chunksize=(331, 360), meta=np.ndarray>

-- Observations --
<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

-- Results --
<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*

-- Stats --
<xarray.Dataset> Size: 0B
Dimensions:  ()
Data variables:
    *empty*
Plotting (Model - Observations) Sea Surface Temperature Bias¶
Consider the typical scenario where we have just completed an ocean model simulation and would now like to know how well the surface properties in our hindcast represent the real ocean. A common starting point is to calculate the difference, or bias, between the climatologies of sea surface temperature (SST) simulated in the model and recorded in observations.
We can divide this validation workflow into several steps (a rough manual sketch of these steps follows the list):
- Select & load our ocean observation dataset
- We will use the 1991-2020 climatology of the NOAA High-resolution Blended Analysis of Daily SST (OISSTv2) product.
- Load & subset our ocean model dataset
- Only temporal subsetting is needed to match the observations, since OISSTv2 has global coverage.
- Calculate the SST climatology from our ocean model dataset.
- Regrid either the model climatology onto the observations grid or vice versa.
- We regrid the observations climatology onto the model grid using bilinear interpolation.
- Calculate the (model - observation) error & aggregated statistics
- We return the Mean Absolute Error (MAE), Mean Square Error (MSE) & Root Mean Square Error (RMSE).
- Plot the (model - observation) SST error
- We also include the model & observation SST climatologies.
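Before turning to ValidOcean, it is worth sketching what these steps might look like if written out by hand with xarray & xESMF. The snippet below is only an illustrative sketch: the OISSTv2 file path & its sst variable name are assumptions, and it omits the land/ocean masking & any area weighting that ValidOcean applies internally.
# -- Illustrative manual SST validation sketch (assumed OISSTv2 path & variable names) -- #
import numpy as np
import xarray as xr
import xesmf as xe  # assumed available for bilinear regridding

# 1. Load an SST observation climatology (hypothetical local file with 'sst', 'lat' & 'lon'):
ds_obs = xr.open_dataset("oisstv2_sst_climatology_1991_2020.nc")

# 2. Subset the ocean model SSTs to the observation climatology period:
ds_mdl = ds_tos_con.sel(time=slice("1991-01", "2020-12"))

# 3. Calculate the model SST climatology:
mdl_clim = ds_mdl["tos_con"].mean(dim="time")

# 4. Regrid the observation climatology onto the model grid using bilinear interpolation:
regridder = xe.Regridder(ds_obs, ds_mdl, method="bilinear", periodic=True)
obs_clim = regridder(ds_obs["sst"])

# 5. Calculate the (model - observation) error & unweighted aggregate statistics:
sst_error = mdl_clim - obs_clim
mae = np.abs(sst_error).mean()
rmse = np.sqrt((sst_error ** 2).mean())

# 6. Plot the (model - observation) SST error field:
sst_error.plot(cmap="RdBu_r", vmin=-2, vmax=2)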
The true value of ValidOcean is that all of these steps are performed for you by using the plot_sst_error() method of the ModelValidator...
mv.plot_sst_error(sst_name='tos_con',
obs_name='OISSTv2',
time_bounds='1991-2020',
freq='total',
regrid_to='model',
method='bilinear',
source_plots=True,
stats=True,
)
Results of our Validation Workflow¶
In addition to the plot shown above, calling the plot_sst_error()
method has populated the .obs
, .results
and .stats
attributes with the results of our validation workflow.
Let's start by exploring the observational data which has been added to the .obs
attribute.
Here, we can see that a new DataArray tos_con_oisstv2
has been added to the observations dataset as a result of regridding the OISSTv2 (1991-2020) climatology onto our original model grid using bilinear interpolation (shown in panel (b) above).
mv.obs
<xarray.Dataset> Size: 2MB
Dimensions:               (y: 331, x: 360)
Coordinates:
    lat_oisstv2           (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    lon_oisstv2           (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    time_counter_oisstv2  float32 4B 0.0
Dimensions without coordinates: y, x
Data variables:
    tos_con_oisstv2       (y, x) float32 477kB dask.array<chunksize=(331, 360), meta=np.ndarray>
We can see that the model - observation SST error (tos_con_error
shown in panel (c) above) has been included alongside the original ocean model SST tos_con
in the .results
attribute.
mv.results
<xarray.Dataset> Size: 3MB
Dimensions:        (obs: 1, y: 331, x: 360)
Coordinates:
  * obs            (obs) object 8B 'OISSTv2'
    lat            (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    lon            (y, x) float64 953kB dask.array<chunksize=(331, 360), meta=np.ndarray>
    time_counter   float32 4B 0.0
Dimensions without coordinates: y, x
Data variables:
    tos_con_error  (obs, y, x) float32 477kB dask.array<chunksize=(1, 331, 360), meta=np.ndarray>
    tos_con        (y, x) float32 477kB dask.array<chunksize=(331, 360), meta=np.ndarray>
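Since .results is an ordinary xarray Dataset, the stored error field can also be inspected or plotted directly with xarray's built-in plotting; a minimal sketch (colour limits chosen arbitrarily):
# Plot the stored (model - observation) SST error field directly from .results:
import matplotlib.pyplot as plt

mv.results["tos_con_error"].squeeze().plot(cmap="RdBu_r", vmin=-2, vmax=2)
plt.show()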
Finally, we can also compute the aggregate statistics quantifying the error between our ocean model & the OISSTv2 SST climatology by calling .compute()
on the .stats
attribute.
mv.stats.compute()
<xarray.Dataset> Size: 16B
Dimensions:                 ()
Coordinates:
    time_counter            float32 4B 0.0
Data variables:
    Mean Absolute Error     float32 4B 0.3241
    Mean Square Error       float32 4B 0.2731
    Root Mean Square Error  float32 4B 0.5226
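As a quick sanity check, comparable aggregate metrics can be recomputed manually from the error field stored in .results. This is only a sketch: a simple unweighted mean will not reproduce any masking or area weighting that ValidOcean may apply internally.
# -- Recompute (unweighted) aggregate error metrics from the stored error field -- #
import numpy as np

error = mv.results["tos_con_error"].squeeze()  # (y, x) SST error field

mae = np.abs(error).mean().compute()
mse = (error ** 2).mean().compute()
rmse = np.sqrt(mse)

print(f"MAE ~ {float(mae):.3f} | MSE ~ {float(mse):.3f} | RMSE ~ {float(rmse):.3f}")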
Validation Without Visualisation¶
To perform our SST validation workflow without visualising the results, we can alternatively use the .compute_sst_error()
method, which returns all of the results discussed above without the matplotlib axes object.
To demonstrate this method, we will create a new ModelValidator
object using a regional North Atlantic subdomain of our ocean model dataset and perform a seasonal SST validation workflow as follows...
# Creating a regional ModelValidator object:
mv_regional = ModelValidator(mdl_data=ds_tos_con.isel(x=slice(200, 300), y=slice(220, None)))
mv_regional.compute_sst_error(sst_name='tos_con',
obs_name='OISSTv2',
time_bounds='1991-2020',
freq='seasonal',
regrid_to='model',
method='bilinear',
stats=True,
)
<ModelValidator>
-- Model Data --
<xarray.Dataset> Size: 26MB
Dimensions:        (y: 111, x: 100, time: 585)
Coordinates:
    lat            (y, x) float64 89kB dask.array<chunksize=(111, 100), meta=np.ndarray>
    lon            (y, x) float64 89kB dask.array<chunksize=(111, 100), meta=np.ndarray>
    time_centered  (time) datetime64[ns] 5kB dask.array<chunksize=(1,), meta=np.ndarray>
  * time           (time) datetime64[ns] 5kB 1976-01-16T12:00:00 ... 2024-09-16
    time_counter   float32 4B 0.0
Dimensions without coordinates: y, x
Data variables:
    tos_con        (time, y, x) float32 26MB dask.array<chunksize=(1, 111, 100), meta=np.ndarray>
    mask           (y, x) int8 11kB dask.array<chunksize=(111, 100), meta=np.ndarray>

-- Observations --
<xarray.Dataset> Size: 355kB
Dimensions:               (season_oisstv2: 4, y: 111, x: 100)
Coordinates:
  * season_oisstv2        (season_oisstv2) object 32B 'DJF' 'JJA' 'MAM' 'SON'
    lat_oisstv2           (y, x) float64 89kB dask.array<chunksize=(111, 100), meta=np.ndarray>
    lon_oisstv2           (y, x) float64 89kB dask.array<chunksize=(111, 100), meta=np.ndarray>
    time_counter_oisstv2  float32 4B 0.0
Dimensions without coordinates: y, x
Data variables:
    tos_con_oisstv2       (season_oisstv2, y, x) float32 178kB dask.array<chunksize=(4, 111, 100), meta=np.ndarray>

-- Results --
<xarray.Dataset> Size: 533kB
Dimensions:        (obs: 1, y: 111, x: 100, season: 4)
Coordinates:
  * obs            (obs) object 8B 'OISSTv2'
    lat            (y, x) float64 89kB dask.array<chunksize=(111, 100), meta=np.ndarray>
    lon            (y, x) float64 89kB dask.array<chunksize=(111, 100), meta=np.ndarray>
    time_counter   float32 4B 0.0
  * season         (season) object 32B 'DJF' 'JJA' 'MAM' 'SON'
Dimensions without coordinates: y, x
Data variables:
    tos_con_error  (obs, season, y, x) float32 178kB dask.array<chunksize=(1, 1, 111, 100), meta=np.ndarray>
    tos_con        (season, y, x) float32 178kB dask.array<chunksize=(1, 111, 100), meta=np.ndarray>

-- Stats --
<xarray.Dataset> Size: 84B
Dimensions:                 (season: 4)
Coordinates:
    time_counter            float32 4B 0.0
  * season                  (season) object 32B 'DJF' 'JJA' 'MAM' 'SON'
Data variables:
    Mean Absolute Error     (season) float32 16B dask.array<chunksize=(1,), meta=np.ndarray>
    Mean Square Error       (season) float32 16B dask.array<chunksize=(1,), meta=np.ndarray>
    Root Mean Square Error  (season) float32 16B dask.array<chunksize=(1,), meta=np.ndarray>
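As in the global example, the seasonal statistics are stored lazily as dask arrays. A short usage sketch for loading them into memory & tabulating them (variable names taken from the output above):
# Load the lazily-evaluated seasonal statistics into memory:
seasonal_stats = mv_regional.stats.compute()

# Tabulate the seasonal error metrics as a pandas DataFrame (indexed by season):
print(seasonal_stats.to_dataframe())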
If we did want to visualise the results of our validation workflow, we could modify our previous call to the plot_sst_error
method as follows...
mv_regional.plot_sst_error(sst_name='tos_con',
obs_name='OISSTv2',
time_bounds='1991-2020',
freq='seasonal',
regrid_to='model',
method='bilinear',
stats=True,
)
/dssgfs01/working/otooth/Diagnostics/ValidOcean/ValidOcean/plotting.py:77: RuntimeWarning: source_plots = True is only available where freq = 'total'. Using source_plots = False.
  warnings.warn("source_plots = True is only available where freq = 'total'. Using source_plots = False.", RuntimeWarning)
array([<GeoAxes: >, <GeoAxes: >, <GeoAxes: >, <GeoAxes: >], dtype=object)
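Because plot_sst_error() returns the array of GeoAxes shown above, the figure can be customised or saved after plotting. The sketch below is illustrative: it assumes the panel order matches the season coordinate shown above, and the output filename is arbitrary.
import matplotlib.pyplot as plt

# Capture the returned GeoAxes for further customisation:
axes = mv_regional.plot_sst_error(sst_name='tos_con',
                                  obs_name='OISSTv2',
                                  time_bounds='1991-2020',
                                  freq='seasonal',
                                  regrid_to='model',
                                  method='bilinear',
                                  stats=True,
                                  )

# Example customisation: label each seasonal panel & save the figure to file:
for ax, season in zip(axes, ['DJF', 'JJA', 'MAM', 'SON']):
    ax.set_title(f"SST Error ({season})")

plt.savefig("regional_sst_error_seasonal.png", dpi=300, bbox_inches="tight")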
Next Steps...¶
In this tutorial, we have seen how to perform a typical ocean model validation workflow for sea surface temperature data using the ModelValidator object.
Next, we will explore how to use the ModelValidator to validate sea ice data, including aggregated diagnostics (e.g., sea ice area), in the ex3_sea_ice_validation_workflow.ipynb notebook.