Tutorial 5: Internal Climate Variability#
Week 2, Day 1, Future Climate: The Physical Basis
Content creators: Brodie Pearson, Julius Busecke, Tom Nicholas
Content reviewers: Younkap Nina Duplex, Zahra Khodakaramimaghsoud, Sloane Garelick, Peter Ohue, Jenna Pearson, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan
Content editors: Jenna Pearson, Ohad Zivan, Chi Zhang
Production editors: Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan
Our 2023 Sponsors: NASA TOPS, Google DeepMind, and CMIP
Tutorial Objectives#
In this tutorial, we will learn about the concept of internal climate variability, how it influences the predictability of climate phenomena and how it contributes to uncertainty in CMIP6 models. We will work with a single-model ensemble, which utilizes the MPI-ESM1-2-LR model from CMIP6, to isolate and quantify internal climate variability.
By the end of this tutorial, you should be able to:
Understand the importance of internal climate variability and its role in climate prediction and model uncertainty.
Create and evaluate a single-model ensemble using IPCC uncertainty bands, providing a visual representation of model uncertainty.
Contrast the uncertainty due to internal variability against the uncertainty within a multi-model ensemble (which includes internal variability and the impacts of human/coding choices).
Setup#
# installations (uncomment and run this cell ONLY when using Google Colab or Kaggle)
# !pip install condacolab &> /dev/null
# import condacolab
# condacolab.install()
# # Install all packages in one call (+ use mamba instead of conda); this must be one line or the code will fail
# !mamba install xarray-datatree intake-esm gcsfs xmip aiohttp nc-time-axis cf_xarray xarrayutils &> /dev/null
# imports
import time
tic = time.time()
import intake
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
from xmip.preprocessing import combined_preprocessing
from xarrayutils.plotting import shaded_line_plot
from datatree import DataTree
from xmip.postprocessing import _parse_metric
# @title Figure settings
import ipywidgets as widgets # interactive display
plt.style.use(
"https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle"
)
%matplotlib inline
# @title Helper functions
def global_mean(ds: xr.Dataset) -> xr.Dataset:
"""Global average, weighted by the cell area"""
return ds.weighted(ds.areacello.fillna(0)).mean(["x", "y"], keep_attrs=True)
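The weighting in `global_mean` matters because ocean grid cells differ in area, so an unweighted mean over-counts regions with many small cells. A toy illustration with made-up numbers (not tutorial data):

```python
import xarray as xr

# A 2x2 "ocean": a warm equatorward row with large cells and a cold
# poleward row with small cells (values and units are made up)
tos = xr.DataArray([[20.0, 20.0], [0.0, 0.0]], dims=["y", "x"])
area = xr.DataArray([[2.0, 2.0], [1.0, 1.0]], dims=["y", "x"])

unweighted = float(tos.mean())
weighted = float(tos.weighted(area).mean(["x", "y"]))
# The weighted mean is pulled toward the large warm cells:
# (20*2 + 20*2 + 0*1 + 0*1) / (2+2+1+1) = 80/6 ≈ 13.33, vs. 10.0 unweighted
print(unweighted, weighted)
```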
# Calculate anomaly to reference period
def datatree_anomaly(dt):
dt_out = DataTree()
for model, subtree in dt.items():
# for the coding exercise, ellipses will go after sel on the following line
ref = dt[model]["historical"].ds.sel(time=slice("1950", "1980")).mean()
dt_out[model] = subtree - ref
return dt_out
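The anomaly helper above subtracts a reference-period mean from each subtree. A minimal synthetic sketch of the same idea on a single time series (the dates and values here are illustrative, not tutorial data):

```python
import pandas as pd
import xarray as xr

time = pd.date_range("1950", periods=6, freq="YS")
ts = xr.DataArray(
    [14.0, 14.2, 14.1, 14.5, 14.8, 15.0],
    dims="time",
    coords={"time": time},
    name="tos",
)
# Mean over the 1950-1952 reference period, then subtract it everywhere
ref = ts.sel(time=slice("1950", "1952")).mean()
anomaly = ts - ref
print(anomaly.values)  # departures from the 1950-1952 mean of 14.1
```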
def plot_historical_ssp126_combined(dt):
    for model in dt.keys():
        datasets = []
        for experiment in ["historical", "ssp126"]:
            datasets.append(dt[model][experiment].ds.tos)
        # concatenate the historical and ssp126 timeseries, then plot the result
        da_combined = xr.concat(datasets, dim="time")
        da_combined.plot()
# @title Video 1: Internal Climate Variability
from ipywidgets import widgets
from IPython.display import YouTubeVideo
from IPython.display import IFrame
from IPython.display import display
class PlayVideo(IFrame):
def __init__(self, id, source, page=1, width=400, height=300, **kwargs):
self.id = id
if source == "Bilibili":
src = f"https://player.bilibili.com/player.html?bvid={id}&page={page}"
elif source == "Osf":
src = f"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render"
super(PlayVideo, self).__init__(src, width, height, **kwargs)
def display_videos(video_ids, W=400, H=300, fs=1):
tab_contents = []
for i, video_id in enumerate(video_ids):
out = widgets.Output()
with out:
if video_ids[i][0] == "Youtube":
video = YouTubeVideo(
id=video_ids[i][1], width=W, height=H, fs=fs, rel=0
)
print(f"Video available at https://youtube.com/watch?v={video.id}")
else:
video = PlayVideo(
id=video_ids[i][1],
source=video_ids[i][0],
width=W,
height=H,
fs=fs,
autoplay=False,
)
if video_ids[i][0] == "Bilibili":
print(
f"Video available at https://www.bilibili.com/video/{video.id}"
)
elif video_ids[i][0] == "Osf":
print(f"Video available at https://osf.io/{video.id}")
display(video)
tab_contents.append(out)
return tab_contents
video_ids = [("Youtube", "YcIAaljLRh4"), ("Bilibili", "BV1HF41197qn")]
tab_contents = display_videos(video_ids, W=730, H=410)
tabs = widgets.Tab()
tabs.children = tab_contents
for i in range(len(tab_contents)):
tabs.set_title(i, video_ids[i][0])
display(tabs)
Section 1: Internal Climate Variability & Single-model Ensembles#
One of the CMIP6 models we are using in today’s tutorials, MPI-ESM1-2-LR, is part of a single-model ensemble: its modelling centre carried out multiple simulations of the model for many of the CMIP6 experiments. To create a single-model ensemble, the modelling centre runs the model with the same forcing data but with small changes in the initial conditions. Due to the chaotic nature of the climate system, these small initial-condition differences lead to growing differences in the modelled climate as time progresses. These differences are often referred to as internal variability. Running this single-model ensemble, and comparing the results to simulations that use different forcing datasets, allows us to separate the internal variability from the externally-forced variability. If you are interested in learning more about large ensemble climate models, you can read this paper.
Let’s take advantage of this single-model ensemble to quantify the internal variability of this model’s simulated climate, and contrast this against the multi-model uncertainty we diagnosed in the previous tutorial.
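To build intuition for why tiny initial-condition differences matter, here is a purely illustrative sketch using the logistic map, a classic chaotic system. This is not climate-model code; the map and its parameter are chosen only to demonstrate sensitive dependence on initial conditions:

```python
import numpy as np

def trajectory(x0, r=3.9, steps=50):
    """Iterate the chaotic logistic map x -> r * x * (1 - x) from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return np.array(xs)

# Two "ensemble members" whose initial conditions differ by one part in 10^10
a = trajectory(0.4)
b = trajectory(0.4 + 1e-10)

diff = np.abs(a - b)
# Early on the members are indistinguishable; after enough steps they
# decorrelate entirely, just as ensemble members do in a climate model
print(diff[5], diff.max())
```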
Coding Exercise 1.1#
Complete the following code to:
Load 5 different realizations of the MPI-ESM1-2-LR experiments (r1i1p1f1 through r5i1p1f1). This numbering convention means they were each initialized using a different time-snapshot of the base/spin-up simulation.
Plot the historical and SSP1-2.6 experiment data for each realization, using a distinct color for each realization, but keeping that color the same across both the historical period and future period for a given realization.
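The realization labels follow CMIP6’s variant naming scheme, `r<realization>i<initialization>p<physics>f<forcing>`, so the five members listed above can be generated programmatically (a small reference snippet, not part of the exercise scaffold):

```python
# Only the realization index varies across this single-model ensemble;
# the initialization, physics, and forcing indices stay fixed at 1
member_ids = [f"r{k}i1p1f1" for k in range(1, 6)]
print(member_ids)  # ['r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1', 'r4i1p1f1', 'r5i1p1f1']
```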
col = intake.open_esm_datastore(
"https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
) # open an intake catalog containing the Pangeo CMIP cloud data
cat_ensemble = col.search(
source_id="MPI-ESM1-2-LR",
variable_id="tos",
table_id="Omon",
# select the 5 ensemble members described above
member_id=...,
grid_label="gn",
experiment_id=["historical", "ssp126", "ssp585"],
require_all_on=["source_id", "member_id"],
)
# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)
kwargs = dict(
preprocess=combined_preprocessing, # apply xMIP fixes to each dataset
xarray_open_kwargs=dict(
use_cftime=True
), # ensure all datasets use the same time index
storage_options={
"token": "anon"
}, # anonymous/public authentication to google cloud storage
)
# hopefully we can implement https://github.com/intake/intake-esm/issues/562 before the
# actual tutorial, so this would be a lot cleaner
cat_ensemble.esmcat.aggregation_control.groupby_attrs = [
"source_id", "experiment_id"]
dt_ensemble = cat_ensemble.to_datatree(**kwargs)
# add the area (we can reuse the area from before, since for a given model the horizontal area does not vary between members)
dt_ensemble_with_area = DataTree()
for model, subtree in dt_ensemble.items():
metric = dt_area["MPI-ESM1-2-LR"]["historical"].ds["areacello"].squeeze()
dt_ensemble_with_area[model] = subtree.map_over_subtree(
_parse_metric, metric)
# global average
# average every dataset in the tree globally
dt_ensemble_gm = dt_ensemble_with_area.map_over_subtree(global_mean)
# calculate anomaly
dt_ensemble_gm_anomaly = datatree_anomaly(dt_ensemble_gm)
def plot_historical_ssp126_ensemble_combined(dt, ax):
for model in dt.keys():
datasets = []
for experiment in ["historical", "ssp126"]:
datasets.append(
dt[model][experiment].ds.coarsen(time=12).mean().tos)
# concatenate the historical and ssp126 timeseries for each ensemble member
da_combined = ...
# plot annual averages
da_combined.plot(hue="member_id", ax=ax)
fig, ax = plt.subplots()
plot_historical_ssp126_ensemble_combined(dt_ensemble_gm_anomaly, ax)
ax.set_title(
"Global Mean SST Anomaly in SSP1-2.6 from a 5-member single-model ensemble"
)
ax.set_ylabel(r"Global Mean SST Anomaly [$^\circ$C]")
ax.set_xlabel("Year")
Coding Exercise 1.2#
Complete the following code to:
Repeat the final figure of the last tutorial, except now display means and uncertainty bands of the single-model ensemble that you just loaded, rather than the multi-model ensemble analyzed in the previous tutorial.
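For reference, the IPCC-style "likely" band spans the 17th to 83rd percentile range across ensemble members. A self-contained sketch of such quantile bands on synthetic data (the trend, noise level, and all variable names here are illustrative, not the tutorial's data):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
years = np.arange(2000, 2100)
# Synthetic 5-member "ensemble": a shared warming trend plus member-specific noise
ensemble = 0.02 * (years - 2000) + rng.normal(0.0, 0.15, size=(5, years.size))

ens_mean = ensemble.mean(axis=0)
lower = np.quantile(ensemble, 0.17, axis=0)  # lower bound of the likely range
upper = np.quantile(ensemble, 0.83, axis=0)  # upper bound of the likely range

fig, ax = plt.subplots()
ax.plot(years, ens_mean, color="C0", label="ensemble mean")
ax.fill_between(years, lower, upper, alpha=0.5, color="C0", label="likely (17-83%) range")
ax.set_xlabel("Year")
ax.set_ylabel("Synthetic anomaly")
ax.legend()
```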
fig, ax = plt.subplots()
for experiment, color in zip(["historical", "ssp126", "ssp585"], ["C0", "C1", "C2"]):
da = (
dt_ensemble_gm_anomaly["MPI-ESM1-2-LR"][experiment]
.ds.tos.coarsen(time=12)
.mean()
.load()
)
# calculate the mean across ensemble members
da.mean(...).plot(color=color, label=experiment, ax=ax)
# shading representing spread between members
x = da.time.data
# diagnose the lower range of the likely bounds
da_lower = ...
# diagnose the upper range of the likely bounds
da_upper = ...
ax.fill_between(x, da_lower, da_upper, alpha=0.5, color=color)
ax.set_title(
    "Global Mean SST Anomaly from a 5-member single-model ensemble"
)
ax.set_ylabel(r"Global Mean SST Anomaly [$^\circ$C]")
ax.set_xlabel("Year")
ax.legend()
Question 1.2: Climate Connection#
How does this figure compare to the multi-model ensemble figure from the previous tutorial (included below)? Can you interpret differences using the science we have discussed today?
Summary#
In this tutorial, we explored internal climate variability and its implications for climate modelling and prediction. We discussed the utility of single-model ensembles for isolating the effects of internal variability by contrasting simulations with identical physics, numerics, and discretization. We quantified the internal variability of the MPI-ESM1-2-LR model’s simulated climate and compared it to the uncertainty introduced by a multi-model ensemble. Through this tutorial, we better understand the limits of climate prediction and the different sources of uncertainty in CMIP6 models.
Resources#
This tutorial uses data from the simulations conducted as part of the CMIP6 multi-model ensemble.
For examples on how to access and analyze data, please visit the Pangeo Cloud CMIP6 Gallery
For more information on what CMIP is and how to access the data, please see this page.
For more information about large ensemble climate modelling see this paper.