Tutorial 1: Creating Maps of CMIP6 Earth System Model (ESM) Projections#
Week 2, Day 1, Future Climate: The Physical Basis
Content creators: Brodie Pearson, Julius Busecke, Tom Nicholas
Content reviewers: Younkap Nina Duplex, Zahra Khodakaramimaghsoud, Sloane Garelick, Peter Ohue, Jenna Pearson, Agustina Pesce, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan
Content editors: Jenna Pearson, Ohad Zivan, Chi Zhang
Production editors: Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan
Our 2023 Sponsors: NASA TOPS, Google DeepMind, and CMIP
Tutorial Objectives#
Earth System Models (ESMs) provide physically-based projections of how Earth’s climate could change in the coming years, decades, and centuries at both global and local scales. In the following tutorial you will:
Revisit how to load ESM data from the CMIP6 experiments, and
Create maps showing projected future changes in sea surface temperature (SST).
Setup#
# installations (uncomment and run this cell ONLY when using Google Colab or Kaggle)
# !pip install condacolab &> /dev/null
# import condacolab
# condacolab.install()
# # Install all packages in one call (+ use mamba instead of conda); this must be a single line or the install will fail
# !mamba install xarray-datatree intake-esm gcsfs xmip aiohttp cartopy nc-time-axis cf_xarray xarrayutils "esmf<=8.3.1" xesmf &> /dev/null
# # For xesmf install we need to pin "esmf<=8.3.1". More context here: https://github.com/pangeo-data/xESMF/issues/246
# imports
import time
tic = time.time()
import intake
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import xesmf as xe
from xmip.preprocessing import combined_preprocessing
from xarrayutils.plotting import shaded_line_plot
from datatree import DataTree
from xmip.postprocessing import _parse_metric
import cartopy.crs as ccrs
# @title Figure settings
import ipywidgets as widgets # interactive display
plt.style.use(
    "https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle"
)
%matplotlib inline
# @title Video 1: Recap of Earth System Models
from ipywidgets import widgets
from IPython.display import YouTubeVideo
from IPython.display import IFrame
from IPython.display import display
class PlayVideo(IFrame):
    def __init__(self, id, source, page=1, width=400, height=300, **kwargs):
        self.id = id
        if source == 'Bilibili':
            src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'
        elif source == 'Osf':
            src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'
        super(PlayVideo, self).__init__(src, width, height, **kwargs)
def display_videos(video_ids, W=400, H=300, fs=1):
    tab_contents = []
    for i, video_id in enumerate(video_ids):
        out = widgets.Output()
        with out:
            if video_ids[i][0] == 'Youtube':
                video = YouTubeVideo(id=video_ids[i][1], width=W,
                                     height=H, fs=fs, rel=0)
                print(f'Video available at https://youtube.com/watch?v={video.id}')
            else:
                video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,
                                  height=H, fs=fs, autoplay=False)
                if video_ids[i][0] == 'Bilibili':
                    print(f'Video available at https://www.bilibili.com/video/{video.id}')
                elif video_ids[i][0] == 'Osf':
                    print(f'Video available at https://osf.io/{video.id}')
            display(video)
        tab_contents.append(out)
    return tab_contents
video_ids = [('Youtube', '119vYX7t2Q4'), ('Bilibili', 'BV1Nz4y1J7hp')]
tab_contents = display_videos(video_ids, W=730, H=410)
tabs = widgets.Tab()
tabs.children = tab_contents
for i in range(len(tab_contents)):
tabs.set_title(i, video_ids[i][0])
display(tabs)
Section 1.1: Loading CMIP6 SST Data with xarray#
As a reminder, these ESMs simulate several systems (ocean, atmosphere, cryosphere, land) that are coupled to each other, and each system has its own variables, physics, and discretizations (grid & timestep).
Atmospheric Model Schematic (Credit: Wikipedia)
Let’s repeat the CMIP6 loading method that we learned in Tutorial 6 on last week’s Climate Modelling day (Day 5).
Although we will only work with monthly SST (ocean) data today, the methods introduced can easily be applied/extended to load and analyze other CMIP6 variables, including from other components of the Earth system.
To learn more about CMIP, including additional methods to access CMIP data, please see our CMIP Resource Bank and the CMIP website.
As a reminder, the facets that have to be specified for CMIP6, along with the facet choice(s) we make in this tutorial, are:
variable_id: tos = sea surface temperature
source_id: The CMIP6 model(s) that we want data from
table_id: Omon (ocean monthly output)
grid_label: gn = data on the model’s native grid
experiment_id: ssp585 (we’ll discuss experiments later today)
member_id: r1i1p1f1 for now
Now, let’s repeat our CMIP6 loading method from the previous tutorial.
Note: today we will start by using only one model, TaiESM1, which stands for Taiwan Earth System Model version 1, and a single experiment, ssp585, which is a high-emissions future scenario. In later tutorials you will work with 5 distinct CMIP6 models (source_id) and two additional experiments (experiment_id). TaiESM1 was developed by modifying the Community Earth System Model (CESM) version 1.2.2 to include different parameterizations (i.e., physics). As a result, the TaiESM1 model output is distinct from the CESM output you used in previous tutorials/days.
# open an intake catalog containing the Pangeo CMIP cloud data
col = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
)
# from the full `col` object, create a subset using facet search
cat = col.search(
    source_id="TaiESM1",
    variable_id="tos",
    member_id="r1i1p1f1",
    table_id="Omon",
    grid_label="gn",
    experiment_id="ssp585",
    require_all_on=[
        "source_id"
    ],  # make sure that we only get models which have all of the above experiments
)
# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)
kwargs = dict(
    preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset
    xarray_open_kwargs=dict(
        use_cftime=True
    ),  # ensure all datasets use the same time index
    storage_options={
        "token": "anon"
    },  # anonymous/public authentication to Google Cloud Storage
)
cat.esmcat.aggregation_control.groupby_attrs = ["source_id", "experiment_id"]
dt = cat.to_datatree(**kwargs)
We now have a “datatree” containing the data we searched for. A datatree is a high-level container of xarray data, useful for organizing many related datasets together. You can think of a single DataTree object as being like a (nested) dictionary of xarray.Dataset objects. Each dataset in the tree is known as a “node” or “group”, and nodes can also be empty. This DataTree object may seem complicated while we load only one dataset, but it will prove very useful in later tutorials, where you will work with multiple models, experiments, and ensemble members.
You can explore the nodes of the tree and their contents interactively, in a similar way to how you explore the contents of an xarray.Dataset:
dt
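As a rough analogy (plain Python dictionaries, not the actual DataTree API), you can picture the tree as a nested dictionary keyed first by model (source_id) and then by experiment (experiment_id); the model and experiment names below match this tutorial, but the leaf value is just a placeholder string standing in for an xarray.Dataset:

```python
# A toy stand-in for the DataTree structure: a nested dictionary
# keyed first by model (source_id), then by experiment (experiment_id).
# The leaf value here is a plain string; in the real tree each leaf
# is an xarray.Dataset.
toy_tree = {
    "TaiESM1": {
        "ssp585": "tos dataset (monthly SST, native grid)",
    },
}

# navigation mirrors the real syntax, dt["TaiESM1"]["ssp585"]
leaf = toy_tree["TaiESM1"]["ssp585"]
print(leaf)
```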
Now that we have the model datasets organized within this datatree (dt), we can plot them. Let’s start by plotting a map of SST from the TaiESM1 CMIP6 model in July 2023.
Note that CMIP6 experiments were run several years ago, so the cut-off between past (observed forcing) and future (scenario-based/projected forcing) was at the start of 2015. This means that July 2023 is about 8 years into the CMIP6 future and so it is unlikely to look exactly like Earth’s current SST state.
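As a quick check of that arithmetic (pure Python, no CMIP6 data needed), the gap between the scenario start (January 2015) and the July 2023 snapshot is about eight and a half years:

```python
from datetime import date

scenario_start = date(2015, 1, 1)  # CMIP6 historical/scenario cut-off
snapshot = date(2023, 7, 1)        # month plotted below

years_into_future = (snapshot - scenario_start).days / 365.25
print(round(years_into_future, 1))  # ~8.5
```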
# select just a single model (TaiESM1) and experiment (ssp585) to plot
sst_ssp585 = dt["TaiESM1"]["ssp585"].ds.tos
fig, (ax_present) = plt.subplots(
    ncols=1, nrows=1, figsize=[8, 4], subplot_kw={"projection": ccrs.Robinson()}
)
# select the model data for July 2023
sst_present = sst_ssp585.sel(time="2023-07").squeeze()
# plot the model data
sst_present.plot(
    ax=ax_present,
    x="lon",
    y="lat",
    transform=ccrs.PlateCarree(),
    vmin=-10,
    vmax=30,
    cmap="magma",
    robust=True,
)
ax_present.coastlines()
ax_present.set_title("July 2023")
Coding Exercise 1.1#
Now that we can plot maps of CMIP6 data, let’s look at some projected future changes using this data!
In this coding exercise your goals are to:
Create a map of the projected sea surface temperature in July 2100 under the SSP5-8.5 high-emissions scenario (we’ll discuss scenarios in the next mini-lecture) using data from the TaiESM1 CMIP6 model.
Create a map showing how this sea surface temperature projection differs from the current (July 2023) sea surface temperature in this model.
Plot a similar map for this model showing how January 2100 differs from January 2023.
To get you started, we have provided code to load the required data set into a variable called sst_ssp585, and we have plotted the current (July 2023) sea surface temperature from this data set.
Note: differences between two snapshots of SST are not the same as the anomalies that you encountered earlier in the course, which were the difference relative to the average during a reference period.
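To make that distinction concrete, here is a minimal numpy sketch with synthetic single-grid-cell SST values (not CMIP6 data): a difference compares two individual snapshots, while an anomaly compares a snapshot to the mean over a reference period.

```python
import numpy as np

# synthetic monthly SST values for one grid cell (degrees C)
sst = np.array([20.0, 21.0, 22.5, 23.0, 24.5])

# difference between two snapshots: last minus first
difference = sst[-1] - sst[0]    # 4.5

# anomaly: last snapshot relative to the mean over a reference
# period (here, the first three values)
reference_mean = sst[:3].mean()  # ~21.17
anomaly = sst[-1] - reference_mean  # ~3.33

print(difference, anomaly)
```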
# select just a single model and experiment
sst_ssp585 = dt["TaiESM1"]["ssp585"].ds.tos
fig, ([ax_present, ax_future], [ax_diff_july, ax_diff_jan]) = plt.subplots(
    ncols=2, nrows=2, figsize=[12, 6], subplot_kw={"projection": ccrs.Robinson()}
)
# plot a timestep for 2023
sst_present = sst_ssp585.sel(time="2023-07").squeeze()
sst_present.plot(
    ax=ax_present,
    x="lon",
    y="lat",
    transform=ccrs.PlateCarree(),
    vmin=-10,
    vmax=30,
    cmap="magma",
    robust=True,
)
ax_present.coastlines()
ax_present.set_title("July 2023")
# repeat for 2100
# complete the following line to extract data for July 2100
sst_future = ...
_ = ...
ax_future.coastlines()
ax_future.set_title("July 2100")
# now find the difference between July 2100 and July 2023
# complete the following line to extract the July difference
sst_difference_july = ...
_ = ...
ax_diff_july.coastlines()
ax_diff_july.set_title("2100 vs. 2023 Difference (July)")
# finally, find the difference between January of the two years used above
# complete the following line to extract the January difference
sst_difference_jan = ...
_ = ...
ax_diff_jan.coastlines()
ax_diff_jan.set_title("2100 vs. 2023 Difference (January)")
Questions 1.1: Climate Connection#
Comparing only the top two panels, how is the July SST projected to change in this particular model simulation? Do these changes agree with the map of July change that you plotted in the bottom left, and are these changes easier to see in this bottom map?
In what ways are the July and January maps similar or dissimilar, and can you think of any physical explanations for these (dis)similarities?
Why do you think the color bar axes vary? (i.e., the top panels say “Sea Surface Temperature [°C]” while the bottom panels say “tos”)
Many of the changes seen in the maps are a result of a changing climate under this high-emissions scenario. However, keep in mind that these are differences between two months that are almost 80 years apart, so some of the changes are due to weather/synoptic differences between these particular months.
Section 1.2: Horizontal Regridding#
Many CMIP6 models use distinct spatial grids; we call each of these a model’s native grid.
You are likely familiar with the regular latitude-longitude grid where we separate the planet into boxes that have a fixed latitude and longitude span like this image we saw in the tutorial:
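For reference, a regular latitude-longitude grid like that can be sketched in a few lines of numpy (the 1° spacing here is an arbitrary choice):

```python
import numpy as np

# 1-degree regular grid: fixed latitude and longitude spacing everywhere
lat = np.arange(-90, 90, 1.0)  # degrees north
lon = np.arange(0, 360, 1.0)   # degrees east

# 2D arrays of the cell-center coordinates
lon2d, lat2d = np.meshgrid(lon, lat)
print(lat2d.shape)  # (180, 360): one row per latitude, one column per longitude
```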
Let’s look at the grid used for the ocean component of the TaiESM1 CMIP6 model:
# create a scatter plot with a symbol at the center of each ocean grid cell in TaiESM1
fig, ax = plt.subplots()
ax.scatter(x=sst_ssp585.lon, y=sst_ssp585.lat, s=0.1)
ax.set_ylabel("Latitude")
ax.set_xlabel("Longitude")
ax.set_title("Grid cell locations in TaiESM1")
Questions 1.2#
How would this plot look for a regular latitude-longitude grid like the globe image shown above and in the slides? In what ways is the TaiESM1 grid different from this regular grid?
Can you think of a reason the Northern and Southern hemisphere ocean grids differ?*
*Hint: from an oceanographic context, how are the North and South poles different from each other?
If you want to compare spatial maps from different models/observations, e.g. plot a map averaged over several models or the bias of this map relative to observations, you must first ensure the data from all the models (and observations) is on the same spatial grid. This is where regridding becomes essential!
Regridding is applied lazily, but it still takes time to compute once the result is evaluated. So if you want to compare, for example, the time mean of several models, it is often much quicker to compute the mean over time on the native grid and then regrid the result, instead of regridding every timestep and then calculating the mean!
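The reason this reordering is safe is that both the time mean and bilinear interpolation are linear operations, so they commute. Here is a 1-D numpy sketch of that commutativity, with np.interp standing in for the regridder (toy coordinates and values, not CMIP6 data):

```python
import numpy as np

x_native = np.array([0.0, 1.0, 2.0, 3.0])  # native grid coordinates
x_target = np.array([0.5, 1.5, 2.5])       # target grid coordinates

# toy "time series" on the native grid: 3 timesteps x 4 grid points
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 3.0, 4.0, 5.0],
                 [3.0, 4.0, 5.0, 6.0]])

# option A: regrid every timestep, then average (3 interpolations)
regrid_then_mean = np.mean(
    [np.interp(x_target, x_native, t) for t in data], axis=0)

# option B: average in time on the native grid, then regrid once
mean_then_regrid = np.interp(x_target, x_native, data.mean(axis=0))

print(np.allclose(regrid_then_mean, mean_then_regrid))  # True
```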
# define a 'target' grid. This is simply a regular lat/lon grid that we will interpolate our data onto
ds_target = xr.Dataset(
    {
        "lat": (["lat"], np.arange(-90, 90, 1.0), {"units": "degrees_north"}),
        "lon": (["lon"], np.arange(0, 360, 1.0), {"units": "degrees_east"}),
    }
)  # you can try modifying the parameters above, e.g. to regrid onto just a region or a coarser resolution
ds_target
# define the regridder object (from our source dataarray to the target)
regridder = xe.Regridder(
    sst_ssp585, ds_target, "bilinear", periodic=True
)  # this takes some time, since it calculates a weight matrix for the regridding
regridder
# now we can apply the regridder to our data
sst_ssp585_regridded = regridder(sst_ssp585) # this is a lazy operation!
# so it does not slow us down significantly to apply it to the full data!
# we can work with this array just like before and the regridding will only be
# applied to the parts that we later load into memory or plot.
sst_ssp585_regridded
# compare the shape to the original array
sst_ssp585
Section 1.3: Visually Comparing Data with Different Map Projections#
Let’s use the code from above to plot a map of the model data on its original (native) grid, and a map of the model data after it is regridded.
fig, ([ax_regridded, ax_native]) = plt.subplots(
    ncols=2, figsize=[12, 3], subplot_kw={"projection": ccrs.Robinson()}
)
# Native grid data
sst_future = sst_ssp585.sel(time="2100-07").squeeze()
sst_future.plot(
    ax=ax_native,
    x="lon",
    y="lat",
    transform=ccrs.PlateCarree(),
    vmin=-10,
    vmax=30,
    cmap="magma",
    robust=True,
)
ax_native.coastlines()
ax_native.set_title("July 2100 Native Grid")
# Regridded data
sst_future_regridded = sst_ssp585_regridded.sel(time="2100-07").squeeze()
sst_future_regridded.plot(
    ax=ax_regridded,
    x="lon",
    y="lat",
    transform=ccrs.PlateCarree(),
    vmin=-10,
    vmax=30,
    cmap="magma",
    robust=True,
)
ax_regridded.coastlines()
ax_regridded.set_title("July 2100 Regridded")
Questions 1.3#
Is this what you expected to see after regridding the data?
Summary#
In Tutorial 1 you have:
Loaded and manipulated data from a CMIP6 model under a high-emissions future scenario experiment
Created maps of future projected changes in the Earth system using CMIP6 data
Converted/regridded CMIP6 model data onto a desired grid. This is a critical processing step that allows us to directly compare data from different models and/or observations
Resources#
This tutorial uses data from the simulations conducted as part of the CMIP6 multi-model ensemble.
For examples of how to access and analyze the data, please visit the Pangeo Cloud CMIP6 Gallery.
For more information on what CMIP is and how to access the data, please see this page.