Tutorial 2: A Lot of Weather Makes Climate - Exploring the ERA5 Reanalysis
Contents
Tutorial 2: A Lot of Weather Makes Climate - Exploring the ERA5 Reanalysis#
Week 1, Day 2, Ocean-Atmosphere Reanalysis
Content creators: Momme Hell
Content reviewers: Katrina Dobson, Danika Gupta, Maria Gonzalez, Will Gregory, Nahid Hasan, Sherry Mi, Beatriz Cosenza Muralles, Jenna Pearson, Chi Zhang, Ohad Zivan
Content editors: Jenna Pearson, Chi Zhang, Ohad Zivan
Production editors: Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan
Our 2023 Sponsors: NASA TOPS and Google DeepMind
Tutorial Objectives#
In the previous tutorial, we learned about ENSO, which is a specific atmosphere-ocean dynamical phenomena. You will now examine the atmosphere and the ocean systems more generally.
In this tutorial, you will learn to work with reanalysis data. These data combine observations and models of the Earth system, and are a critical tool for weather and climate science. You will first utilize two methods to access a specific reanalysis dataset (ECMWF’s ERA5; through PO.DAAC and through the web Copernicus API). You will then select and mask a region of interest, investigating how important climate variables change on medium length timescales (hours to months) within this region.
By the end of this tutorial, you will be able to:
Access and select reanalysis data of cliamtically-important variables
Plot maps to explore changes on various time scales.
Compute and compare timeseries of different variables from reanalysis data.
Setup#
# installations ( uncomment and run this cell ONLY when using google colab or kaggle )
# !pip install pythia_datasets
# !pip install cartopy
# !pip install geoviews
# !pip install boto3 --quiet
# !pip install intake
# imports
from intake import open_catalog
import matplotlib.pyplot as plt
import matplotlib
import xarray as xr
import fsspec
import numpy as np
import boto3
import botocore
import datetime
import numpy as np
import os
import pooch
import tempfile
import geoviews as gv
import holoviews
from geoviews import Dataset as gvDataset
import geoviews.feature as gf
from geoviews import Image as gvImage
from cartopy import crs as ccrs
from cartopy import feature as cfeature
# import warnings
# # Suppress warnings issued by Cartopy when downloading data files
# warnings.filterwarnings('ignore')
Figure Settings#
# @title Figure Settings
import ipywidgets as widgets # interactive display
%config InlineBackend.figure_format = 'retina'
plt.style.use(
"https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle"
)
Helper functions#
# @title Helper functions
def pooch_load(filelocation=None, filename=None, processor=None):
shared_location = "/home/jovyan/shared/Data/tutorials/W1D2_StateoftheClimateOceanandAtmosphereReanalysis" # this is different for each day
user_temp_cache = tempfile.gettempdir()
if os.path.exists(os.path.join(shared_location, filename)):
file = os.path.join(shared_location, filename)
else:
file = pooch.retrieve(
filelocation,
known_hash=None,
fname=os.path.join(user_temp_cache, filename),
processor=processor,
)
return file
Video 1: ECMWF Reanalysis#
Section 1: What is Reanalysis Data?#
Reanalysis refers to the process of combining historical observations from a variety of sources, such as weather stations, satellite measurments, and ocean buoys, with numerical models to create a comprehensive and consistent record of past weather and climate conditions. Reanalysis data is a useful tool to examine the Earth’s climate system over a wide range of time scales, from seasonal through decadal to century-scale changes.
There are multiple Earth system reanalysis products (e.g. MERRA-2, NCEP-NCAR, JRA-55C, see extensive list here), and no single product fits all needs. For the purposes of this tutorial you will be using a product from the European Centre for Medium-Range Weather Forecasts (ECMWF) called ECMWF Reanalysis v5 (ERA5). This video from the ECMWF provides you with a brief introduction to the ERA5 product.
Section 1.1: Accessing ERA5 Data#
You will access the data through the an AWS S3 bucket of the data: ECMWF ERA5 Reanalysis. To do this you need the name of the bucket “era5-pds”, and the file location in the bucket. Note: you can open the AWS link and find a guided tutorial on how to explore the S3 bucket.
Let’s select a specific year and month to work with, March of 2018:
era5_bucket = "era5-pds"
client = boto3.client(
"s3", config=botocore.client.Config(signature_version=botocore.UNSIGNED)
) # initialize aws s3 bucket client
date_sel = datetime.datetime(
2018, 3, 1, 0
) # select a desired date and hours (midnight is zero)
prefix = date_sel.strftime("%Y/%m/") # format the date to match the s3 bucket
metadata_file = "main.nc" # filename on the s3 bucket
metadata_key = prefix + metadata_file # file location and name on the s3 bucket
filepath = pooch_load(
filelocation="http://s3.amazonaws.com/" + era5_bucket + "/" + metadata_key,
filename=metadata_file,
) # open the file
ds_meta = xr.open_dataset(filepath) # open the file
ds_meta
<xarray.Dataset> Dimensions: ( lat: 721, time1: 744, time0: 744, lon: 1440, lon_ocean: 720, lat_ocean: 361, nv: 2) Coordinates: * lat (lat) float32 ... * time1 (time1) datetime64[ns] ... * time0 (time0) datetime64[ns] ... * lon (lon) float32 ... * lon_ocean (lon_ocean) float32 ... * lat_ocean (lat_ocean) float32 ... Dimensions without coordinates: nv Data variables: (12/19) sea_surface_temperature (time0, lat, lon) float32 ... sea_surface_wave_mean_period (time0, lat_ocean, lon_ocean) float32 ... air_temperature_at_2_metres_1hour_Maximum (time1, lat, lon) float32 ... dew_point_temperature_at_2_metres (time0, lat, lon) float32 ... significant_height_of_wind_and_swell_waves (time0, lat_ocean, lon_ocean) float32 ... sea_surface_wave_from_direction (time0, lat_ocean, lon_ocean) float32 ... ... ... lwe_thickness_of_surface_snow_amount (time0, lat, lon) float32 ... snow_density (time0, lat, lon) float32 ... eastward_wind_at_10_metres (time0, lat, lon) float32 ... integral_wrt_time_of_surface_direct_downwelling_shortwave_flux_in_air_1hour_Accumulation (time1, lat, lon) float32 ... air_temperature_at_2_metres (time0, lat, lon) float32 ... northward_wind_at_10_metres (time0, lat, lon) float32 ... Attributes: source: Reanalysis institution: ECMWF tilte: ERA5 forecasts
You just loaded an xarray
dataset, as introduced at the first day. This dataset contains 19 variables covering the whole globe (-90 to +90 degrees in latitude, 0 to 360 degrees on longitude) along with their respective coordinates. With this dataset you have access to our best estimates of climate parameters with a temporal resolution of 1 hour and a spatial resolution of 1/4 degree (i.e. grid points near the Equator represent a ~25 km x 25 km region). This is a lot of data, but still just a fraction the data available through the full ERA5 dataset.
Section 1.2: Selecting Regions of Interest#
The global ERA5 data over the entire time range is so large that even just one variable would be too large to store on your computer. Here you will apply a method to load a region (i.e., a spatial subset) of the data. In this first example, you will load air surface temperature at 2 meters data for a small region in the Northeastern United States. In later tutorials you will have the opportunity to select a region of your choice and to explore other climate variables.
# the order of the lat lon range has to follow the convention of the data set.
# for this dataset, longitude ranges from 0 to 360 degrees and
# latitude ranges from 90 degrees Northto 90 degrees South .
# northeastern United States
lat_range = [55.2, 30.2] # from north to south
lon_range = [270, 295] # from west to east
# note this can take several minutes to download
selected_vars = [
"air_temperature_at_2_metres",
"northward_wind_at_10_metres",
"eastward_wind_at_10_metres",
"surface_air_pressure",
"sea_surface_temperature",
] # the variables we want
s3_data_ptrn = (
"{year}/{month}/data/{var}.nc" # path and filename format for this S3 bucket
)
year_s3 = date_sel.strftime("%Y") # extract the year
month_s3 = date_sel.strftime("%m") # extract the month
ERA5_select = [] # initialize the dataset, will save a complicated check later
for var in selected_vars: # this will download 5 files of 500Mb each.
s3_data_key = s3_data_ptrn.format(
year=year_s3, month=month_s3, var=var
) # variable specific key
print("Downloading %s from S3..." % s3_data_key)
filepath = pooch_load(
filelocation="http://s3.amazonaws.com/" + era5_bucket + "/" + s3_data_key,
filename=s3_data_key,
) # open the file
ds_temp = xr.open_dataset(filepath) # retrieve the variable from the bucket
if (
ERA5_select
): # check if the dataset is empty or not (first iteration of the loop)
ERA5_select = xr.merge(
[ERA5_select, ds_temp]
) # if not empty, merge the new file
else:
ERA5_select = ds_temp # if empty, just assign the new file
ERA5_allvars = ERA5_select.sel(lon=slice(lon_range[0], lon_range[1])).sel(
lat=slice(lat_range[0], lat_range[1])
)
ERA5_allvars
Downloading data from 'http://s3.amazonaws.com/era5-pds/2018/03/data/air_temperature_at_2_metres.nc' to file '/tmp/2018/03/data/air_temperature_at_2_metres.nc'.
Downloading 2018/03/data/air_temperature_at_2_metres.nc from S3...
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Cell In[8], line 20
16 s3_data_key = s3_data_ptrn.format(
17 year=year_s3, month=month_s3, var=var
18 ) # variable specific key
19 print("Downloading %s from S3..." % s3_data_key)
---> 20 filepath = pooch_load(
21 filelocation="http://s3.amazonaws.com/" + era5_bucket + "/" + s3_data_key,
22 filename=s3_data_key,
23 ) # open the file
24 ds_temp = xr.open_dataset(filepath) # retrieve the variable from the bucket
25 if (
26 ERA5_select
27 ): # check if the dataset is empty or not (first iteration of the loop)
Cell In[4], line 10, in pooch_load(filelocation, filename, processor)
8 file = os.path.join(shared_location, filename)
9 else:
---> 10 file = pooch.retrieve(
11 filelocation,
12 known_hash=None,
13 fname=os.path.join(user_temp_cache, filename),
14 processor=processor,
15 )
17 return file
File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/pooch/core.py:239, in retrieve(url, known_hash, fname, path, processor, downloader, progressbar)
236 if downloader is None:
237 downloader = choose_downloader(url, progressbar=progressbar)
--> 239 stream_download(url, full_path, known_hash, downloader, pooch=None)
241 if known_hash is None:
242 get_logger().info(
243 "SHA256 hash of downloaded file: %s\n"
244 "Use this value as the 'known_hash' argument of 'pooch.retrieve'"
(...)
247 file_hash(str(full_path)),
248 )
File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/pooch/core.py:803, in stream_download(url, fname, known_hash, downloader, pooch, retry_if_failed)
799 try:
800 # Stream the file to a temporary so that we can safely check its
801 # hash before overwriting the original.
802 with temporary_file(path=str(fname.parent)) as tmp:
--> 803 downloader(url, tmp, pooch)
804 hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
805 shutil.move(tmp, str(fname))
File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/pooch/downloaders.py:226, in HTTPDownloader.__call__(self, url, output_file, pooch, check_only)
224 progress = self.progressbar
225 progress.total = total
--> 226 for chunk in content:
227 if chunk:
228 output_file.write(chunk)
File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/requests/models.py:816, in Response.iter_content.<locals>.generate()
814 if hasattr(self.raw, "stream"):
815 try:
--> 816 yield from self.raw.stream(chunk_size, decode_content=True)
817 except ProtocolError as e:
818 raise ChunkedEncodingError(e)
File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/urllib3/response.py:628, in HTTPResponse.stream(self, amt, decode_content)
626 else:
627 while not is_fp_closed(self._fp):
--> 628 data = self.read(amt=amt, decode_content=decode_content)
630 if data:
631 yield data
File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/urllib3/response.py:567, in HTTPResponse.read(self, amt, decode_content, cache_content)
564 fp_closed = getattr(self._fp, "closed", False)
566 with self._error_catcher():
--> 567 data = self._fp_read(amt) if not fp_closed else b""
568 if amt is None:
569 flush_decoder = True
File ~/miniconda3/envs/climatematch/lib/python3.10/site-packages/urllib3/response.py:533, in HTTPResponse._fp_read(self, amt)
530 return buffer.getvalue()
531 else:
532 # StringIO doesn't like amt=None
--> 533 return self._fp.read(amt) if amt is not None else self._fp.read()
File ~/miniconda3/envs/climatematch/lib/python3.10/http/client.py:466, in HTTPResponse.read(self, amt)
463 if self.length is not None and amt > self.length:
464 # clip the read to the "end of response"
465 amt = self.length
--> 466 s = self.fp.read(amt)
467 if not s and amt:
468 # Ideally, we would raise IncompleteRead if the content-length
469 # wasn't satisfied, but it might break compatibility.
470 self._close_conn()
File ~/miniconda3/envs/climatematch/lib/python3.10/socket.py:705, in SocketIO.readinto(self, b)
703 while True:
704 try:
--> 705 return self._sock.recv_into(b)
706 except timeout:
707 self._timeout_occurred = True
KeyboardInterrupt:
The magnitude of the wind vector represents the wind speed
which you will use later in the tutorial and discuss in more detail in tutorial 4. We will calculate that here and add it to our dataset.
# compute ten meter wind speed
ERA5_allvars["wind_speed"] = np.sqrt(
ERA5_allvars["northward_wind_at_10_metres"] ** 2
+ ERA5_allvars["eastward_wind_at_10_metres"] ** 2
) # calculate the wind speed from the vectors
# name and units in the DataArray:
ERA5_allvars["wind_speed"].attrs[
"long_name"
] = "10-meter wind speed" # assigning long name to the windspeed
ERA5_allvars["wind_speed"].attrs["units"] = "m/s" # assigning units
ERA5_allvars
Section 2: Plotting Spatial Maps of Reanalysis Data#
First, let’s plot the region’s surface temperature for the first time step of the reanalysis dataset. To do this let’s extract the air temperatre data from the dataset containing all the variables.
ds_surface_temp_2m = ERA5_allvars.air_temperature_at_2_metres
We will be plotting this a little bit differently that you have previously plotted a map (and differently to how you will plot in most tutorials) so we can look at a few times steps interactively later. To do this we are using the package geoviews.
holoviews.extension("bokeh")
dataset_plot = gvDataset(ds_surface_temp_2m.isel(time0=0)) # select the first time step
# create the image
images = dataset_plot.to(
gvImage, ["longitude", "latitude"], ["air_temperature_at_2_metres"], "hour"
)
# aesthetics, add coastlines etc.
images.opts(
cmap="coolwarm",
colorbar=True,
width=600,
height=400,
projection=ccrs.PlateCarree(),
clabel="2m Air Temperature [K]",
) * gf.coastline
In the above figure, coastlines are shown as black lines. Most of the selected region is land, with some ocean (lower left) and a lake (top middle).
Next, we will examine variability at two different frequencies using interactive plots:
Hourly variability
Daily variability
Note that in the previous tutorial you computed the monthly variability, or climatology, but here you only have one month of data loaded (March 2018). If you are curious about longer timescales you will visit this in the next tutorial!
# interactive plot of hourly frequency of surface temperature
# this cell may take a little longer as it contains several maps in a single plotting function
ds_surface_temp_2m_hour = ds_surface_temp_2m.groupby("time0.hour").mean()
dataset_plot = gvDataset(
ds_surface_temp_2m_hour.isel(hour=slice(0, 12))
) # only the first 12 time steps, as it is a time consuming task
images = dataset_plot.to(
gvImage, ["longitude", "latitude"], ["air_temperature_at_2_metres"], "hour"
)
images.opts(
cmap="coolwarm",
colorbar=True,
width=600,
height=400,
projection=ccrs.PlateCarree(),
clabel="2m Air Temperature [K]",
) * gf.coastline
# interactive plot of hourly frequency of surface temperature
# this cell may take a little longer as it contains several maps in a single plotting function holoviews.extension('bokeh')
ds_surface_temp_2m_day = ds_surface_temp_2m.groupby("time0.day").mean()
dataset_plot = gvDataset(
ds_surface_temp_2m_day.isel(day=slice(0, 10))
) # only the first 10 time steps, as it is a time consuming task
images = dataset_plot.to(
gvImage, ["longitude", "latitude"], ["air_temperature_at_2_metres"], "day"
)
images.opts(
cmap="coolwarm",
colorbar=True,
width=600,
height=400,
projection=ccrs.PlateCarree(),
clabel="Air Temperature [K]",
) * gf.coastline
Question 2#
What differences do you notice between the hourly and daily interactive plots, and are there any interesting spatial patterns of these temperature changes?
Section 3: Plotting Timeseries of Reanalysis Data#
Section 3.1: Surface Air Temperature Timeseries#
You have demonstrated that there are a lot of changes in surface temperature within a day and between days. It is crucial to understand this temporal variability in the data when performing climate analysis.
Rather than plotting interactive spatial maps for different timescales, in this last section you will create a timeseries of surface air temperature from the data you have already examined to look at variability on longer than daily timescales. Instead of taking the mean in time to create maps, you will now take the mean in space to create timeseries.
Note that the spatially-averaged data will now only have a time coordinate coordinate, making it a timeseries (ts).
# find weights (this is a regular grid so we can use cos(lat))
weights = np.cos(np.deg2rad(ds_surface_temp_2m.lat))
weights.name = "weights"
# take the weighted spatial mean since the latitude range of the region of interest is large
ds_surface_temp_2m_ts = ds_surface_temp_2m.weighted(weights).mean(["lon", "lat"])
ds_surface_temp_2m_ts
# plot the timeseries of surface temperature
fig, ax = plt.subplots(1, 1, figsize=(10, 3))
ax.plot(ds_surface_temp_2m_ts.time0, ds_surface_temp_2m_ts)
ax.set_ylabel("2m Air \nTemperature (K)")
ax.xaxis.set_tick_params(rotation=45)
Questions 3.1#
What is the dominant source of the high frequency (short timescale) variability?
What drives the lower frequency variability?
Would the ENSO variablity that you computed in the previous tutorial show up here? Why or why not?
Section 3.2: Comparing Timeseries of Multiple Variables#
Below you will calculate the timeseries of the surface air temperature which we just plotted, alongside timeseries of several other ERA5 variables for the same period and region: 10-meter wind speed, atmospheric surface pressure, and sea surface temperature.
ERA5_allvars_ts = ERA5_allvars.weighted(weights).mean(["lon", "lat"])
plot_vars = [
"air_temperature_at_2_metres",
"wind_speed",
"surface_air_pressure",
"sea_surface_temperature",
]
fig, ax_list = plt.subplots(len(plot_vars), 1, figsize=(10, 13), sharex=True)
for var, ax in zip(plot_vars, ax_list):
ax.plot(ERA5_allvars_ts.time0, ERA5_allvars_ts[var])
ax.set_ylabel(
ERA5_allvars[var].attrs["long_name"] + ": " + ERA5_allvars[var].attrs["units"],
fontsize=12,
)
ax.xaxis.set_tick_params(rotation=45)
Questions 3.2#
Which variable shows variability that is dominated by:
The diurnal cycle?
The synoptic [~5 day] scale?
A mix of these two timescales?
Longer timescales?
Bonus Section 1: Selecting a Different Spatial Region#
Define another spatial region, such as where you live, by selecting a longitude and latitude range of of your choosing. To find the longitude and latitude coordinates of your region, you can use Google Earth view, and read the position of your cursor in the lower right corner.
Bonus Section 1.1: Note About the Geographic Coordinate System and the Coordinates Used in This Dataset#
A point on Earth is described by latitude-longitude coordinates relative to the zero-meridian line going through Greenwich in London, UK (longitude = 0 degree) and the xero-latitude line along the equator (latitude = 0 degrees). Points east of Greenwich up to the dateline on the opposite side of the globe are referenced as 0 to +180 and points to the west of Greenwich are 0 to -180. -180 and +180 refer to the same longitude, the so-called dateline in the central pacific.
However, our data is referenced in a slightly different way where longitude runs from 0 to 360 rather than -180 to +180. Longitude increases as you move east of Greenwich, until you reach Greenwich again (0 or 360 degrees), rather than stopping at the dateline.
Summary#
In this tutorial, you learned how to access and process ERA5 reanalysis data. You are able to select specific regions within the reanalysis dataset and perform operations such as taking spatial and temporal averages.
You also looked at different climate variables to distinguish idenitfy the variability present at different timescales.