Regional Precipitation Variability and Extreme Events#
Content creators: Laura Paccini, Raphael Rocha
Content reviewers: Marguerite Brown, Ohad Zivan, Jenna Pearson, Chi Zhang
Content editors: Zane Mitrevica, Natalie Steinemann, Douglas Rao, Jenna Pearson, Chi Zhang, Ohad Zivan
Production editors: Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan
Our 2023 Sponsors: NASA TOPS and Google DeepMind
# @title Project Background

from ipywidgets import widgets
from IPython.display import YouTubeVideo
from IPython.display import IFrame
from IPython.display import display


class PlayVideo(IFrame):
    def __init__(self, id, source, page=1, width=400, height=300, **kwargs):
        self.id = id
        if source == "Bilibili":
            src = f"https://player.bilibili.com/player.html?bvid={id}&page={page}"
        elif source == "Osf":
            src = f"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render"
        super(PlayVideo, self).__init__(src, width, height, **kwargs)


def display_videos(video_ids, W=400, H=300, fs=1):
    tab_contents = []
    for i, video_id in enumerate(video_ids):
        out = widgets.Output()
        with out:
            if video_ids[i][0] == "Youtube":
                video = YouTubeVideo(
                    id=video_ids[i][1], width=W, height=H, fs=fs, rel=0
                )
                print(f"Video available at https://youtube.com/watch?v={video.id}")
            else:
                video = PlayVideo(
                    id=video_ids[i][1],
                    source=video_ids[i][0],
                    width=W,
                    height=H,
                    fs=fs,
                    autoplay=False,
                )
                if video_ids[i][0] == "Bilibili":
                    print(
                        f"Video available at https://www.bilibili.com/video/{video.id}"
                    )
                elif video_ids[i][0] == "Osf":
                    print(f"Video available at https://osf.io/{video.id}")
            display(video)
        tab_contents.append(out)
    return tab_contents


video_ids = [("Youtube", "49XHRe61LI8"), ("Bilibili", "BV1Au411L7fo")]
tab_contents = display_videos(video_ids, W=730, H=410)
tabs = widgets.Tab()
tabs.children = tab_contents
for i in range(len(tab_contents)):
    tabs.set_title(i, video_ids[i][0])
display(tabs)
# @title Tutorial slides
# @markdown These are the slides for the videos in all tutorials today
from IPython.display import IFrame

link_id = "a53b4"
# render the slides from OSF (the render-URL pattern is assumed to match the Osf pattern used above)
IFrame(src=f"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{link_id}/?direct%26mode=render", width=730, height=410)
In this project, you will explore rain gauge and satellite data from CHIRPS and NOAA Terrestrial Climate Data Records NDVI datasets to extract rain estimates and land surface reflectance, respectively. This data will enable identification of extreme events in your region of interest. Besides investigating the relationships between these variables, you are encouraged to study the impact of extreme events on changes in vegetation.
Project Template#
Note: The dashed boxes are socio-economic questions.
Data Exploration Notebook#
Project Setup#
# imports
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import pooch
import os
import tempfile
import pandas as pd
import s3fs
import boto3
import botocore
import datetime
# helper functions


def pooch_load(filelocation=None, filename=None, processor=None):
    shared_location = "/home/jovyan/shared/Data/Projects/Precipitation"  # this is different for each day
    user_temp_cache = tempfile.gettempdir()

    if os.path.exists(os.path.join(shared_location, filename)):
        file = os.path.join(shared_location, filename)
    else:
        file = pooch.retrieve(
            filelocation,
            known_hash=None,
            fname=os.path.join(user_temp_cache, filename),
            processor=processor,
        )

    return file
CHIRPS Version 2.0 Global Daily 0.25°#
The Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a high-resolution precipitation dataset developed by the Climate Hazards Group at the University of California, Santa Barbara. It combines satellite-derived precipitation estimates with ground-based station data to provide gridded precipitation data at a quasi-global scale between 50°S and 50°N.
Read more about CHIRPS here.
Indices for Extreme Events#
The Expert Team on Climate Change Detection and Indices (ETCCDI) has defined various indices that focus on different aspects such as duration or intensity of extreme events. The following functions provide examples of how to compute indices for each category. You can modify these functions to suit your specific needs or create your own custom functions. Here are some tips you can use:
Most of the indices require daily data, so in order to select a specific season you can just use xarray to subset the data. Example:
daily_precip_DJF = data_chirps.sel(time=data_chirps['time.season']=='DJF')
A common threshold for a wet event is precipitation greater than or equal to 1mm/day, while a dry (or non-precipitating) event is defined as precipitation less than 1mm/day.
Some of the indices are based on percentiles. You can define a base-period climatology to calculate percentile thresholds, such as the 5th, 10th, 90th, and 95th percentiles, to determine extreme events, as in the sketch below.
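For instance, here is a minimal sketch of the percentile approach (assuming a daily precipitation DataArray named daily_precip; the name and base period are illustrative, not part of the datasets below):
# hypothetical sketch: percentile thresholds from a base-period climatology
base_period = daily_precip.sel(time=slice("1981-01-01", "2010-12-31"))
wet_base = base_period.where(base_period >= 1)  # keep only wet days (>= 1 mm/day)
p95 = wet_base.quantile(0.95, dim="time")  # 95th-percentile threshold per grid cell
# note: for dask-backed data, .quantile may require loading the array first (e.g. .load())
very_wet_days = daily_precip.where(daily_precip > p95)  # days exceeding the threshold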
def calculate_sdii_index(data):
    """
    This function calculates the Simple Daily Intensity Index (SDII), which
    represents the average amount of precipitation on wet days (days with
    precipitation greater than or equal to 1 mm) for each year in the input
    data. The input data should be a Dataset with time coordinates, and the
    function returns a Dataset with the SDII index values for each year in
    the data.

    Parameters:
    ----------
    - data (xarray.Dataset): Input dataset containing daily precipitation data.

    Returns:
    -------
    - xarray.Dataset: Dataset containing the SDII index for each year.
    """
    # select the daily precipitation amounts on wet days (PR >= 1 mm)
    wet_days = data.where(data >= 1)

    # group by year and calculate the total precipitation on wet days
    sum_wet_days_grouped = wet_days.groupby("time.year").sum(dim="time")

    # count the number of wet days in each year
    w = wet_days.groupby("time.year").count(dim="time")

    # divide the total by the number of wet days to get the SDII index
    sdii = sum_wet_days_grouped / w

    return sdii
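A hedged usage sketch (the variable name precip and the coordinate names latitude/longitude are assumed from the CHIRPS files, and chirps_data is loaded further below; check your dataset before running):
# hypothetical usage: yearly SDII averaged over a small region
precip_region = chirps_data.precip.sel(latitude=slice(-10, 0), longitude=slice(-60, -50))
sdii_region = calculate_sdii_index(precip_region)
sdii_region.mean(dim=["latitude", "longitude"]).plot()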
def calculate_cdd_index(data):
    """
    This function takes a daily precipitation dataset as input and calculates
    the Consecutive Dry Days (CDD) index, which represents the longest sequence
    of consecutive days with precipitation less than 1 mm. The input data should
    be a DataArray with time coordinates, and the function returns a DataArray
    with the CDD values for each unique year in the input data.

    Parameters:
    ----------
    - data (xarray.DataArray): The input daily precipitation data. This should
      be a DataArray (e.g., for chirps_data the DataArray would be
      chirps_data.precip).

    Returns:
    -------
    - cdd (xarray.DataArray): The calculated CDD index.
    """
    # create a boolean array for dry days (PR < 1 mm)
    dry_days = data < 1

    # initialize the CDD array
    cdd = np.zeros(len(data.groupby("time.year")))

    # get the unique years as a list
    unique_years = list(data.groupby("time.year").groups.keys())

    # iterate over each year
    for i, year in enumerate(unique_years):
        consecutive_trues = []
        current_count = 0
        for day in dry_days.sel(time=data["time.year"] == year).values:
            if day:
                current_count += 1
            else:
                if current_count > 0:
                    consecutive_trues.append(current_count)
                current_count = 0
        if current_count > 0:
            consecutive_trues.append(current_count)

        # CDD is the longest run of consecutive dry days
        cdd[i] = np.max(consecutive_trues)

    # wrap the result in a DataArray indexed by year
    cdd_da = xr.DataArray(cdd, coords={"year": unique_years}, dims="year")

    return cdd_da
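Similarly, a sketch for the CDD index at a single grid point (again assuming chirps_data, loaded below, with the precip/latitude/longitude names; the coordinates chosen are illustrative):
# hypothetical usage: CDD time series at one grid point
precip_point = chirps_data.precip.sel(latitude=-5.125, longitude=-55.125, method="nearest")
cdd_point = calculate_cdd_index(precip_point)
cdd_point.plot(marker="o")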
# code to retrieve and load the data
years = range(1981, 2024)  # the years we want: 1981 through 2023
file_paths = [
    "https://data.chc.ucsb.edu/products/CHIRPS-2.0/global_daily/netcdf/p25/chirps-v2.0."
    + str(year)
    + ".days_p25.nc"
    for year in years
]  # the format of the file URLs
filenames = ["chirps-v2.0." + str(year) + ".days_p25.nc" for year in years]  # the format of the files
downloaded_files = [
    pooch_load(fpath, fname) for (fpath, fname) in zip(file_paths, filenames)
]  # download all of the files

# open data as xarray
chirps_data = xr.open_mfdataset(
    downloaded_files, combine="by_coords"
)  # open the files as one dataset
We can now visualize the content of the dataset.
# code to print the shape, array names, etc of the dataset
chirps_data
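As a quick sanity check, you could plot a map of mean daily precipitation for a single year (a sketch; the variable name precip is assumed from the CHIRPS files):
# hypothetical quick look: mean daily precipitation for 1981
chirps_data.precip.sel(time="1981").mean(dim="time").plot(cmap="Blues")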
NOAA Fundamental Climate Data Records (FCDR) AVHRR Land Bundle - Surface Reflectance and Normalized Difference Vegetation Index#
As we learned in the W1D3 tutorials, all the National Oceanic and Atmospheric Administration Climate Data Record (NOAA-CDR) datasets are available both at the NOAA National Centers for Environmental Information (NCEI) and on commercial cloud platforms. We are accessing the data directly via the Amazon Web Services (AWS) cloud storage.
For this project we recommend using the Normalized Difference Vegetation Index (NDVI). It is one of the most commonly used remotely sensed indices. It measures the “greeness” of vegetation, and is useful in understanding vegetation density and assessing changes in plant health. For example, NDVI can be used to study the impacts of droughts, heatwaves, and insect infestations on plants covering Earth’s surface. A good overview of this index from this particular sensor can be accessed here.
Recall from W1D3 Tutorial 3 that the data files on AWS are named systematically:
Sensor name: AVHRR
Product category: Land
Product version: v005
Product code: AVH13C1
Satellite platform: NOAA-07
Date of the data: 19810624
Processing time: c20170610041337 (this will change for each file based on when the file was processed)
File format: .nc (netCDF-4 format)
In other words, if we are looking for the data of a specific day, we can easily locate where the file might be.
For example, if we want to find the AVHRR data for 2002-03-12 (March 12, 2002), we can use:
s3://noaa-cdr-ndvi-pds/data/2002/AVHRR-Land_v005_AVH13C1_*_20020312_c*.nc
The reason we put * in the path above is that we are not sure which satellite platform the data came from or when it was processed. The * is called a wildcard, and is used because we want all the files that match our specific criteria without having to specify the pieces of the filename we are not yet sure about. It should return all the data satisfying the initial criteria, and you can refine further once you see what is available. Essentially, this first step helps to narrow down the data search, and you can then use the result to create datasets from the timeframe of your choosing.
# we can use the data from W1D3 Tutorial 3
# to access the NDVI data from the AWS S3 bucket, we first need to connect to it
fs = s3fs.S3FileSystem(anon=True)

# initialize the AWS S3 bucket client
client = boto3.client(
    "s3", config=botocore.client.Config(signature_version=botocore.UNSIGNED)
)

date_sel = datetime.datetime(2001, 1, 1, 0)

# the files we want for a whole year
file_location = fs.glob(
    "s3://noaa-cdr-ndvi-pds/data/"
    + date_sel.strftime("%Y")
    + "/AVHRR-Land_v005_AVH13C1_*_c*.nc"
)

files_for_pooch = [
    pooch_load("http://s3.amazonaws.com/" + file, file) for file in file_location
]

ds = xr.open_mfdataset(files_for_pooch, combine="by_coords")  # open the files as one dataset
ds
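From here, a minimal first-analysis sketch (the variable name NDVI and the latitude/longitude coordinate names are assumptions about the AVH13C1 files; inspect ds before relying on them):
# hypothetical sketch: area-mean NDVI time series for a region of interest
ndvi_region = ds.NDVI.sel(latitude=slice(0, -10), longitude=slice(-60, -50))  # latitude assumed descending
ndvi_region.mean(dim=["latitude", "longitude"]).plot()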
Alternative NDVI source: MODIS/Terra Vegetation Indices (MOD13C2) Version 6.1 L3 Global 0.05° CMG#
Global MODIS (Moderate Resolution Imaging Spectroradiometer) vegetation indices are designed to provide consistent spatial and temporal comparisons of vegetation conditions. Blue, red, and near-infrared reflectances, centered at 469-nanometers, 645-nanometers, and 858-nanometers, respectively, are used to determine the MODIS vegetation indices.
The MODIS Normalized Difference Vegetation Index (NDVI) complements NOAA’s Advanced Very High Resolution Radiometer (AVHRR) NDVI products providing continuity for time series applications over this rich historical archive.
Global MOD13C2 data are cloud-free spatial composites of the gridded 16-day 1-kilometer MOD13A2, and are provided as a level-3 product projected on a 0.05 degree (5600-meter) geographic Climate Modeling Grid (CMG). These files have also been pre-processed to match the grid of the precipitation data.
years = range(2000, 2024)  # the years of NDVI data we have available
# note: these are relative paths into the shared data directory (resolved by pooch_load), not URLs
file_paths = ["MODIS/NDVI_" + str(year) + ".nc" for year in years]  # the format of the files
filenames = ["MODIS/NDVI_" + str(year) + ".nc" for year in years]
downloaded_files = [
    pooch_load(fpath, fname) for (fpath, fname) in zip(file_paths, filenames)
]  # load all of the files

# load data
modis_data = xr.open_mfdataset(downloaded_files, combine="by_coords")
modis_data
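Since the MODIS files have been pre-processed to match the CHIRPS grid, a natural first comparison is between monthly precipitation and NDVI. A minimal sketch (the variable names precip and NDVI are assumptions; adjust to what the datasets actually contain):
# hypothetical sketch: grid-cell correlation between monthly precipitation and NDVI
precip_monthly = chirps_data.precip.resample(time="1MS").mean()  # monthly means
# note: this assumes the two datasets share identical monthly time stamps
precip_aligned, ndvi_aligned = xr.align(precip_monthly, modis_data.NDVI, join="inner")
correlation = xr.corr(precip_aligned, ndvi_aligned, dim="time")
correlation.plot(cmap="RdBu_r", vmin=-1, vmax=1)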
World Bank Data: Cereal Production#
Cereal production is a crucial component of global agriculture and food security. The World Bank collects and provides data on cereal production, which includes crops such as wheat, rice, maize, barley, oats, rye, sorghum, millet, and mixed grains. The data covers various indicators such as production quantity, area harvested, yield, and production value.
The World Bank also collects data on land under cereals production, which refers to the area of land that is being used to grow cereal crops. This information can be valuable for assessing the productivity and efficiency of cereal production systems in different regions, as well as identifying potential areas for improvement. Overall, the World Bank’s data on cereal production and land under cereals production is an important resource for policymakers, researchers, and other stakeholders who are interested in understanding global trends in agriculture and food security.
# code to retrieve and load the data
filename_cereal = "data_cereal_land.csv"
url_cereal = "https://raw.githubusercontent.com/Sshamekh/Heatwave/f85f43997e3d6ae61e5d729bf77cfcc188fbf2fd/data_cereal_land.csv"
ds_cereal_land = pd.read_csv(pooch_load(url_cereal, filename_cereal))
ds_cereal_land.head()
# example
ds_cereal_land[(ds_cereal_land["Country Name"] == "Brazil")].reset_index(
drop=True
).iloc[0].transpose()
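World Bank tables are typically wide, with one column per year. Here is a sketch for turning one country's record into a year-indexed series (the assumption that year columns are digit-named strings follows the standard World Bank CSV layout; verify with ds_cereal_land.columns):
# hypothetical sketch: reshape one country's wide-format record into a time series
brazil_row = ds_cereal_land[ds_cereal_land["Country Name"] == "Brazil"].iloc[0]
year_cols = [c for c in ds_cereal_land.columns if c.isdigit()]  # columns like '1961'
brazil_series = brazil_row[year_cols].astype(float)
brazil_series.index = brazil_series.index.astype(int)
brazil_series.plot()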
Now you are all set to address the questions you are interested in! Just be mindful of the specific coordinate names to avoid any issues.
You can use the provided functions as examples to compute various indices for extreme events based on duration or intensity. Don’t hesitate to modify them according to your specific needs or create your own custom functions.
Happy exploring and analyzing precipitation variability and extreme events in your project!
Further Reading#
Anyamba, A. and Tucker, C.J., 2012. Historical perspective of AVHRR NDVI and vegetation drought monitoring. Remote Sensing of Drought: Innovative Monitoring Approaches, 23, p. 20. https://digitalcommons.unl.edu/nasapub/217/
Zhang, X., Alexander, L., Hegerl, G.C., Jones, P., Tank, A.K., Peterson, T.C., Trewin, B. and Zwiers, F.W., 2011. Indices for monitoring changes in extremes based on daily temperature and precipitation data. Wiley Interdisciplinary Reviews: Climate Change, 2(6), pp.851-870.
Schultz, P. A., and M. S. Halpert. “Global correlation of temperature, NDVI and precipitation.” Advances in Space Research 13.5 (1993): 277-280.
Seneviratne, S.I. et al., 2021: Weather and Climate Extreme Events in a Changing Climate. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V. et al. (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 1513–1766, https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-11/
IPCC, 2021: Annex VI: Climatic Impact-driver and Extreme Indices [Gutiérrez J.M. et al.(eds.)]. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V. et al. (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 2205–2214, https://www.ipcc.ch/report/ar6/wg1/downloads/report/IPCC_AR6_WGI_AnnexVI.pdf