{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ClimateMatchAcademy/course-content/blob/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/W2D4_Tutorial7.ipynb)   \"Open" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# **Tutorial 7: Non-Stationarity in the EVT-framework**\n", "\n", "**Week 2, Day 4, Extremes & Vulnerability**\n", "\n", "**Content creators:** Matthias Aengenheyster, Joeri Reinders\n", "\n", "**Content reviewers:** Younkap Nina Duplex, Sloane Garelick, Zahra Khodakaramimaghsoud, Peter Ohue, Laura Paccini, Jenna Pearson, Agustina Pesce, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan\n", "\n", "**Content editors:** Jenna Pearson, Chi Zhang, Ohad Zivan\n", "\n", "**Production editors:** Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan\n", "\n", "**Our 2023 Sponsors:** NASA TOPS and Google DeepMind" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# **Tutorial Objectives**" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "In this tutorial, we will enhance our GEV model by incorporating non-stationarity to achieve a better fit for the sea surface height data from Washington DC we analyzed previously.\n", "\n", "By the end of the tutorial you will be able to:\n", "- Apply a GEV distribution to a non-stationary record by including a time-dependent parameter.\n", "- Compare and analyze the fits of various models with different time-dependent parameters (e.g., location, scale, or shape).\n", "- Calculate effective return levels using our non-stationary GEV model.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# **Setup**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# !pip install -q condacolab\n", "# import condacolab\n", "# condacolab.install()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# #install dependencies - taken from comments for Tutorial 6\n", "# # We need this to install eigen which is needed for SDFC to install correctly\n", "# !mamba install eigen numpy matplotlib seaborn pandas cartopy scipy texttable intake xarrayutils xmip cf_xarray intake-esm\n", "# !pip install -v https://github.com/yrobink/SDFC/archive/master.zip#subdirectory=python\n", "# !pip install https://github.com/njleach/mystatsfunctions/archive/master.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# imports\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import pandas as pd\n", "from scipy import stats\n", "from scipy.stats import genextreme as gev\n", "\n", "import gev_functions as gf\n", "import extremes_functions as ef\n", "import SDFC as sd\n", "import pooch\n", "import os\n", "import tempfile" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Note that `import gev_functions as gf` imports the functions introduced in Tutorial 6." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper functions\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Helper functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Helper functions\n", "def estimate_return_level(quantile, model):\n", " loc, scale, shape = model.loc, model.scale, model.shape\n", " level = loc - scale / shape * (1 - (-np.log(quantile)) ** (-shape))\n", " return level" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure Settings\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Figure Settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure Settings\n", "import ipywidgets as widgets # interactive display\n", "\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\n", " \"https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Speaker Introduction\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 1: Speaker Introduction\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Video 1: Speaker Introduction\n", "# Tech team will add code to format and display the video" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# helper functions\n", "\n", "\n", "def pooch_load(filelocation=None, filename=None, processor=None):\n", " shared_location = \"/home/jovyan/shared/Data/tutorials/W2D4_ClimateResponse-Extremes&Variability\" # this is different for each day\n", " user_temp_cache = tempfile.gettempdir()\n", "\n", " if os.path.exists(os.path.join(shared_location, filename)):\n", " file = os.path.join(shared_location, filename)\n", " else:\n", " file = pooch.retrieve(\n", " filelocation,\n", " known_hash=None,\n", " fname=os.path.join(user_temp_cache, filename),\n", " processor=processor,\n", " )\n", "\n", " return file" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# **Section 1: Download and Explore the Data**\n", "As before, we start by visually inspecting the data. Create a plot of the recorded data over time." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# fname = 'WashingtonDCSSH1930-2022.csv'\n", "filename_WashingtonDCSSH1 = \"WashingtonDCSSH1930-2022.csv\"\n", "url_WashingtonDCSSH1 = \"https://osf.io/4zynp/download\"\n", "data = pd.read_csv(\n", " pooch_load(url_WashingtonDCSSH1, filename_WashingtonDCSSH1), index_col=0\n", ").set_index(\"years\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "data.ssh.plot(\n", " linestyle=\"-\",\n", " marker=\".\",\n", " xlabel=\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\",\n", " ylabel=\"Year\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Also, fit a stationary GEV distribution to the data and print the resulting fit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "shape, loc, scale = gev.fit(data.ssh.values, 0)\n", "fit = gf.fit_return_levels(data.ssh.values, years=np.arange(1.1, 1000), N_boot=100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "print(\"Location: %.2f, Scale: %.2f, Shape: %.2f\" % (loc, scale, shape))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The figure presented here integrates multiple elements, including the QQ-plot (as discussed in previous tutorials), the comparison between modeled and empirical probability density functions, and the fitted return level plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "fig, axs = plt.subplots(2, 2, constrained_layout=True)\n", "ax = axs.flatten()\n", "\n", "x = np.linspace(0, 1, 100)\n", "ax[0].plot(gev.ppf(x, shape, loc=loc, scale=scale), np.quantile(data.ssh, x), \"o\")\n", "xlim = ax[0].get_xlim()\n", "ylim = ax[0].get_ylim()\n", "ax[0].plot(\n", " [min(xlim[0], ylim[0]), max(xlim[1], ylim[1])],\n", " [min(xlim[0], ylim[0]), max(xlim[1], ylim[1])],\n", " \"k\",\n", ")\n", "\n", "ax[0].set_xlim(xlim)\n", "ax[0].set_ylim(ylim)\n", "\n", "\n", "x = np.linspace(data.ssh.min() - 200, data.ssh.max() + 200, 1000)\n", "ax[2].plot(x, gev.pdf(x, shape, loc=loc, scale=scale), label=\"Modeled\")\n", "sns.kdeplot(data.ssh, ax=ax[2], label=\"Empirical\")\n", "ax[2].legend()\n", "\n", "gf.plot_return_levels(fit, ax=ax[3])\n", "ax[3].set_xlim(1.5, 1000)\n", "ax[3].set_ylim(0, None)\n", "\n", "ax[0].set_title(\"QQ-plot\")\n", "ax[2].set_title(\"PDF\")\n", "ax[3].set_title(\"Return levels\")\n", "\n", "ax[1].remove()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Use the function `estimate_return_level` to compute the 100-year return level:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "print(\n", " \"100-year return level: %.2f\"\n", " % gf.estimate_return_level_period(100, loc, scale, shape).mean()\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "We have the option to make our GEV (Generalized Extreme Value) model non-stationary by introducing a time component to the GEV parameters. This means that the parameters will vary with time or can be dependent on other variables such as global air temperature. 
, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "\n", "We can make our GEV (Generalized Extreme Value) model non-stationary by introducing a time component to the GEV parameters. This means that the parameters will vary with time, or can depend on other variables such as global air temperature. We call this **adding covariates** to the parameters.\n", "\n", "The simplest way to implement a non-stationary GEV model is to add a linear time component to the location parameter. Instead of a fixed location parameter (say, 100), it becomes a linear function of time (e.g., time * 1.05 + 80), which implies that the location parameter increases with time. We can incorporate this into our fitting process. The scipy GEV implementation does not support covariates, so for this part we will use the SDFC package with the `c_loc` option, which adds a covariate ('c_') to the location parameter ('loc')." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# initialize a GEV distribution\n", "law_ns = sd.GEV()\n", "# fit the GEV to the data, while specifying that the location parameter ('loc') is meant to be a covariate ('c_') of the time axis (data.index)\n", "law_ns.fit(data.ssh.values, c_loc=np.arange(data.index.size))" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "ef.print_law(law_ns)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now the location parameter is of type \"Covariate\" (that is, it *covaries* with the array we provided to `c_loc`), and it is described by *two* coefficients: the intercept and slope of the linear relationship between the location parameter and the provided array." ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As observed, we established a linear relationship by associating the location parameter with a simple array of sequential numbers (0, 1, 2, 3, etc.). Alternatively, if we had used an exponentially growing function of time (e.g., by setting `c_loc = np.exp(np.arange(data.index.size) / data.index.size)`), we would have created an exponential relationship.\n", "\n", "Similarly, squaring the array (`c_loc = np.arange(data.index.size)**2`) would have resulted in a quadratic relationship.\n", "\n", "It's worth noting that various types of relationships can be employed, and the approach can be extended to multiple parameters by using `c_scale` and `c_shape`. For instance, it's possible to establish a relationship with the global mean CO2 concentration rather than relying solely on time, which could potentially improve the fit." ] }
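, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "As a minimal sketch of the quadratic option mentioned above, reusing the same `sd.GEV()` / `c_loc` interface as for `law_ns` (the variable name `law_ns_quadratic` is just for illustration), the only change is the array passed to `c_loc`:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# optional sketch: a quadratic-in-time covariate for the location parameter\n", "law_ns_quadratic = sd.GEV()\n", "law_ns_quadratic.fit(data.ssh.values, c_loc=np.arange(data.index.size) ** 2)\n", "ef.print_law(law_ns_quadratic)" ] }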
, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Since the location parameter is now time-dependent, we can no longer create a single return level/return period plot as we did in the previous tutorials; that would require 93 separate plots, one for each year. However, there is an alternative approach that is visually appealing. We can calculate the level of the 100-year event over time and overlay it onto our SSH (Sea Surface Height) record. These curves are referred to as **effective return levels**.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", label=\"sea level anomaly\", ax=ax)\n", "ax.plot(\n", "    data.index, estimate_return_level(1 - 1 / 2, law_ns), label=\"2-year return level\"\n", ")\n", "ax.plot(\n", "    data.index, estimate_return_level(1 - 1 / 10, law_ns), label=\"10-year return level\"\n", ")\n", "ax.plot(\n", "    data.index, estimate_return_level(1 - 1 / 50, law_ns), label=\"50-year return level\"\n", ")\n", "ax.plot(\n", "    data.index,\n", "    estimate_return_level(1 - 1 / 100, law_ns),\n", "    label=\"100-year return level\",\n", ")\n", "ax.plot(\n", "    data.index,\n", "    estimate_return_level(1 - 1 / 500, law_ns),\n", "    label=\"500-year return level\",\n", ")\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_ylabel(\"Annual Maximum Sea Surface \\nHeight Anomaly (mm)\")\n", "ax.set_xlabel(\"Years\")" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You can evaluate the quality of your model fit by examining the AIC value.\n", "The [Akaike Information Criterion (AIC)](https://en.wikipedia.org/wiki/Akaike_information_criterion) estimates how well the model fits the data while preferring simpler models with fewer free parameters: AIC = 2k - 2 ln(L), where k is the number of free parameters and L is the maximized likelihood. A lower AIC value indicates a better fit. In the code below, a simple function calculates the AIC for a given fitted *model*, using the minimized negative log-likelihood stored by the optimizer in place of -ln(L)." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "def compute_aic(model):\n", "    # AIC = 2 * (number of fitted coefficients) + 2 * (minimized negative log-likelihood)\n", "    return 2 * len(model.coef_) + 2 * model.info_.mle_optim_result.fun" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "compute_aic(law_ns)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "For comparison, let's repeat the computation without the covariate by dropping the `c_loc` option. Then we can compute the AIC for the stationary model and compare how well the two models fit the data:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "law_stat = sd.GEV()\n", "law_stat.fit(data.ssh.values)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "compute_aic(law_stat)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You can see that the AIC is lower for the model that treats the location parameter as a *covariate* of time, which suggests that this model fits the data better." ] }
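, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "To make the comparison explicit, here is a small optional cell (a sketch using only the `compute_aic`, `law_ns`, and `law_stat` objects defined above) that prints the two AIC values and their difference:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "tags": [] }, "outputs": [], "source": [ "# optional: print both AIC values and their difference; a positive difference\n", "# (stationary minus non-stationary) favors the non-stationary model\n", "aic_ns = compute_aic(law_ns)\n", "aic_stat = compute_aic(law_stat)\n", "print(\"AIC (non-stationary location): %.2f\" % aic_ns)\n", "print(\"AIC (stationary): %.2f\" % aic_stat)\n", "print(\"AIC difference (stationary - non-stationary): %.2f\" % (aic_stat - aic_ns))" ] }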
, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## **Coding Exercises 1**\n", "\n", "**Scale and/or shape as a function of time**\n", "\n", "You have seen above how to make the location parameter a function of time, by providing the keyword `c_loc` to the `fit()` function.\n", "1. Now repeat this procedure, making scale, shape, or both parameters a function of time. To do this, you need to initialize a new GEV instance, call it e.g. `law_ns_scale = sd.GEV()`, and then call the `.fit()` function similarly to before, but replace `c_loc` with `c_scale` and/or `c_shape`. Plot the effective return levels by providing your fitted model to the `estimate_return_level()` function, as done previously. Observe whether the model fits the data better, worse, or about the same.\n", "2. Lastly, compute the AIC for your new fits. Compare the AIC values to determine whether your expectation from the previous step is met." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# initialize a GEV distribution\n", "law_ns_scale = ...\n", "\n", "# fit the GEV to the data, while specifying that the scale parameter ('scale') is meant to be a covariate ('c_') of the time axis (data.index)\n", "_ = ...\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "# Uncomment and plot\n", "# ax.plot(\n", "#     ..., ..., label=\"2-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ..., ..., label=\"10-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ..., ..., label=\"50-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ...,\n", "#     ...,\n", "#     label=\"100-year return level\",\n", "# )\n", "# ax.plot(\n", "#     ...,\n", "#     ...,\n", "#     label=\"500-year return level\",\n", "# )\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_title(\"Scale as Function of Time\")" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "tags": [] }, "source": [ "[*Click for solution*](https://github.com/ClimateMatchAcademy/course-content/tree/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/solutions/W2D4_Tutorial7_Solution_0f7f60c3.py)\n", "\n", "*Example output:*\n", "\n", "Solution hint\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# initialize a GEV distribution\n", "law_ns_shape = ...\n", "\n", "# fit the GEV to the data, while specifying that the shape parameter ('shape') is meant to be a covariate ('c_') of the time axis (data.index)\n", "_ = ...\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "# Uncomment and plot\n", "# ax.plot(\n", "#     ..., ..., label=\"2-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ..., ..., label=\"10-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ..., ..., label=\"50-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ...,\n", "#     ...,\n", "#     label=\"100-year return level\",\n", "# )\n", "# ax.plot(\n", "#     ...,\n", "#     ...,\n", "#     label=\"500-year return level\",\n", "# )\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_title(\"Shape as Function of Time\")" ] },
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "tags": [] }, "source": [ "[*Click for solution*](https://github.com/ClimateMatchAcademy/course-content/tree/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/solutions/W2D4_Tutorial7_Solution_646d5c1a.py)\n", "\n", "*Example output:*\n", "\n", "Solution hint\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# initialize a GEV distribution\n", "law_ns_loc_scale = ...\n", "\n", "# fit the GEV to the data using c_loc and c_scale\n", "_ = ...\n", "\n", "# plot results\n", "fig, ax = plt.subplots()\n", "data.ssh.plot(c=\"k\", ax=ax)\n", "# Uncomment and plot\n", "# ax.plot(\n", "#     ..., ..., label=\"2-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ..., ..., label=\"10-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ..., ..., label=\"50-year return level\"\n", "# )\n", "# ax.plot(\n", "#     ...,\n", "#     ...,\n", "#     label=\"100-year return level\",\n", "# )\n", "# ax.plot(\n", "#     ...,\n", "#     ...,\n", "#     label=\"500-year return level\",\n", "# )\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_title(\"Location and Scale as Function of Time\")" ] }
level\",\n", "# )\n", "ax.legend()\n", "ax.grid(True)\n", "ax.set_title(\"Scale as Function of Time\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "tags": [] }, "source": [ "[*Click for solution*](https://github.com/ClimateMatchAcademy/course-content/tree/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/solutions/W2D4_Tutorial7_Solution_15769e5d.py)\n", "\n", "*Example output:*\n", "\n", "Solution hint\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "aics = pd.Series(\n", " index=[\"Location\", \"Scale\", \"Shape\", \"Location and Scale\"],\n", " data=[\n", " ...,\n", " ...,\n", " ...,\n", " ...\n", " ],\n", ")\n", "\n", "aics" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "tags": [] }, "source": [ "[*Click for solution*](https://github.com/ClimateMatchAcademy/course-content/tree/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/solutions/W2D4_Tutorial7_Solution_b1c9dbd4.py)\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# **Summary**\n", "\n", "In this tutorial, you adapted the GEV model to better fit non-stationary sea surface height data from Washington DC. You learned how to:\n", "\n", "- Modify the GEV distribution to accommodate a time-dependent parameter, enabling non-stationarity.\n", "- Compare different models based on their time-dependent parameters.\n", "- Calculate effective return levels with the non-stationary GEV model.\n", "- Assessed the fit of the model using the Akaike Information Criterion (AIC)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# **Resources**\n", "\n", "Original data from this tutorial can be found [here](https://climexp.knmi.nl/getsealev.cgi?id=someone@somewhere&WMO=360&STATION=WASHINGTON_DC&extraargs=). Note the data used in this tutorial were preprocessed to extract the annual maximum values." ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W2D4_Tutorial7", "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }