{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ClimateMatchAcademy/course-content/blob/main/tutorials/W2D4_ClimateResponse-Extremes&amp;Variability/W2D4_Tutorial4.ipynb)   <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/tutorials/W2D4_ClimateResponse-Extremes&amp;Variability/W2D4_Tutorial4.ipynb\" target=\"_blank\"><img alt=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# **Tutorial 4: Return Levels Using Normal and GEV Distributions**\n",
    "\n",
    "**Week 2, Day 4, Extremes & Vulnerability**\n",
    "\n",
    "**Content creators:** Matthias Aengenheyster, Joeri Reinders\n",
    "\n",
    "**Content reviewers:** Younkap Nina Duplex, Sloane Garelick, Zahra Khodakaramimaghsoud, Peter Ohue, Laura Paccini, Jenna Pearson, Agustina Pesce, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Content editors:** Jenna Pearson, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Production editors:** Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Our 2023 Sponsors:** NASA TOPS and Google DeepMind"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# **Tutorial Objectives**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now that we have learned how to create a PDF for the GEV distribution, we can utilize it to calculate return values based on this distribution.\n",
    "\n",
    "Return levels are computed from a PDF using the [cumulative distribution function (CDF)](https://en.wikipedia.org/wiki/Cumulative_distribution_function), which provides the probability that a randomly drawn variable from our distribution will be less than a certain value X (for our specific variable). Extreme events are rarer, and thus the probability of a precipitation event being at or lower than it is high.  For example, if there is a 99% chance of observing a storm with 80mm of rainfall or lower, it means that there is a 1% chance of observing a storm with at least 80mm of rainfall. This implies that the 100-year storm would bring 80mm of rain. In simple terms, the return level is the inverse of the CDF.\n",
    "\n",
    "By the end of this tutorial, you will be able to:\n",
    "\n",
    "- Estimate return levels using a given quantile and the parameters of a GEV distribution.\n",
    "- Compare return level plots for GEV and normal distributions."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# **Setup**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 2204,
     "status": "ok",
     "timestamp": 1681924180496,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "import xarray as xr\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import pandas as pd\n",
    "import os\n",
    "import pooch\n",
    "import tempfile\n",
    "from scipy import stats\n",
    "from scipy.stats import genextreme as gev"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Figure Settings\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Figure Settings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Figure Settings\n",
    "import ipywidgets as widgets  # interactive display\n",
    "\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\n",
    "    \"https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 1: Speaker Introduction\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Video 1: Speaker Introduction\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Speaker Introduction\n",
    "# Tech team will add code to format and display the video"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "# helper functions\n",
    "\n",
    "\n",
    "def pooch_load(filelocation=None, filename=None, processor=None):\n",
    "    shared_location = \"/home/jovyan/shared/Data/tutorials/W2D4_ClimateResponse-Extremes&Variability\"  # this is different for each day\n",
    "    user_temp_cache = tempfile.gettempdir()\n",
    "\n",
    "    if os.path.exists(os.path.join(shared_location, filename)):\n",
    "        file = os.path.join(shared_location, filename)\n",
    "    else:\n",
    "        file = pooch.retrieve(\n",
    "            filelocation,\n",
    "            known_hash=None,\n",
    "            fname=os.path.join(user_temp_cache, filename),\n",
    "            processor=processor,\n",
    "        )\n",
    "\n",
    "    return file"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# **Section 1: Return Levels and Return Periods**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "As before let's load the annual maximum precipitation data from Germany:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 530,
     "status": "ok",
     "timestamp": 1681924181024,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# download file: 'precipitationGermany_1920-2022.csv'\n",
    "filename_precipitationGermany = \"precipitationGermany_1920-2022.csv\"\n",
    "url_precipitationGermany = \"https://osf.io/xs7h6/download\"\n",
    "data = pd.read_csv(\n",
    "    pooch_load(url_precipitationGermany, filename_precipitationGermany), index_col=0\n",
    ").set_index(\"years\")\n",
    "data.columns = [\"precipitation\"]\n",
    "precipitation = data.precipitation"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "And fit the General Extreme Value (GEV) distribution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 267,
     "status": "ok",
     "timestamp": 1681924238489,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "shape, loc, scale = gev.fit(precipitation.values, 0)\n",
    "shape, loc, scale"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "To compute return levels, you can use the function estimate_return_level(x, location, scale, shape) defined below. In this case, x represents the _quantile_, the probability of a random value from our distribution being lower. For example, for the 100-year storm, x would be 0.99, and for the 1000-year storm, it would be 0.999.\n",
    "\n",
    "This utility function takes a given *quantile* and the GEV parameters (loc, scale, shape) and computes the corresponding return level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "def estimate_return_level(quantile, loc, scale, shape):\n",
    "    level = loc + scale / shape * (1 - (-np.log(quantile)) ** (shape))\n",
    "    return level"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now, let's utilize this function to calculate the 2-year return level, which represents the precipitation level we anticipate with a 50% chance each year. The function requires the quantile value 'x' and three parameters related to GEV (Generalized Extreme Value) distribution: location, scale, and shape. The function will provide the return level corresponding to the given quantile.\n",
    "For instance, the 90th quantile (x=0.9) would yield the precipitation value expected once every 10 years (with a 10% chance of observing it each year)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 7,
     "status": "ok",
     "timestamp": 1681924241704,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "estimate_return_level(0.5, loc, scale, shape)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "This function is also available as part of the scipy GEV implementation - here called the \"Percent point function\" or ppf which you used previously to make the quantile-quantile plots in the last tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "gev.ppf(0.5, shape, loc=loc, scale=scale)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "To make things easier, let's replace x with the formula 1-(1/return period). Note this is different than the exceedence probability discussed in tutorial 2. So a 100-year storm would, we would use 1-(1/100) = 0.99. Compute return level for the 100- and 1000-year storm. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 188,
     "status": "ok",
     "timestamp": 1681924244959,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "estimate_return_level(1 - 1 / 100, loc, scale, shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 2,
     "status": "ok",
     "timestamp": 1681924245377,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "estimate_return_level(1 - 1 / 1000, loc, scale, shape)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now we will make a figure in which we plot return-levels against return-periods:\n",
    "1. Create a vector “periods” which is a sequence of 2 to a 1000-years (with a step size of 2).\n",
    "2. Calculate the associated return-levels for those return periods.\n",
    "3. Plot the return levels against the return periods. Typically, the return periods go on the x-axis at a log-scale.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 784,
     "status": "ok",
     "timestamp": 1681924247138,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "periods = np.arange(2, 1000, 2)\n",
    "quantiles = 1 - 1 / periods\n",
    "levels = estimate_return_level(quantiles, loc, scale, shape)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(periods, levels, \".-\")\n",
    "ax.set_xlabel(\"Return Period (years)\")\n",
    "ax.set_ylabel(\"Return Level (mm/day)\")\n",
    "ax.set_xscale(\"log\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## **Coding Exercise 1**\n",
    "\n",
    "Compute and plot the 50-,100-,500- and 1000-year return levels based on a normal distribution and based on the Normal and GEV distributions.\n",
    "\n",
    "Plot these three things:\n",
    "1. the empirical return level\n",
    "2. the estimate based on a normal distribution\n",
    "3. the estimate based on a GEV distribution\n",
    "\n",
    "*Note that you can get the empirical return levels from tutorial 2, we will provide this code below.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "def empirical_return_level(data):\n",
    "    \"\"\"\n",
    "    Compute empirical return level using the algorithm introduced in Tutorial 2\n",
    "    \"\"\"\n",
    "    df = pd.DataFrame(index=np.arange(data.size))\n",
    "    # sort the data\n",
    "    df[\"sorted\"] = np.sort(data)[::-1]\n",
    "    # rank via scipy instead to deal with duplicate values\n",
    "    df[\"ranks_sp\"] = np.sort(stats.rankdata(-data))\n",
    "    # find exceedence probability\n",
    "    n = data.size\n",
    "    df[\"exceedance\"] = df[\"ranks_sp\"] / (n + 1)\n",
    "    # find return period\n",
    "    df[\"period\"] = 1 / df[\"exceedance\"]\n",
    "\n",
    "    df = df[::-1]\n",
    "\n",
    "    out = xr.DataArray(\n",
    "        dims=[\"period\"],\n",
    "        coords={\"period\": df[\"period\"]},\n",
    "        data=df[\"sorted\"],\n",
    "        name=\"level\",\n",
    "    )\n",
    "    return out"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# setup plots\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "# get empirical return levels\n",
    "_ = ...\n",
    "\n",
    "# create vector of years\n",
    "years = np.arange(1.1, 100, 0.1)\n",
    "\n",
    "# calculate and plot the normal return levels\n",
    "_ = ...\n",
    "\n",
    "# calculate and plot the GEV distribution, note the negtive shape parameter\n",
    "_ = ...\n",
    "\n",
    "# set x axis to log scale\n",
    "ax.set_xscale(\"log\")\n",
    "\n",
    "# show legend\n",
    "ax.legend([\"empirical\", \"normal\", \"GEV\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "executionInfo": {
     "elapsed": 357,
     "status": "ok",
     "timestamp": 1681924249195,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "source": [
    "[*Click for solution*](https://github.com/ClimateMatchAcademy/course-content/tree/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/solutions/W2D4_Tutorial4_Solution_8207985e.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=775.0 height=575.0 src=https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/static/W2D4_Tutorial4_Solution_8207985e_1.png>\n",
    "\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## **Question 1**\n",
    "\n",
    "1. What can you say about the plot and how the distributions describe the data? How do the distributions differ? At short/long return periods? What will happen at even longer return periods?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/ClimateMatchAcademy/course-content/tree/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/solutions/W2D4_Tutorial4_Solution_9573c306.py)\n",
    "\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## **Bonus: Find Confidence Intervals**\n",
    "\n",
    "When compute return levels of extreme events - or really when doing any quantitivate assessment in science - it is useful and often essential to provide uncertainty estimates, that is how much the result may be expected to fluctuate around the central estimate. \n",
    "\n",
    "In Tutorial 2 the exercise discussed one approach to construct such uncertainty ranges - bootstrapping. You can now extend this method by resampling the data, and re-fitting the GEV, and in doing so generate uncertainty estimates of the GEV parameters and GEV-based return levels.\n",
    "\n",
    "One possible approach could involve the following steps:\n",
    "\n",
    "1. Begin with the `gev` function which can be employed to fit GEV parameters using `gev.fit(data)`. This function returns the three parameters, taking note of a sign change in the shape parameter (as different conventions exist).\n",
    "2. Determine the number of observations in the data, denoted as N.\n",
    "3. Perform resampling by randomly drawing N samples from the data with replacement. This process generates an artificial ensemble that differs from the true data due to resampling.\n",
    "4. Estimate the parameters for each resampling.\n",
    "5. Utilize the `gev.ppf` function for each parameter set to compute the return level.\n",
    "6. Visualize the resulting uncertainty bands by plotting multiple lines or calculating confidence intervals using `np.quantile`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# initalize list to store parameters from samples\n",
    "params = []\n",
    "\n",
    "# generate 1000 samples by resampling data with replacement\n",
    "for i in range(1000):\n",
    "    ...\n",
    "\n",
    "# print the estimate of the mean of each parameter and it's confidence intervals\n",
    "print(\n",
    "    \"Mean estimate: \",\n",
    "    np.mean(np.array(params), axis=0),\n",
    "    \" and 95% confidence intervals: \",\n",
    "    ...,\n",
    ")\n",
    "\n",
    "# generate years vector\n",
    "years = np.arange(1.1, 1000, 0.1)\n",
    "\n",
    "# intialize list for return levels\n",
    "levels = []\n",
    "\n",
    "# calculate return levels for each of the 1000 samples\n",
    "for i in range(1000):\n",
    "    levels.append(...)\n",
    "levels = np.array(levels)\n",
    "\n",
    "# setup plots\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "# find empirical return levels\n",
    "_ = ...\n",
    "\n",
    "# plot return mean levels\n",
    "_ = ...\n",
    "\n",
    "# plot confidence intervals\n",
    "_ = ...\n",
    "\n",
    "# aesthetics\n",
    "ax.set_xlim(1.5, 1000)\n",
    "ax.set_ylim(20, 110)\n",
    "ax.set_xscale(\"log\")\n",
    "ax.set_xlabel(\"Return Period (years)\")\n",
    "ax.set_ylabel(\"Return Level (mm/day)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "tags": []
   },
   "source": [
    "[*Click for solution*](https://github.com/ClimateMatchAcademy/course-content/tree/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/solutions/W2D4_Tutorial4_Solution_c4127f14.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=775.0 height=575.0 src=https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/tutorials/W2D4_ClimateResponse-Extremes&Variability/static/W2D4_Tutorial4_Solution_c4127f14_2.png>\n",
    "\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# **Summary**\n",
    "In this tutorial, you learned how to calculate return values based on the Generalized Extreme Value (GEV) distribution. You learned how they are derived the cumulative distribution function (CDF), and compared them to those from the normal distribution. Finally, you learned how to create confidence intervals for the parameters that define the GEV distribution for a given dataset."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# **Resources**\n",
    "\n",
    "Data from this tutorial uses the 0.25 degree precipitation dataset E-OBS. It combines precipitation observations to generate a gridded (i.e. no \"holes\") precipitation over Europe. We used the precipitation data from the gridpoint at 51 N, 6 E. \n",
    "\n",
    "The dataset can be accessed using the KNMI Climate Explorer [here](https://climexp.knmi.nl/select.cgi?id=someone@somewhere&field=ensembles_025_rr). The Climate Explorer is a great resource to access, manipulate and visualize climate data, including observations and climate model simulations. It is freely accessible - feel free to explore!"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "W2D4_Tutorial4",
   "toc_visible": true,
   "version": ""
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}