{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ClimateMatchAcademy/course-content/blob/main/tutorials/W2D1_FutureClimate-IPCCIPhysicalBasis/student/W2D1_Tutorial3.ipynb) &nbsp; <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/tutorials/W2D1_FutureClimate-IPCCIPhysicalBasis/student/W2D1_Tutorial3.ipynb\" target=\"_parent\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open in Kaggle\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial 3: Multi-model Ensembles\n",
    "\n",
    "**Week 2, Day 1, Future Climate: The Physical Basis**\n",
    "\n",
    "**Content creators:** Brodie Pearson, Julius Busecke, Tom Nicholas\n",
    "\n",
    "**Content reviewers:** Younkap Nina Duplex, Zahra Khodakaramimaghsoud, Sloane Garelick, Peter Ohue, Jenna Pearson, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Content editors:** Jenna Pearson, Ohad Zivan, Chi Zhang\n",
    "\n",
    "**Production editors:** Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Our 2023 Sponsors:** NASA TOPS, Google DeepMind, and CMIP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial Objectives\n",
    "In this tutorial, we will analyze datafrom five different CMIP6 models developed by various institutions around the world, focusing on their *historical* simulations and low-emission projections (*SSP1-2.6*). By the end of this tutorial, you will be able to \n",
    "- Load CMIP6 Sea Surface Temperature (SST) data from multiple models;\n",
    "- Calculate the SST anomalies, and recall the concept of temperature anomaly in relation to a base period"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Setup\n",
    "\n",
    "    \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 100078,
     "status": "ok",
     "timestamp": 1683926800928,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": [
     "colab"
    ]
   },
   "outputs": [],
   "source": [
    "# installations ( uncomment and run this cell ONLY when using google colab or kaggle )\n",
    "\n",
    "# !pip install condacolab &> /dev/null\n",
    "# import condacolab\n",
    "# condacolab.install()\n",
    "\n",
    "# # Install all packages in one call (+ use mamba instead of conda), this must in one line or code will fail\n",
    "# !mamba install xarray-datatree intake-esm gcsfs xmip aiohttp nc-time-axis cf_xarray xarrayutils &> /dev/null"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 4018,
     "status": "ok",
     "timestamp": 1683926879380,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "import time\n",
    "\n",
    "tic = time.time()\n",
    "\n",
    "import intake\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import xarray as xr\n",
    "\n",
    "from xmip.preprocessing import combined_preprocessing\n",
    "from xarrayutils.plotting import shaded_line_plot\n",
    "\n",
    "from datatree import DataTree\n",
    "from xmip.postprocessing import _parse_metric"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "# @title Figure settings\n",
    "import ipywidgets as widgets  # interactive display\n",
    "\n",
    "plt.style.use(\n",
    "    \"https://raw.githubusercontent.com/ClimateMatchAcademy/course-content/main/cma.mplstyle\"\n",
    ")\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "# @title Helper functions\n",
    "\n",
    "\n",
    "def global_mean(ds: xr.Dataset) -> xr.Dataset:\n",
    "    \"\"\"Global average, weighted by the cell area\"\"\"\n",
    "    return ds.weighted(ds.areacello.fillna(0)).mean([\"x\", \"y\"], keep_attrs=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Why So Many Earth System Models?\n",
    "\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'sTYK2TAPycs'), ('Bilibili', 'BV1Ua4y1F78Z')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import IFrame\n",
    "\n",
    "link_id = \"6y2fp\"\n",
    "\n",
    "download_link = f\"https://osf.io/download/{link_id}/\"\n",
    "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n",
    "# @markdown\n",
    "out = widgets.Output()\n",
    "with out:\n",
    "    print(f\"If you want to download the slides: {download_link}\")\n",
    "    display(IFrame(src=f\"{render_link}\", width=730, height=410))\n",
    "display(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Section 1: Load CMIP6 SST Data from Five Models and Three Experiments\n",
    "In the previous section, you compared how a single CMIP6 model (*TaiESM1*) simulated past temperature, and how it projected temperature would change under a low-emissions scenario and a high-emissions scenario (*historical*, *SSP1-2.6* and *SSP5-8.5* experiments respectively). \n",
    "\n",
    "Now we will start to analyze a **multi-model ensemble** that consists of data from multiple CMIP6 models. For now, we will focus on just the historical simulation and the low-emissions projection.\n",
    "\n",
    "Your multi-model ensemble will consist of **Five** different CMIP6 models developed by institutions from a variety of countries: \n",
    "  * *TaiESM1* (Research Center for Environmental Changes, Taiwan),\n",
    "  * *IPSL-CM6A-LR* (Institut Pierre Simon Laplace, France),\n",
    "  * *GFDL-ESM4* (NOAA Geophysical Fluid Dynamics Laboratory, USA), \n",
    "  * *ACCESS-CM2* (CSIRO and ARCCSS, Australia), and \n",
    "  * *MPI-ESM1-2-LR* (Max Planck Institute, Germany). \n",
    "\n",
    "We can specify these models through the *source_id* facet. You can see a [table of CMIP6 *source_id* values here](https://www.google.com/url?q=https://airtable.com/shrvRybShvNSE1Szp/tblC0DBPiCm7gjJqx&sa=D&source=docs&ust=1689235930798709&usg=AOvVaw06jpHwjNGxPR1yILwtX5qB), and you can learn more about CMIP through our [CMIP Resource Bank](https://github.com/ClimateMatchAcademy/course-content/blob/main/tutorials/CMIP/CMIP_resource_bank.md) and the [CMIP website](https://wcrp-cmip.org/).\n",
    "\n",
    "Note that these are only a subset of the dozens of models, institutes, countries, and experiments that contribute to CMIP6, as discussed in the previous W2D1 Tutorial 2 video. Some of the abbreviations in the model names refer to institutes (*MPI/GFDL*), while some refer to the complexity and version of the model (e.g., Earth System or Climate Model [*ESM/CM*] and low- or high-resolution [*LR/HR*]). There are often several models from a single institute with each having a distinct level of complexity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "col = intake.open_esm_datastore(\n",
    "    \"https://storage.googleapis.com/cmip6/pangeo-cmip6.json\"\n",
    ")  # open an intake catalog containing the Pangeo CMIP cloud data\n",
    "\n",
    "# pick our five models and three experiments\n",
    "# there are many more to test out! Try executing `col.df['source_id'].unique()` to get a list of all available models\n",
    "source_ids = [\"IPSL-CM6A-LR\", \"GFDL-ESM4\", \"ACCESS-CM2\", \"MPI-ESM1-2-LR\", \"TaiESM1\"]\n",
    "experiment_ids = [\"historical\", \"ssp126\", \"ssp585\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 11206,
     "status": "ok",
     "timestamp": 1683162582065,
     "user": {
      "displayName": "Sloane Garelick",
      "userId": "04706287370408131987"
     },
     "user_tz": 240
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# from the full `col` object, create a subset using facet search\n",
    "cat = col.search(\n",
    "    source_id=source_ids,\n",
    "    variable_id=\"tos\",\n",
    "    member_id=\"r1i1p1f1\",\n",
    "    table_id=\"Omon\",\n",
    "    grid_label=\"gn\",\n",
    "    experiment_id=experiment_ids,\n",
    "    require_all_on=[\n",
    "        \"source_id\"\n",
    "    ],  # make sure that we only get models which have all of the above experiments\n",
    ")\n",
    "\n",
    "# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)\n",
    "kwargs = dict(\n",
    "    preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset\n",
    "    xarray_open_kwargs=dict(\n",
    "        use_cftime=True\n",
    "    ),  # ensure all datasets use the same time index\n",
    "    storage_options={\n",
    "        \"token\": \"anon\"\n",
    "    },  # anonymous/public authentication to google cloud storage\n",
    ")\n",
    "\n",
    "cat.esmcat.aggregation_control.groupby_attrs = [\"source_id\", \"experiment_id\"]\n",
    "dt = cat.to_datatree(**kwargs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 4341,
     "status": "ok",
     "timestamp": 1683162592002,
     "user": {
      "displayName": "Sloane Garelick",
      "userId": "04706287370408131987"
     },
     "user_tz": 240
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "cat_area = col.search(\n",
    "    source_id=source_ids,\n",
    "    variable_id=\"areacello\",  # for the coding exercise, ellipses will go after the equals on this line\n",
    "    member_id=\"r1i1p1f1\",\n",
    "    table_id=\"Ofx\",  # for the coding exercise, ellipses will go after the equals on this line\n",
    "    grid_label=\"gn\",\n",
    "    experiment_id=[\n",
    "        \"historical\"\n",
    "    ],  # for the coding exercise, ellipses will go after the equals on this line\n",
    "    require_all_on=[\"source_id\"],\n",
    ")\n",
    "\n",
    "cat_area.esmcat.aggregation_control.groupby_attrs = [\"source_id\", \"experiment_id\"]\n",
    "dt_area = cat_area.to_datatree(**kwargs)\n",
    "\n",
    "dt_with_area = DataTree()\n",
    "\n",
    "for model, subtree in dt.items():\n",
    "    metric = dt_area[model][\"historical\"].ds[\"areacello\"]\n",
    "    dt_with_area[model] = subtree.map_over_subtree(_parse_metric, metric)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GlnGKoa95fkz"
   },
   "source": [
    "Let's first reproduce the previous tutorial's timeseries of Global Mean SST from *TaiESM1* through the historical experiment and two future emissions scenarios"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 23501,
     "status": "ok",
     "timestamp": 1683162622054,
     "user": {
      "displayName": "Sloane Garelick",
      "userId": "04706287370408131987"
     },
     "user_tz": 240
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# average every dataset in the tree globally\n",
    "dt_gm = dt_with_area.map_over_subtree(global_mean)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "for experiment in [\"historical\", \"ssp126\", \"ssp585\"]:\n",
    "    da = dt_gm[\"TaiESM1\"][experiment].ds.tos\n",
    "    da.plot(label=experiment, ax=ax)\n",
    "ax.set_title(\"Global Mean SST from TaiESM1\")\n",
    "ax.set_ylabel(\"Global Mean SST [$^\\circ$C]\")\n",
    "ax.set_xlabel(\"Year\")\n",
    "ax.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xrhRxvgJ2XoK"
   },
   "source": [
    "### **Coding Exercise 1.1: Combine Past Data and Future Data, and Remove Seasonal Oscillations, and plot all 5 of the CMIP6 models we just loaded**\n",
    "\n",
    "* The historical and projected data are separate time series. Complete the `xr.concat` function to combine the historical and projected data into a single continuous time series for each model.\n",
    "* The previous timeseries oscillated very rapidly due to Earth's seasonal cycles. Finish the `xarray` `resample` function so that it smooths the monthly data with a one-year running mean. This will make it easier to distinguish the long-term changes in sea surface temperature.\n",
    "* Note: this code is already set up to use all 5 of the CMIP6 models you just loaded, as it loops through and plots each model in the DataTree [`dt.keys()`]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 169,
     "status": "error",
     "timestamp": 1682723761601,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def plot_historical_ssp126_combined(dt, ax):\n",
    "    for model in dt.keys():\n",
    "        datasets = []\n",
    "        for experiment in [\"historical\", \"ssp126\"]:\n",
    "            datasets.append(dt[model][experiment].tos)\n",
    "\n",
    "        # for each of the models, concatenate its historical and future data\n",
    "        da_combined = xr.concat(...)\n",
    "        # plot annual averages\n",
    "        da_combined.coarsen(...).mean().plot(label=model, ax=ax)\n",
    "\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "# plot_historical_ssp126_combined\n",
    "_ = ...\n",
    "\n",
    "ax.set_title(\"Global Mean SST from five CMIP6 models (annually smoothed)\")\n",
    "ax.set_ylabel(\"Global Mean SST [$^\\circ$C]\")\n",
    "ax.set_xlabel(\"Year\")\n",
    "ax.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MhGBQUpBGFCv"
   },
   "source": [
    "### **Question 1.1**\n",
    "\n",
    "1.   Why do you think the global mean temperature varies so much between models?* \n",
    "\n",
    "**If you get stuck here, reflect on the videos from earlier today and the tutorials/videos from the Climate Modelling day for inspiration.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 1.2\n",
    "\n",
    "As you just saw, the global mean SST varies between climate models. This is not surprising given the slight differences in physics, numerics, and discretization between each model.\n",
    "\n",
    "When we are looking at future projections, we care about how the model's *change* relative to their equilibrium/previous state. To do this, we typically subtract a historical reference period from the timeseries, which creates a new timeseries which we call the temperature *anomaly* relative to that period. **Recall that you already calculated and used *anomalies* earlier in the course (e.g., on W1D1).**\n",
    "\n",
    "**Modify the following code to recreate the previous multi-model figure, but now instead plot the global mean sea surface temperature (GMSST) *anomaly* relative the 1950-1980 base period (i.e., subtract the 1950-1980 mean GMSST of each model from that model's timeseries)**\n",
    "\n",
    "*Hint: you will need to use `ds.sel` to select data during the base period ([see here](https://docs.xarray.dev/en/stable/user-guide/indexing.html#indexing-with-dimension-names) under \"Indexing with dimension names\" for a helpful example) along with the averaging operator, `mean()`.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 130,
     "status": "error",
     "timestamp": 1682723868376,
     "user": {
      "displayName": "Brodie Pearson",
      "userId": "05269028596972519847"
     },
     "user_tz": 420
    }
   },
   "outputs": [],
   "source": [
    "# calculate anomaly to reference period\n",
    "def datatree_anomaly(dt):\n",
    "    dt_out = DataTree()\n",
    "    for model, subtree in dt.items():\n",
    "        # find the temporal average over the desired reference period\n",
    "        ref = ...\n",
    "        dt_out[model] = subtree - ref\n",
    "    return dt_out\n",
    "\n",
    "\n",
    "dt_gm_anomaly = datatree_anomaly(dt_gm)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "plot_historical_ssp126_combined(dt_gm_anomaly, ax)\n",
    "\n",
    "ax.set_title(\n",
    "    \"Global Mean SST Anomaly from five CMIP6 models (base period: 1950 to 1980)\"\n",
    ")\n",
    "ax.set_ylabel(\"Global Mean SST Anomaly [$^\\circ$C]\")\n",
    "ax.set_xlabel(\"Year\")\n",
    "ax.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UYSuNDHE5qRr"
   },
   "source": [
    "### **Questions 1.2: Climate Connection**\n",
    "\n",
    "1.  How does this figure compare to the previous one where the reference period was not subtracted?\n",
    "2.  This figure uses the reference period of 1950-1980 for its anomaly calculation. How does the variability across models 100 years before the base period (1850) compare to the variability 100 years after the base period (2080)? Why do you think this is?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Summary\n",
    "\n",
    "In this tutorial, we expanded on the previous examination of a single CMIP6 model (*TaiESM1*), to instead work with multi-model ensemble comprising five different CMIP6 models (*TaiESM1*, *IPSL-CM6A-LR*, *GFDL-ESM4*, *ACCESS-CM2*, and *MPI-ESM1-2-LR*) developed by institutions around the world. We demonstrated the importance of anomalizing each of these models relative to a base period when calculating changes in the simulated Earth system. If variables are not anomalized, these absolute values can differ significantly between models due to the diversity of models (i.e., their distinct physics, numerics and discretization)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Resources\n",
    "\n",
    "This tutorial uses data from the simulations conducted as part of the [CMIP6](https://wcrp-cmip.org/) multi-model ensemble. \n",
    "\n",
    "For examples on how to access and analyze data, please visit the [Pangeo Cloud CMIP6 Gallery](https://gallery.pangeo.io/repos/pangeo-gallery/cmip6/index.html) \n",
    "\n",
    "For more information on what CMIP is and how to access the data, please see this [page](https://github.com/ClimateMatchAcademy/course-content/blob/main/tutorials/CMIP/CMIP_resource_bank.md)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "machine_shape": "hm",
   "name": "W2D1_Tutorial_3",
   "provenance": [
    {
     "file_id": "1WfT8oN22xywtecNriLptqi1SuGUSoIlc",
     "timestamp": 1680298239014
    }
   ],
   "toc_visible": true
  },
  "gpuClass": "standard",
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}