{ "cells": [ { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ClimateMatchAcademy/course-content/blob/main/tutorials/W1D4_Paleoclimate/student/W1D4_Tutorial1.ipynb)   \"Open\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 1: Paleoclimate Proxies\n", "\n", "**Week 1, Day 4, Paleoclimate**\n", "\n", "**Content creators:** Sloane Garelick\n", "\n", "**Content reviewers:** Yosmely Bermúdez, Katrina Dobson, Maria Gonzalez, Will Gregory, Nahid Hasan, Sherry Mi, Beatriz Cosenza Muralles, Brodie Pearson, Jenna Pearson, Chi Zhang, Ohad Zivan\n", "\n", "**Content editors:** Yosmely Bermúdez, Zahra Khodakaramimaghsoud, Jenna Pearson, Agustina Pesce, Chi Zhang, Ohad Zivan\n", "\n", "**Production editors:** Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan\n", "\n", "**Our 2023 Sponsors:** NASA TOPS and Google DeepMind\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial Objectives\n", "\n", "In this tutorial, you'll learn about different types of paleoclimate proxies (physical characteristics of the environment that can stand in for direct measurements), the file type they come in, and how to convert these files to more usable formats.\n", "\n", "By the end of this tutorial you will be able to:\n", "\n", "- Understand some types of paleoclimate proxies and archives that exist\n", "- Create a global map of locations of proxy paleoclimate records in a specific data network\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Setup\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# installations ( uncomment and run this cell ONLY when using google colab or kaggle )\n", "\n", "# !pip install --no-binary shapely shapely --force # Add this to use cartopy. in this way it doesn't crush\n", "# !pip install cartopy\n", "# !pip install LiPD\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# imports\n", "import os\n", "import pandas as pd\n", "import numpy as np\n", "import pooch # to donwload the PAGES2K data\n", "import matplotlib.pyplot as plt\n", "import tempfile\n", "import lipd\n", "import cartopy.crs as ccrs\n", "import cartopy.feature as cfeature\n", "import cartopy.io.shapereader as shapereader\n", "from io import StringIO\n", "import sys" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Helper functions\n", "\n", "\n", "def pooch_load(filelocation=None, filename=None, processor=None):\n", " # this is different for each day\n", " shared_location = \"/home/jovyan/shared/Data/tutorials/W1D4_Paleoclimate\"\n", " user_temp_cache = tempfile.gettempdir()\n", "\n", " if os.path.exists(os.path.join(shared_location, filename)):\n", " file = os.path.join(shared_location, filename)\n", " else:\n", " file = pooch.retrieve(\n", " filelocation,\n", " known_hash=None,\n", " fname=os.path.join(user_temp_cache, filename),\n", " processor=processor,\n", " )\n", "\n", " return file\n", "\n", "\n", "# Convert the PAGES2K LiDP files into a pandas.DataFrame\n", "\n", "# Function to convert the PAGES2K LiDP files in a pandas.DataFrame\n", "\n", "\n", "def lipd2df(\n", " lipd_dirpath,\n", " pkl_filepath=None,\n", " col_str=[\n", " \"paleoData_pages2kID\",\n", " \"dataSetName\",\n", " \"archiveType\",\n", " \"geo_meanElev\",\n", " \"geo_meanLat\",\n", " \"geo_meanLon\",\n", " \"year\",\n", " \"yearUnits\",\n", " \"paleoData_variableName\",\n", " \"paleoData_units\",\n", " \"paleoData_values\",\n", " \"paleoData_proxy\",\n", " ],\n", "):\n", " \"\"\"\n", " Convert a bunch of PAGES2k LiPD files to a `pandas.DataFrame` to boost data loading.\n", "\n", " If `pkl_filepath` isn't `None`, save the DataFrame as a pikle file.\n", "\n", " Parameters:\n", " ----------\n", " lipd_dirpath: str\n", " Path of the PAGES2k LiPD files\n", " pkl_filepath: str or None\n", " Path of the converted pickle file. Default: `None`\n", " col_str: list of str\n", " Name of the variables to extract from the LiPD files\n", "\n", " Returns:\n", " -------\n", " df: `pandas.DataFrame`\n", " Converted Pandas DataFrame\n", " \"\"\"\n", "\n", " # Save the current working directory for later use, as the LiPD utility will change it in the background\n", " work_dir = os.getcwd()\n", " # LiPD utility requries the absolute path\n", " lipd_dirpath = os.path.abspath(lipd_dirpath)\n", " # Load LiPD files\n", " lipds = lipd.readLipd(lipd_dirpath)\n", " # Extract timeseries from the list of LiDP objects\n", " ts_list = lipd.extractTs(lipds)\n", " # Recover the working directory\n", " os.chdir(work_dir)\n", " # Create an empty pandas.DataFrame with the number of rows to be the number of the timeseries (PAGES2k records),\n", " # and the columns to be the variables we'd like to extract\n", " df_tmp = pd.DataFrame(index=range(len(ts_list)), columns=col_str)\n", " # Loop over the timeseries and pick those for global temperature analysis\n", " i = 0\n", " for ts in ts_list:\n", " if (\n", " \"paleoData_useInGlobalTemperatureAnalysis\" in ts.keys()\n", " and ts[\"paleoData_useInGlobalTemperatureAnalysis\"] == \"TRUE\"\n", " ):\n", " for name in col_str:\n", " try:\n", " df_tmp.loc[i, name] = ts[name]\n", " except:\n", " df_tmp.loc[i, name] = np.nan\n", " i += 1\n", " # Drop the rows with all NaNs (those not for global temperature analysis)\n", " df = df_tmp.dropna(how=\"all\")\n", " # Save the dataframe to a pickle file for later use\n", " if pkl_filepath:\n", " save_path = os.path.abspath(pkl_filepath)\n", " print(f\"Saving pickle file at: {save_path}\")\n", " df.to_pickle(save_path)\n", " return df\n", "\n", "\n", "class SupressOutputs(list):\n", " def __enter__(self):\n", " self._stdout = sys.stdout\n", " sys.stdout = self._stringio = StringIO()\n", " return self\n", "\n", " def __exit__(self, *args):\n", " self.extend(self._stringio.getvalue().splitlines())\n", " del self._stringio # free up some memory\n", " sys.stdout = self._stdout" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [] }, "outputs": [], "source": [ "# @title Video 1: What is Paleoclimate?\n", "\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == \"Bilibili\":\n", " src = f\"https://player.bilibili.com/player.html?bvid={id}&page={page}\"\n", " elif source == \"Osf\":\n", " src = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render\"\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == \"Youtube\":\n", " video = YouTubeVideo(\n", " id=video_ids[i][1], width=W, height=H, fs=fs, rel=0\n", " )\n", " print(f\"Video available at https://youtube.com/watch?v={video.id}\")\n", " else:\n", " video = PlayVideo(\n", " id=video_ids[i][1],\n", " source=video_ids[i][0],\n", " width=W,\n", " height=H,\n", " fs=fs,\n", " autoplay=False,\n", " )\n", " if video_ids[i][0] == \"Bilibili\":\n", " print(\n", " f\"Video available at https://www.bilibili.com/video/{video.id}\"\n", " )\n", " elif video_ids[i][0] == \"Osf\":\n", " print(f\"Video available at https://osf.io/{video.id}\")\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [(\"Youtube\", \"MR5Ft7q8w3M\"), (\"Bilibili\", \"BV1Bh4y1Z7H4\")]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "pycharm": { "name": "#%%\n" }, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from ipywidgets import widgets\n", "from IPython.display import IFrame\n", "\n", "link_id = \"ae894\"\n", "\n", "download_link = f\"https://osf.io/download/{link_id}/\"\n", "render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n", "# @markdown\n", "out = widgets.Output()\n", "with out:\n", " print(f\"If you want to download the slides: {download_link}\")\n", " display(IFrame(src=f\"{render_link}\", width=730, height=410))\n", "display(out)\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Section 1: Introduction to PAGES2k\n", "\n", "As we've now seen from introductory video, there are various types of [paleoclimate archives and proxies](http://wiki.linked.earth/Climate_Proxy) that can be used to reconstruct past changes in Earth's climate. For example:\n", "\n", "- **Sediment Cores**: Sediments deposited in layers within lakes and oceans serve as a record of climate variations over time. Various proxies for past climate are preserved in sediment cores including, pollen, microfossils, charcoal, microscopic organisms, organic molecules, etc.\n", "- **Ice Cores**: Similarly to sediment cores, ice cores capture past climate changes in layers of ice accumulated over time. Common proxies for reconstructing past climate in ice cores include water isotopes, greenhouse gas concentrations of air bubbles in the ice, and dust.\n", "- **Corals**: Corals form annual growth bands within their carbonate skeletons, recording temperature changes over time. Scientists analyze the chemical composition of each layer to reconstruct temperature and salinity. Corals typically preserve relatively short paleoclimate records, but they provide very high-resolution reconstructions (monthly and seasonal) and are therefore valuable for understanding past changes in short-term phenomena.\n", "- **Speleothems**: These are cave formations that result from the deposition of minerals from groundwater. As the water flows into the cave, thin layers of minerals (e.g., calcium carbonate), are deposited. The thickness and chemical composition of speleothem layers can be used to reconstruct climate changes in the past.\n", "- **Tree Rings**: Each year, trees add a new layer of growth, known as a tree ring. These rings record changes in temperature and precipitation. Proxy measurements of tree rings include thickness and isotopes, which reflect annual variability in moisture and temperature.\n", "\n", "There are many existing paleoclimate reconstructions spanning a variety of timescales and from global locations. Given the temporal and spatial vastness of existing paleoclimate records, it can be challenging to know what paleoclimate data already exists and where to find it. One useful solution is compiling all existing paleoclimate records for a single climate variable (temperature, greenhouse gas concentration, precipitation, etc.) and over a specific time period (Holocene to present, the past 800,000 years, etc.).\n", "\n", "One example of this is the **PAGES2k network**, which is a community-sourced database of temperature-sensitive proxy records. The database consists of 692 records from 648 locations, that are from a variety of archives (e.g., trees, ice, sediment, corals, speleothems, etc.) and span the Common Era (1 CE to present, i.e., the past ~2,000 years). You can read more about the PAGES2k network, in [PAGES 2k Consortium (2017)](https://www.nature.com/articles/sdata201788).\n", "\n", "In this tutorial, we will explore the types of proxy records in the PAGES2k network and create a map of proxy record locations.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Section 2: Get PAGES2k LiPD Files\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The PAGES2k network is stored in a specific file format known as [Linked Paleo Data format (LiPD)](http://wiki.linked.earth/Linked_Paleo_Data). LiPD files contain time series information in addition to supporting metadata (e.g., root metadata, location). Pyleoclim (and its dependency package LiPD) leverages this additional information using LiPD-specific functionality.\n", "\n", "Data stored in the .lpd format can be loaded directly as an Lipd object. If the data_path points to one LiPD file, `lipd.readLipd.()` will load the specific record, while if data_path points to a folder of lipd files, `lipd.readLipd.()` will load the full set of records. This function to read in the data is imbedded in the helper function above used to read the data in and convert it to a more usable format.\n", "\n", "The first thing we need to do it to download the data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Set the name to save the PAGES2K data\n", "fname = \"pages2k_data\"\n", "\n", "# Download the data\n", "lipd_file_path = pooch.retrieve(\n", " url=\"https://ndownloader.figshare.com/files/8119937\",\n", " known_hash=None,\n", " path=\"./\",\n", " fname=fname,\n", " processor=pooch.Unzip(),\n", ")\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now we can use our helpfer function `lipd_2df()` to convert the LiPD files to a [Pandas dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).\n", "\n", "NOTE: when you run some of the next code cell to convert the Lipd files to a DataFrame, you will get some error messages. This is fine and the code will still accomplish what it needs to do. The code will also take a few minutes to run, so if it's taking longer than you'd expect, that's alright!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 139313, "status": "ok", "timestamp": 1680670246822, "user": { "displayName": "Brodie Pearson", "userId": "05269028596972519847" }, "user_tz": 420 } }, "outputs": [], "source": [ "# convert all the lipd files into a DataFrame\n", "fname = \"pages2k_data\"\n", "\n", "with SupressOutputs():\n", " pages2k_data = lipd2df(\n", " lipd_dirpath=os.path.join(\".\", f\"{fname}.unzip\", \"LiPD_Files\")\n", " )" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The PAGES2k data is now stored as a dataframe and we can view the data to understand different attributes it contains.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 628, "status": "ok", "timestamp": 1680670283233, "user": { "displayName": "Brodie Pearson", "userId": "05269028596972519847" }, "user_tz": 420 } }, "outputs": [], "source": [ "# print the top few rows of the PAGES2K data\n", "pages2k_data.head()\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Section 3: Plotting a Map of Proxy Reconstruction Locations\n", "\n", "Now that we have converted the data into a Pandas dataframe, we can plot the PAGES2k network on a map to understand the spatial distribution of the temperature records and the types of proxies that were measured.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Before generating the plot, we have to define the colours and the marker types that we want to use in the plot. We also need to set a list with the different `archive_type` names that appear in the data frame.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {}, "executionInfo": { "elapsed": 5407, "status": "ok", "timestamp": 1680670611982, "user": { "displayName": "Brodie Pearson", "userId": "05269028596972519847" }, "user_tz": 420 } }, "outputs": [], "source": [ "# list of markers and colors for the different archive_type\n", "markers = [\"p\", \"p\", \"o\", \"v\", \"d\", \"*\", \"s\", \"s\", \"8\", \"D\", \"^\"]\n", "colors = [\n", " np.array([1.0, 0.83984375, 0.0]),\n", " np.array([0.73828125, 0.71484375, 0.41796875]),\n", " np.array([1.0, 0.546875, 0.0]),\n", " np.array([0.41015625, 0.41015625, 0.41015625]),\n", " np.array([0.52734375, 0.8046875, 0.97916667]),\n", " np.array([0.0, 0.74609375, 1.0]),\n", " np.array([0.25390625, 0.41015625, 0.87890625]),\n", " np.array([0.54296875, 0.26953125, 0.07421875]),\n", " np.array([1, 0, 0]),\n", " np.array([1.0, 0.078125, 0.57421875]),\n", " np.array([0.1953125, 0.80078125, 0.1953125]),\n", "]\n", "\n", "# create the plot\n", "\n", "fig = plt.figure(figsize=(10, 5))\n", "ax = fig.add_subplot(1, 1, 1, projection=ccrs.Robinson())\n", "\n", "# add plot title\n", "ax.set_title(\n", " f\"PAGES2k Network (n={len(pages2k_data)})\", fontsize=20, fontweight=\"bold\")\n", "\n", "# set the base map\n", "# ----------------\n", "ax.set_global()\n", "\n", "# add coast lines\n", "ax.coastlines()\n", "\n", "# add land fratures using gray color\n", "ax.add_feature(cfeature.LAND, facecolor=\"gray\", alpha=0.3)\n", "\n", "# add gridlines for latitude and longitude\n", "ax.gridlines(edgecolor=\"gray\", linestyle=\":\")\n", "\n", "\n", "# plot the different archive types\n", "# -------------------------------\n", "\n", "# extract the name of the different archive types\n", "archive_types = pages2k_data.archiveType.unique()\n", "\n", "# plot the archive_type using a forloop\n", "for i, type_i in enumerate(archive_types):\n", " df = pages2k_data[pages2k_data[\"archiveType\"] == type_i]\n", " # count the number of appearances of the same archive_type\n", " count = df[\"archiveType\"].count()\n", " # generate the plot\n", " ax.scatter(\n", " df[\"geo_meanLon\"],\n", " df[\"geo_meanLat\"],\n", " marker=markers[i],\n", " c=colors[i],\n", " edgecolor=\"k\",\n", " s=50,\n", " transform=ccrs.Geodetic(),\n", " label=f\"{type_i} (n = {count})\",\n", " )\n", "# add legend to the plot\n", "ax.legend(\n", " scatterpoints=1,\n", " bbox_to_anchor=(0, -0.4),\n", " loc=\"lower left\",\n", " ncol=3,\n", " fontsize=15,\n", ")\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Questions 3\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You have just plotted the global distribution and temperature proxy type of the 692 records in the PAGES2k network!\n", "\n", "1. Which temperature proxy is the most and least abundant in this database?\n", "2. In what region do you observe the most and least temperature records? Why might this be the case?\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Summary\n", "\n", "In this tutorial, you explored the PAGES2K network, which offers an extensive collection of proxy temperature reconstructions spanning the last 2,000 years. You surveyed various types of paleoclimate proxies and archives available, in addition to crafting a global map pinpointing the locations of the PAGES2k proxy records. As you advance throughout this module, you will extract and meticulously analyze the temperature timelines embedded within reconstructions such as those shown here.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {}, "tags": [] }, "source": [ "# Resources\n", "\n", "Code for this tutorial is based on existing notebooks from LinkedEarth that [convert LiPD files to a Pandas dataframe](https://github.com/LinkedEarth/notebooks/blob/master/PAGES2k/01.lipd2df.ipynb) and [create a map of the PAGES2k network](https:///github.com/LinkedEarth/notebooks/blob/master/PAGES2k/02.plot_map.ipynb).\n", "\n", "The following data is used in this tutorial:\n", "\n", "- PAGES2k Consortium. A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088 (2017). https://doi.org/10.1038/sdata.2017.88\n" ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W1D4_Tutorial1", "provenance": [ { "file_id": "1lHuVrVtAW4fQzc0dFdlZuwY0i71KWw_t", "timestamp": 1677637469824 } ], "toc_visible": true }, "gpuClass": "standard", "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 4 }