{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"id": "view-in-github"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial 1: Geometric view of data\n",
"\n",
"**Week 1, Day 4: Dimensionality Reduction**\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"__Content creators:__ Alex Cayco Gajic, John Murray\n",
"\n",
"__Content reviewers:__ Roozbeh Farhoudi, Matt Krause, Spiros Chavlis, Richard Gao, Michael Waskom, Siddharth Suresh, Natalie Schaworonkow, Ella Batty\n",
"\n",
"**Production editors:** Spiros Chavlis"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Tutorial Objectives\n",
"\n",
"*Estimated timing of tutorial: 50 minutes*\n",
"\n",
"In this notebook we'll explore how multivariate data can be represented in different orthonormal bases. This will help us build intuition that will be helpful in understanding PCA in the following tutorial.\n",
"\n",
"Overview:\n",
" - Generate correlated multivariate data.\n",
" - Define an arbitrary orthonormal basis.\n",
" - Project the data onto the new basis."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from IPython.display import IFrame\n",
"from ipywidgets import widgets\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: https://osf.io/download/kaq2x/\")\n",
" display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/kaq2x/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"**⚠ Experimental LLM-enhanced tutorial ⚠**\n",
"\n",
"This notebook includes Neuromatch's experimental [Chatify](https://github.com/ContextLab/chatify) 🤖 functionality. The Chatify notebook extension adds support for a large language model-based \"coding tutor\" to the materials. The tutor provides automatically generated text to help explain any code cell in this notebook.\n",
"\n",
"Note that using Chatify may cause breaking changes and/or provide incorrect or misleading information. If you wish to proceed by installing and enabling the Chatify extension, you should run the next two code blocks (hidden by default). If you do *not* want to use this experimental version of the Neuromatch materials, please use the [stable](https://compneuro.neuromatch.io/tutorials/intro.html) materials instead.\n",
"\n",
"To use the Chatify helper, insert the `%%explain` magic command at the start of any code cell and then run it (shift + enter) to access an interface for receiving LLM-based assitance. You can then select different options from the dropdown menus depending on what sort of assitance you want. To disable Chatify and run the code block as usual, simply delete the `%%explain` command and re-run the cell.\n",
"\n",
"Note that, by default, all of Chatify's responses are generated locally. This often takes several minutes per response. Once you click the \"Submit request\" button, just be patient-- stuff is happening even if you can't see it right away!\n",
"\n",
"Thanks for giving Chatify a try! Love it? Hate it? Either way, we'd love to hear from you about your Chatify experience! Please consider filling out our [brief survey](https://forms.gle/jNq85KVvNwj1JHZV9) to provide feedback and help us make Chatify more awesome!\n",
"\n",
"**Run the next two cells to install and configure Chatify...**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {}
},
"outputs": [],
"source": [
"%pip install -q davos\n",
"import davos\n",
"davos.config.suppress_stdout = True"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {}
},
"outputs": [],
"source": [
"smuggle chatify # pip: git+https://github.com/ContextLab/chatify.git\n",
"%load_ext chatify"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and import feedback gadget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install and import feedback gadget\n",
"\n",
"!pip3 install vibecheck datatops --quiet\n",
"\n",
"from vibecheck import DatatopsContentReviewContainer\n",
"def content_review(notebook_section: str):\n",
" return DatatopsContentReviewContainer(\n",
" \"\", # No text prompt\n",
" notebook_section,\n",
" {\n",
" \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
" \"name\": \"neuromatch_cn\",\n",
" \"user_key\": \"y1x3mpx5\",\n",
" },\n",
" ).render()\n",
"\n",
"\n",
"feedback_prefix = \"W1D4_T1\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Imports\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure Settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure Settings\n",
"import logging\n",
"logging.getLogger('matplotlib.font_manager').disabled = True\n",
"\n",
"import ipywidgets as widgets # interactive display\n",
"%config InlineBackend.figure_format = 'retina'\n",
"plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting Functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Plotting Functions\n",
"\n",
"def plot_data(X):\n",
" \"\"\"\n",
" Plots bivariate data. Includes a plot of each random variable, and a scatter\n",
" plot of their joint activity. The title indicates the sample correlation\n",
" calculated from the data.\n",
"\n",
" Args:\n",
" X (numpy array of floats) : Data matrix each column corresponds to a\n",
" different random variable\n",
"\n",
" Returns:\n",
" Nothing.\n",
" \"\"\"\n",
"\n",
" fig = plt.figure(figsize=[8, 4])\n",
" gs = fig.add_gridspec(2, 2)\n",
" ax1 = fig.add_subplot(gs[0, 0])\n",
" ax1.plot(X[:, 0], color='k')\n",
" plt.ylabel('Neuron 1')\n",
" plt.title('Sample var 1: {:.1f}'.format(np.var(X[:, 0])))\n",
" ax1.set_xticklabels([])\n",
" ax2 = fig.add_subplot(gs[1, 0])\n",
" ax2.plot(X[:, 1], color='k')\n",
" plt.xlabel('Sample Number')\n",
" plt.ylabel('Neuron 2')\n",
" plt.title('Sample var 2: {:.1f}'.format(np.var(X[:, 1])))\n",
" ax3 = fig.add_subplot(gs[:, 1])\n",
" ax3.plot(X[:, 0], X[:, 1], '.', markerfacecolor=[.5, .5, .5],\n",
" markeredgewidth=0)\n",
" ax3.axis('equal')\n",
" plt.xlabel('Neuron 1 activity')\n",
" plt.ylabel('Neuron 2 activity')\n",
" plt.title('Sample corr: {:.1f}'.format(np.corrcoef(X[:, 0], X[:, 1])[0, 1]))\n",
" plt.show()\n",
"\n",
"\n",
"def plot_basis_vectors(X, W):\n",
" \"\"\"\n",
" Plots bivariate data as well as new basis vectors.\n",
"\n",
" Args:\n",
" X (numpy array of floats) : Data matrix each column corresponds to a\n",
" different random variable\n",
" W (numpy array of floats) : Square matrix representing new orthonormal\n",
" basis each column represents a basis vector\n",
"\n",
" Returns:\n",
" Nothing.\n",
" \"\"\"\n",
"\n",
" plt.figure(figsize=[4, 4])\n",
" plt.plot(X[:, 0], X[:, 1], '.', color=[.5, .5, .5], label='Data')\n",
" plt.axis('equal')\n",
" plt.xlabel('Neuron 1 activity')\n",
" plt.ylabel('Neuron 2 activity')\n",
" plt.plot([0, W[0, 0]], [0, W[1, 0]], color='r', linewidth=3,\n",
" label='Basis vector 1')\n",
" plt.plot([0, W[0, 1]], [0, W[1, 1]], color='b', linewidth=3,\n",
" label='Basis vector 2')\n",
" plt.legend()\n",
" plt.show()\n",
"\n",
"\n",
"def plot_data_new_basis(Y):\n",
" \"\"\"\n",
" Plots bivariate data after transformation to new bases.\n",
" Similar to plot_data but with colors corresponding to projections onto\n",
" basis 1 (red) and basis 2 (blue). The title indicates the sample correlation\n",
" calculated from the data.\n",
"\n",
" Note that samples are re-sorted in ascending order for the first\n",
" random variable.\n",
"\n",
" Args:\n",
" Y (numpy array of floats): Data matrix in new basis each column\n",
" corresponds to a different random variable\n",
"\n",
" Returns:\n",
" Nothing.\n",
" \"\"\"\n",
" fig = plt.figure(figsize=[8, 4])\n",
" gs = fig.add_gridspec(2, 2)\n",
" ax1 = fig.add_subplot(gs[0, 0])\n",
" ax1.plot(Y[:, 0], 'r')\n",
" plt.xlabel\n",
" plt.ylabel('Projection \\n basis vector 1')\n",
" plt.title('Sample var 1: {:.1f}'.format(np.var(Y[:, 0])))\n",
" ax1.set_xticklabels([])\n",
" ax2 = fig.add_subplot(gs[1, 0])\n",
" ax2.plot(Y[:, 1], 'b')\n",
" plt.xlabel('Sample number')\n",
" plt.ylabel('Projection \\n basis vector 2')\n",
" plt.title('Sample var 2: {:.1f}'.format(np.var(Y[:, 1])))\n",
" ax3 = fig.add_subplot(gs[:, 1])\n",
" ax3.plot(Y[:, 0], Y[:, 1], '.', color=[.5, .5, .5])\n",
" ax3.axis('equal')\n",
" plt.xlabel('Projection basis vector 1')\n",
" plt.ylabel('Projection basis vector 2')\n",
" plt.title('Sample corr: {:.1f}'.format(np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]))\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 1: Geometric view of data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 1: Geometric view of data\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'THu9yHnpq9I'), ('Bilibili', 'BV1Af4y1R78w')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Geometric_view_of_data_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 1: Generate correlated multivariate data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 2: Multivariate data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 2: Multivariate data\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'jcTq2PgU5Vw'), ('Bilibili', 'BV1xz4y1D7ES')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"This video describes the covariance matrix and the multivariate normal distribution.\n",
"\n",
"\n",
" Click here for text recap of video
\n",
"\n",
"To gain intuition, we will first use a simple model to generate multivariate data. Specifically, we will draw random samples from a *bivariate normal distribution*. This is an extension of the one-dimensional normal distribution to two dimensions, in which each $x_i$ is marginally normal with mean $\\mu_i$ and variance $\\sigma_i^2$:\n",
"\n",
"\\begin{equation}\n",
"x_i \\sim \\mathcal{N}(\\mu_i,\\sigma_i^2).\n",
"\\end{equation}\n",
"\n",
"Additionally, the joint distribution for $x_1$ and $x_2$ has a specified correlation coefficient $\\rho$. Recall that the correlation coefficient is a normalized version of the covariance, and ranges between -1 and +1:\n",
"\n",
"\\begin{equation}\n",
"\\rho = \\frac{\\text{cov}(x_1, x_2)}{\\sqrt{\\sigma_1^2 \\sigma_2^2}}.\n",
"\\end{equation}\n",
"\n",
"For simplicity, we will assume that the mean of each variable has already been subtracted, so that $\\mu_i=0$ for both $i=1$ and $i=2$. The remaining parameters can be summarized in the covariance matrix, which for two dimensions has the following form:\n",
"\n",
"\\begin{equation}\n",
"{\\bf \\Sigma} =\n",
"\\begin{pmatrix}\n",
" \\text{var}(x_1) & \\text{cov}(x_1,x_2) \\\\\n",
" \\text{cov}(x_1,x_2) &\\text{var}(x_2)\n",
"\\end{pmatrix}.\n",
"\\end{equation}\n",
"\n",
"In general, $\\bf \\Sigma$ is a symmetric matrix with the variances $\\text{var}(x_i) = \\sigma_i^2$ on the diagonal, and the covariances on the off-diagonal. Later, we will see that the covariance matrix plays a key role in PCA.\n",
"\n",
" "
]
},
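{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The relationship between the covariance matrix and the correlation coefficient can be checked numerically. The following is a minimal sketch, not part of the original tutorial: the sample size, the seed, and the way the two variables are mixed are illustrative assumptions. It estimates the covariance matrix of some correlated data with `np.cov`, normalizes the off-diagonal entry as in the equation for $\\rho$ above, and compares the result with `np.corrcoef`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Illustrative settings (assumed for this sketch, not taken from the exercises)\n",
"rng = np.random.default_rng(0)\n",
"n_samples = 5000\n",
"\n",
"# Build two correlated zero-mean variables by mixing independent Gaussians\n",
"z = rng.standard_normal((n_samples, 2))\n",
"x1 = 2.0 * z[:, 0]\n",
"x2 = z[:, 0] + 0.5 * z[:, 1]\n",
"\n",
"# Sample covariance matrix: variances on the diagonal, covariance off-diagonal\n",
"sigma = np.cov(x1, x2)\n",
"\n",
"# Correlation coefficient as the normalized covariance\n",
"rho = sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])\n",
"\n",
"# The same quantity computed directly by NumPy\n",
"rho_numpy = np.corrcoef(x1, x2)[0, 1]\n",
"\n",
"print(f'rho from covariance:  {rho:.4f}')\n",
"print(f'rho from np.corrcoef: {rho_numpy:.4f}')\n",
"```\n",
"\n",
"The two printed values should agree up to floating-point error, since `np.corrcoef` applies the same normalization to the sample covariance."
]
},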
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Multivariate_data_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Coding Exercise 1: Draw samples from a distribution\n",
"\n",
"We have provided code to draw random samples from a zero-mean bivariate normal distribution with a specified covariance matrix (`get_data`). Throughout this tutorial, we'll imagine these samples represent the activity (firing rates) of two recorded neurons on different trials. Fill in the function below to calculate the covariance matrix given the desired variances and correlation coefficient. The covariance can be found by rearranging the equation above:\n",
"\n",
"\\begin{equation}\n",
"\\text{cov}(x_1,x_2) = \\rho \\sqrt{\\sigma_1^2 \\sigma_2^2}.\n",
"\\end{equation}"
]
},
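{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As a quick worked example of this formula, take the illustrative values $\\sigma_1^2 = 4$, $\\sigma_2^2 = 9$, and $\\rho = 0.5$ (chosen here only for the arithmetic; they are not used elsewhere in the tutorial):\n",
"\n",
"\\begin{equation}\n",
"\\text{cov}(x_1,x_2) = 0.5 \\sqrt{4 \\cdot 9} = 0.5 \\cdot 6 = 3,\n",
"\\qquad\n",
"{\\bf \\Sigma} =\n",
"\\begin{pmatrix}\n",
" 4 & 3 \\\\\n",
" 3 & 9\n",
"\\end{pmatrix}.\n",
"\\end{equation}"
]
},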
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Execute this cell to get helper function `get_data`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @markdown Execute this cell to get helper function `get_data`\n",
"\n",
"def get_data(cov_matrix):\n",
" \"\"\"\n",
" Returns a matrix of 1000 samples from a bivariate, zero-mean Gaussian.\n",
"\n",
" Note that samples are sorted in ascending order for the first random variable\n",
"\n",
" Args:\n",
" cov_matrix (numpy array of floats): desired covariance matrix\n",
"\n",
" Returns:\n",
" (numpy array of floats) : samples from the bivariate Gaussian, with each\n",
" column corresponding to a different random\n",
" variable\n",
" \"\"\"\n",
"\n",
" mean = np.array([0, 0])\n",
" X = np.random.multivariate_normal(mean, cov_matrix, size=1000)\n",
" indices_for_sorting = np.argsort(X[:, 0])\n",
" X = X[indices_for_sorting, :]\n",
"\n",
" return X\n",
"\n",
"help(get_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"def calculate_cov_matrix(var_1, var_2, corr_coef):\n",
" \"\"\"\n",
" Calculates the covariance matrix based on the variances and correlation\n",
" coefficient.\n",
"\n",
" Args:\n",
" var_1 (scalar) : variance of the first random variable\n",
" var_2 (scalar) : variance of the second random variable\n",
" corr_coef (scalar) : correlation coefficient\n",
"\n",
" Returns:\n",
" (numpy array of floats) : covariance matrix\n",
" \"\"\"\n",
"\n",
" #################################################\n",
" ## TODO for students: calculate the covariance matrix\n",
" # Fill out function and remove\n",
" raise NotImplementedError(\"Student exercise: calculate the covariance matrix!\")\n",
" #################################################\n",
"\n",
" # Calculate the covariance from the variances and correlation\n",
" cov = ...\n",
"\n",
" cov_matrix = np.array([[var_1, cov], [cov, var_2]])\n",
"\n",
" return cov_matrix\n",
"\n",
"\n",
"# Set parameters\n",
"np.random.seed(2020) # set random seed\n",
"variance_1 = 1\n",
"variance_2 = 1\n",
"corr_coef = 0.8\n",
"\n",
"# Compute covariance matrix\n",
"cov_matrix = calculate_cov_matrix(variance_1, variance_2, corr_coef)\n",
"\n",
"# Generate data with this covariance matrix\n",
"X = get_data(cov_matrix)\n",
"\n",
"# Visualize\n",
"plot_data(X)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D4_DimensionalityReduction/solutions/W1D4_Tutorial1_Solution_85104841.py)\n",
"\n",
"*Example output:*\n",
"\n",
"
\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Draw_samples_from_a_distribution_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Interactive Demo 1: Correlation effect on data\n",
"\n",
"We'll use the function you just completed but now we can change the correlation coefficient via slider. You should get a feel for how changing the correlation coefficient affects the geometry of the simulated data.\n",
"\n",
"1. What effect do negative correlation coefficient values have?\n",
"2. What correlation coefficient results in a circular data cloud?\n",
"\n",
"\n",
"Note that we sort the samples according to neuron 1's firing rate, meaning the plot of neuron 1 firing rate over sample number looks clean and pretty unchanging when compared to neuron 2.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Execute this cell to enable widget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @markdown Execute this cell to enable widget\n",
"\n",
"def _calculate_cov_matrix(var_1, var_2, corr_coef):\n",
"\n",
" # Calculate the covariance from the variances and correlation\n",
" cov = corr_coef * np.sqrt(var_1 * var_2)\n",
"\n",
" cov_matrix = np.array([[var_1, cov], [cov, var_2]])\n",
"\n",
" return cov_matrix\n",
"\n",
"\n",
"@widgets.interact(corr_coef = widgets.FloatSlider(value=.2, min=-1, max=1, step=0.1))\n",
"def visualize_correlated_data(corr_coef=0):\n",
" variance_1 = 1\n",
" variance_2 = 1\n",
"\n",
" # Compute covariance matrix\n",
" cov_matrix = _calculate_cov_matrix(variance_1, variance_2, corr_coef)\n",
"\n",
" # Generate data with this covariance matrix\n",
" X = get_data(cov_matrix)\n",
"\n",
" # Visualize\n",
" plot_data(X)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D4_DimensionalityReduction/solutions/W1D4_Tutorial1_Solution_5d14b461.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Correlation_effect_on_data_Interactive_Demo_and_Discussion\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 2: Define a new orthonormal basis\n",
"\n",
"*Estimated timing to here from start of tutorial: 20 min*\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 3: Orthonormal bases\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 3: Orthonormal bases\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'PC1RZELnrIg'), ('Bilibili', 'BV1wT4y1E71g')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Orthonormal_Bases_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"This video shows that data can be represented in many ways using different bases. It also explains how to check if your favorite basis is orthonormal.\n",
"\n",
"\n",
" Click here for text recap of video
\n",
"\n",
"Next, we will define a new orthonormal basis of vectors ${\\bf u} = [u_1,u_2]$ and ${\\bf w} = [w_1,w_2]$. As we learned in the video, two vectors are orthonormal if:\n",
"\n",
"1. They are orthogonal (i.e., their dot product is zero):\n",
"\n",
"\\begin{equation}\n",
"{\\bf u\\cdot w} = u_1 w_1 + u_2 w_2 = 0\n",
"\\end{equation}\n",
"\n",
"2. They have unit length:\n",
"\n",
"\\begin{equation}\n",
"||{\\bf u}|| = ||{\\bf w} || = 1\n",
"\\end{equation}\n",
"\n",
" \n",
"\n",
"In two dimensions, it is easy to make an arbitrary orthonormal basis. All we need is a random vector ${\\bf u}$, which we have normalized. If we now define the second basis vector to be ${\\bf w} = [-u_2,u_1]$, we can check that both conditions are satisfied:\n",
"\n",
"\\begin{equation}\n",
"{\\bf u\\cdot w} = - u_1 u_2 + u_2 u_1 = 0\n",
"\\end{equation}\n",
"\n",
"and\n",
"\n",
"\\begin{equation}\n",
"{|| {\\bf w} ||} = \\sqrt{(-u_2)^2 + u_1^2} = \\sqrt{u_1^2 + u_2^2} = 1,\n",
"\\end{equation}\n",
"\n",
"where we used the fact that ${\\bf u}$ is normalized. So, with an arbitrary input vector, we can define an orthonormal basis, which we will write in matrix by stacking the basis vectors horizontally:\n",
"\n",
"\\begin{equation}\n",
"{{\\bf W} } =\n",
"\\begin{pmatrix}\n",
" u_1 & w_1 \\\\\n",
" u_2 & w_2\n",
"\\end{pmatrix}.\n",
"\\end{equation}"
]
},
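{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Both orthonormality conditions can be checked at once for a basis written in matrix form: entry $(i, j)$ of ${\\bf W}^\\top {\\bf W}$ is the dot product of columns $i$ and $j$, so the basis is orthonormal exactly when ${\\bf W}^\\top {\\bf W}$ is the identity matrix. The sketch below illustrates this; the helper name `is_orthonormal`, the tolerance, and the two example matrices are assumptions made for illustration and are not part of the tutorial code.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def is_orthonormal(W, tol=1e-10):\n",
"    # Hypothetical helper: True if the columns of W are orthonormal\n",
"    W = np.asarray(W, dtype=float)\n",
"    # Entry (i, j) of W.T @ W is the dot product of columns i and j,\n",
"    # so an orthonormal basis gives the identity matrix\n",
"    gram = W.T @ W\n",
"    return np.allclose(gram, np.eye(W.shape[1]), atol=tol)\n",
"\n",
"\n",
"# A rotation by an arbitrary angle: columns are u = [cos, sin] and w = [-u_2, u_1]\n",
"theta = np.pi / 6\n",
"W_rot = np.array([[np.cos(theta), -np.sin(theta)],\n",
"                  [np.sin(theta), np.cos(theta)]])\n",
"\n",
"# Orthogonal columns, but the first one does not have unit length\n",
"W_bad = np.array([[2.0, 0.0],\n",
"                  [0.0, 1.0]])\n",
"\n",
"print(is_orthonormal(W_rot))\n",
"print(is_orthonormal(W_bad))\n",
"```\n",
"\n",
"The rotation matrix passes the check, while the second matrix fails it because orthogonality alone is not enough: each basis vector must also have unit length."
]
},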
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Coding Exercise 2: Find an orthonormal basis\n",
"\n",
"In this exercise you will fill in the function below to define an orthonormal basis, given a single arbitrary 2-dimensional vector as an input.\n",
"\n",
"**Steps**\n",
"* Modify the function `define_orthonormal_basis` to first normalize the first basis vector $\\bf u$.\n",
"* Then complete the function by finding a basis vector $\\bf w$ that is orthogonal to $\\bf u$.\n",
"* Test the function using initial basis vector ${\\bf u} = [3,1]$. Plot the resulting basis vectors on top of the data scatter plot using the function `plot_basis_vectors`. (For the data, use $\\sigma_1^2 =1$, $\\sigma_2^2 =1$, and $\\rho = .8$)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"def define_orthonormal_basis(u):\n",
" \"\"\"\n",
" Calculates an orthonormal basis given an arbitrary vector u.\n",
"\n",
" Args:\n",
" u (numpy array of floats) : arbitrary 2-dimensional vector used for new\n",
" basis\n",
"\n",
" Returns:\n",
" (numpy array of floats) : new orthonormal basis\n",
" columns correspond to basis vectors\n",
" \"\"\"\n",
"\n",
" #################################################\n",
" ## TODO for students: calculate the orthonormal basis\n",
" # Fill out function and remove\n",
" raise NotImplementedError(\"Student exercise: implement the orthonormal basis function\")\n",
" #################################################\n",
"\n",
" # Normalize vector u\n",
" u = ...\n",
"\n",
" # Calculate vector w that is orthogonal to w\n",
" w = ...\n",
"\n",
" # Put in matrix form\n",
" W = np.column_stack([u, w])\n",
"\n",
" return W\n",
"\n",
"\n",
"# Set up parameters\n",
"np.random.seed(2020) # set random seed\n",
"variance_1 = 1\n",
"variance_2 = 1\n",
"corr_coef = 0.8\n",
"u = np.array([3, 1])\n",
"\n",
"# Compute covariance matrix\n",
"cov_matrix = calculate_cov_matrix(variance_1, variance_2, corr_coef)\n",
"\n",
"# Generate data\n",
"X = get_data(cov_matrix)\n",
"\n",
"# Get orthonomal basis\n",
"W = define_orthonormal_basis(u)\n",
"\n",
"# Visualize\n",
"plot_basis_vectors(X, W)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D4_DimensionalityReduction/solutions/W1D4_Tutorial1_Solution_25e1d102.py)\n",
"\n",
"*Example output:*\n",
"\n",
"
\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Find_an_orthonormal_basis_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 3: Project data onto new basis\n",
"\n",
"*Estimated timing to here from start of tutorial: 35 min*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 4: Change of basis\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 4: Change of basis\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'Mj6BRQPKKUc'), ('Bilibili', 'BV1LK411J7NQ')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Change_of_basis_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Finally, we will express our data in the new basis that we have just found. Since $\\bf W$ is orthonormal, we can project the data into our new basis using simple matrix multiplication :\n",
"\n",
"\\begin{equation}\n",
"{\\bf Y = X W}.\n",
"\\end{equation}\n",
"\n",
"We will explore the geometry of the transformed data $\\bf Y$ as we vary the choice of basis."
]
},
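{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As a concrete illustration (the numbers here are chosen for this example and are not taken from the tutorial data), consider a single data point ${\\bf x} = [1, 1]$, written as one row of $\\bf X$, and assume the orthonormal basis vectors ${\\bf u} = \\frac{1}{\\sqrt{2}}[1, 1]$ and ${\\bf w} = \\frac{1}{\\sqrt{2}}[-1, 1]$ form the columns of $\\bf W$. The coordinates of the point in the new basis are its dot products with $\\bf u$ and $\\bf w$:\n",
"\n",
"\\begin{equation}\n",
"{\\bf x W} = \\frac{1}{\\sqrt{2}} \\begin{bmatrix} 1 & 1 \\end{bmatrix} \\begin{bmatrix} 1 & -1 \\\\ 1 & 1 \\end{bmatrix} = \\begin{bmatrix} \\sqrt{2} & 0 \\end{bmatrix}.\n",
"\\end{equation}\n",
"\n",
"In this example the point lies entirely along $\\bf u$, and its length $\\sqrt{2}$ is unchanged, as expected for an orthonormal change of basis."
]
},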
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Coding Exercise 3: Change to orthonormal basis\n",
"In this exercise you will fill in the function below to change data to an orthonormal basis.\n",
"\n",
"**Steps**\n",
"* Complete the function `change_of_basis` to project the data onto the new basis.\n",
"* Plot the projected data using the function `plot_data_new_basis`.\n",
"* What happens to the correlation coefficient in the new basis? Does it increase or decrease?\n",
"* What happens to variance?\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"def change_of_basis(X, W):\n",
" \"\"\"\n",
" Projects data onto new basis W.\n",
"\n",
" Args:\n",
" X (numpy array of floats) : Data matrix each column corresponding to a\n",
" different random variable\n",
" W (numpy array of floats) : new orthonormal basis columns correspond to\n",
" basis vectors\n",
"\n",
" Returns:\n",
" (numpy array of floats) : Data matrix expressed in new basis\n",
" \"\"\"\n",
"\n",
" #################################################\n",
" ## TODO for students: project the data onto a new basis W\n",
" # Fill out function and remove\n",
" raise NotImplementedError(\"Student exercise: implement change of basis\")\n",
" #################################################\n",
"\n",
" # Project data onto new basis described by W\n",
" Y = ...\n",
"\n",
" return Y\n",
"\n",
"\n",
"# Project data to new basis\n",
"Y = change_of_basis(X, W)\n",
"\n",
"# Visualize\n",
"plot_data_new_basis(Y)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D4_DimensionalityReduction/solutions/W1D4_Tutorial1_Solution_80a5f41b.py)\n",
"\n",
"*Example output:*\n",
"\n",
"
\n",
"\n"
]
},
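{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"For the questions in Coding Exercise 3 about correlation and variance, it can help to write down how the covariance matrix transforms. The sketch below assumes that the columns of $\\bf X$ have zero mean; $N$ (the number of samples) and ${\\bf \\Sigma_Y}$ (the covariance of the transformed data) are symbols introduced here for illustration. With a $1/(N-1)$ normalization the argument is identical.\n",
"\n",
"\\begin{equation}\n",
"{\\bf \\Sigma_Y} = \\frac{1}{N} {\\bf Y}^\\top {\\bf Y} = \\frac{1}{N} ({\\bf X W})^\\top ({\\bf X W}) = {\\bf W}^\\top \\left( \\frac{1}{N} {\\bf X}^\\top {\\bf X} \\right) {\\bf W} = {\\bf W}^\\top {\\bf \\Sigma} {\\bf W}.\n",
"\\end{equation}\n",
"\n",
"The off-diagonal entry of ${\\bf W}^\\top {\\bf \\Sigma} {\\bf W}$, and hence the correlation, depends on the choice of basis. The total variance does not: $\\mathrm{tr}({\\bf W}^\\top {\\bf \\Sigma} {\\bf W}) = \\mathrm{tr}({\\bf \\Sigma} {\\bf W} {\\bf W}^\\top) = \\mathrm{tr}({\\bf \\Sigma})$, because ${\\bf W} {\\bf W}^\\top = {\\bf I}$ for a square orthonormal matrix."
]
},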
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Change_to_orthonormal_basis_Exercise\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Interactive Demo 3: Play with the basis vectors\n",
"To see what happens to the correlation as we change the basis vectors, run the cell below. The parameter $\\theta$ controls the angle of $\\bf u$ in degrees. Use the slider to rotate the basis vectors.\n",
"\n",
"\n",
"\n",
"1. What happens to the projected data as you rotate the basis?\n",
"2. How does the correlation coefficient change? How does the variance of the projection onto each basis vector change?\n",
"3. Are you able to find a basis in which the projected data is **uncorrelated**?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Make sure you execute this cell to enable the widget!\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @markdown Make sure you execute this cell to enable the widget!\n",
"\n",
"def refresh(theta=0):\n",
" u = np.array([1, np.tan(theta * np.pi / 180)])\n",
" W = define_orthonormal_basis(u)\n",
" Y = change_of_basis(X, W)\n",
" plot_basis_vectors(X, W)\n",
" plot_data_new_basis(Y)\n",
"\n",
"\n",
"_ = widgets.interact(refresh, theta=(0, 90, 5))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/NeuromatchAcademy/course-content/tree/main/tutorials/W1D4_DimensionalityReduction/solutions/W1D4_Tutorial1_Solution_ec08cb0b.py)\n",
"\n"
]
},
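{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The slider experiment can also be approximated without the widget. The following is a minimal sketch, not the tutorial's own solution: `X_demo` is hypothetical stand-in data drawn with an assumed covariance, and `basis_from_angle` is a hypothetical helper that builds the basis directly from the angle instead of calling `define_orthonormal_basis`. It sweeps $\\theta$ over the slider's range and reports the angle at which the correlation of the projected data is closest to zero.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical stand-in data: correlated 2D Gaussian samples (assumed covariance)\n",
"rng = np.random.default_rng(0)\n",
"cov = np.array([[1.0, 0.8], [0.8, 1.0]])\n",
"X_demo = rng.multivariate_normal(mean=[0, 0], cov=cov, size=2000)\n",
"\n",
"\n",
"def basis_from_angle(theta_deg):\n",
"    # Hypothetical helper: 2x2 orthonormal basis whose first column\n",
"    # makes an angle of theta_deg degrees with the horizontal axis.\n",
"    theta = np.deg2rad(theta_deg)\n",
"    u = np.array([np.cos(theta), np.sin(theta)])\n",
"    w = np.array([-np.sin(theta), np.cos(theta)])  # u rotated by 90 degrees\n",
"    return np.column_stack([u, w])\n",
"\n",
"\n",
"# Sweep the same range as the slider and record the correlation of Y = X W\n",
"thetas = np.arange(0, 91, 5)\n",
"corrs = []\n",
"for t in thetas:\n",
"    Y_demo = X_demo @ basis_from_angle(t)\n",
"    corrs.append(np.corrcoef(Y_demo, rowvar=False)[0, 1])\n",
"corrs = np.array(corrs)\n",
"\n",
"# Angle on the grid where the projected data is closest to uncorrelated\n",
"best_theta = thetas[np.argmin(np.abs(corrs))]\n",
"print(f'Correlation is closest to zero at theta = {best_theta} degrees')\n",
"```\n",
"\n",
"Because the assumed covariance has equal variances along both axes, the decorrelating direction should lie near 45 degrees; for data with unequal variances the angle will generally differ."
]
},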
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Play_with_basis_vectors_Interactive_Demo_and_Discussion\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Summary\n",
"\n",
"*Estimated timing of tutorial: 50 minutes*\n",
"\n",
"- In this tutorial, we learned that multivariate data can be visualized as a cloud of points in a high-dimensional vector space. The geometry of this cloud is shaped by the covariance matrix.\n",
"\n",
"- Multivariate data can be represented in a new orthonormal basis using the dot product. These new basis vectors correspond to specific mixtures of the original variables - for example, in neuroscience, they could represent different ratios of activation across a population of neurons.\n",
"\n",
"- The projected data (after transforming into the new basis) will generally have a different geometry from the original data. In particular, taking basis vectors that are aligned with the spread of cloud of points decorrelates the data.\n",
"\n",
"* These concepts - covariance, projections, and orthonormal bases - are key for understanding PCA, which will be our focus in the next tutorial."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Notation\n",
"\n",
"\\begin{align}\n",
"x_i &\\quad \\text{data point for dimension } i\\\\\n",
"\\mu_i &\\quad \\text{mean along dimension } i\\\\\n",
"\\sigma_i^2 &\\quad \\text{variance along dimension } i \\\\\n",
"\\bf u, \\bf w &\\quad \\text{orthonormal basis vectors}\\\\\n",
"\\rho &\\quad \\text{correlation coefficient}\\\\\n",
"\\bf \\Sigma &\\quad \\text{covariance matrix}\\\\\n",
"\\bf X &\\quad \\text{original data matrix}\\\\\n",
"\\bf W &\\quad \\text{projection matrix}\\\\\n",
"\\bf Y &\\quad \\text{transformed data}\\\\\n",
"\\end{align}"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "W1D4_Tutorial1",
"provenance": [],
"toc_visible": true
},
"kernel": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 0
}