{
"cells": [
{
"cell_type": "markdown",
"id": "ranging-burst",
"metadata": {
"execution": {}
},
"source": [
"# Example Deep Learning Project\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"__Content creators:__ Marius 't Hart, Megan Peters, Vladimir Haltakov, Paul Schrater, Gunnar Blohm\n",
"\n",
"__Production editor:__ Spiros Chavlis"
]
},
{
"cell_type": "markdown",
"id": "lfthSM088QdJ",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Objectives\n",
"\n",
"We're interested in automatically classifying movement. There is a great dataset (MoVi) with different modalities of movement recordings (videos, visual markers, accelerometers, skeletal motion reconstructions, etc). We will use a sub-set of this data, i.e. estimated skeletal motion, to perform a pilot study investigating whether we can classify different movements from the skeletal motion. And if so, which skeletal motions (if not all) are neccessary for good decoding performance?\n",
"\n",
"Please check out the different resources below to better understand the MoVi dataset and learn more about the movements.\n",
"\n",
"**Resources**:\n",
"* [see MoVi paper here](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253157)\n",
"* [GitHub page of MoVi](https://github.com/saeed1262/MoVi-Toolbox)\n",
"* [MoVi website and description](https://www.biomotionlab.ca/movi/)\n",
"* [full MoVi dataset (not needed for this demo)](https://dataverse.scholarsportal.info/dataset.xhtml?persistentId=doi:10.5683/SP2/JRHDRN)"
]
},
{
"cell_type": "markdown",
"id": "YLvELJIDJiW8",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup\n",
"\n",
"For your own project, you can put together a colab notebook by copy-pasting bits of code from the tutorials. We still recommend keeping the 4 setup cells at the top, like here; Imports, Figure Settings, Plotting functions, and Data retrieval."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bPkNdonuJpIL",
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Imports\n",
"# get some matrices and plotting:\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# get some pytorch:\n",
"import torch\n",
"import torch.nn as nn\n",
"from torch.nn import MaxPool1d\n",
"from torch.utils.data import Dataset\n",
"from torch.utils.data import DataLoader\n",
"\n",
"# confusion matrix from sklearn\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"# to get some idea of how long stuff will take to complete:\n",
"import time\n",
"\n",
"# to see how unbalanced the data is:\n",
"from collections import Counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "Fn81OPRAJxS1",
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure settings\n",
"import ipywidgets as widgets #interactive display\n",
"\n",
"%config InlineBackend.figure_format = 'retina'\n",
"plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ieSP3d3Z8dOl",
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Plotting functions\n",
"\n",
"def plotConfusionMatrix(real_labels, predicted_labels, label_names):\n",
"\n",
" # conver the labels to integers:\n",
" real_labels = [int(x) for x in real_labels]\n",
" predicted_labels = [int(x) for x in predicted_labels]\n",
" tick_names = [a.replace(\"_\", \" \") for a in label_names]\n",
"\n",
" cm = confusion_matrix(real_labels, predicted_labels, normalize='true')\n",
"\n",
" fig = plt.figure(figsize=(8,6))\n",
" plt.imshow(cm)\n",
" plt.xticks(range(len(tick_names)),tick_names, rotation=90)\n",
" plt.yticks(range(len(tick_names)),tick_names)\n",
" plt.xlabel('predicted move')\n",
" plt.ylabel('real move')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data retrieval\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "vN9AoEDiJ7jp",
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Data retrieval\n",
"# @markdown Run this cell to download the data for this example project.\n",
"import io\n",
"import requests\n",
"r = requests.get('https://osf.io/mnqb7/download')\n",
"if r.status_code != 200:\n",
" print('Failed to download data')\n",
"else:\n",
" train_moves=np.load(io.BytesIO(r.content), allow_pickle=True)['train_moves']\n",
" train_labels=np.load(io.BytesIO(r.content), allow_pickle=True)['train_labels']\n",
" test_moves=np.load(io.BytesIO(r.content), allow_pickle=True)['test_moves']\n",
" test_labels=np.load(io.BytesIO(r.content), allow_pickle=True)['test_labels']\n",
" label_names=np.load(io.BytesIO(r.content), allow_pickle=True)['label_names']\n",
" joint_names=np.load(io.BytesIO(r.content), allow_pickle=True)['joint_names']"
]
},
{
"cell_type": "markdown",
"id": "auburn-demonstration",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Step 1: Question\n",
"There are many different questions we could ask with the MoVi dataset. We will start with a simple question: **\"Can we classify movements from skeletal motion data, and if so, which body parts are the most informative ones?\"**\n",
"\n",
"Our goal is to perform a *pilot* study to see if this is possible in principle. We will therefore use \"ground truth\" skeletal motion data that has been computed using an inference algorithm (see [MoVi paper](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253157)). If this works out, then as a next step we might want to use the raw sensor data or even videos...\n",
"\n",
"The ultimate goal could for example be to figure out which body parts to record movements from (e.g. is just a wristband enough?) to classify movement."
]
},
{
"cell_type": "markdown",
"id": "meaningful-oracle",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Step 2: literature review\n",
"Most importantly, our literature review needs to address the following:\n",
"* what modeling approaches make it possible to classify time series data?\n",
"* how is human motion captured?\n",
"* what exactly is in the MoVi dataset?\n",
"* what is known regarding classification of human movement based on different measurements?\n",
"\n",
"What we learn from the literature review is too long to write out here... But we would like to point out that human motion classification has been done; we're not proposing a very novel project here. But that's ok for an NMA project!"
]
},
{
"cell_type": "markdown",
"id": "capable-retirement",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Step 3: ingredients\n",
"\n",
"## Data ingredients\n",
"\n",
"After downloading the data, we should have 6 numpy arrays:\n",
"\n",
"- `train_moves`: the training set of 1032 movements\n",
"- `train_labels`: the class labels for each of the 1032 training movements\n",
"- `test_moves`: the test set of 172 movements\n",
"- `test_labels`: the class labels for each of the 172 test movements\n",
"- `label_names`: text labels for the values in the two arrays of class labels\n",
"- `joint_names`: the names of the 24 joints used in each movement\n",
"\n",
"We'll take a closer look at the data below. *Note*: data is split into training and test sets. If you don't know what that means, NMA-DL will teach you!\n",
"\n",
"**Inputs**:\n",
"\n",
"For simplicity, we take the first 24 joints of the whole MoVi dataset including all major limbs. The data was in an exponential map format, which has 3 rotations/angles for each joint (pitch, yaw, roll). The advantage of this type of data is that it is (mostly) agnostic about body size or shape. And since we care about movements only, we choose this representation of the data (there are other representations in the full data set).\n",
"\n",
"Since the joints are simply points, the 3rd angle (i.e. roll) contained no information, and that is already dropped from the data that we pre-formatted for this demo project. That is, the movements of each joint are described by 2 angles, that change over time. Furthermore, we normalized all the angles/rotations to fall between 0 and 1 so they are good input for PyTorch.\n",
"\n",
"Finally, the movements originally took various amounts of time, but we need the same input for each movement, so we sub-sampled and (linearly) interpolated the data to have 75 timepoints.\n",
"\n",
"Our training data is supposed to have 1032 movements, 2 x 24 joints = 48 channels and 75 timepoints. Let's check and make sure:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "mV3UL0-fNFsq",
"metadata": {
"execution": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(1032, 48, 75)\n"
]
}
],
"source": [
"print(train_moves.shape)"
]
},
{
"cell_type": "markdown",
"id": "aGNWHN4gT8qW",
"metadata": {
"execution": {}
},
"source": [
"Cool!\n",
"\n",
"**Joints**:\n",
"\n",
"For each movement we have 2 angles from 24 joints. Which joints are these?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "XseMCJ_JUGpv",
"metadata": {
"execution": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0: Pelvis\n",
"1: LeftHip\n",
"2: RightHip\n",
"3: Spine1\n",
"4: LeftKnee\n",
"5: RightKnee\n",
"6: Spine2\n",
"7: LeftAnkle\n",
"8: RightAnkle\n",
"9: Spine3\n",
"10: LeftFoot\n",
"11: RightFoot\n",
"12: Neck\n",
"13: LeftCollar\n",
"14: RightCollar\n",
"15: Head\n",
"16: LeftShoulder\n",
"17: RightShoulder\n",
"18: LeftElbow\n",
"19: RightElbow\n",
"20: LeftWrist\n",
"21: RightWrist\n",
"22: LeftHand\n",
"23: RightHand\n"
]
}
],
"source": [
"for joint_no in range(24):\n",
" print(f\"{joint_no}: {joint_names[joint_no]}\")"
]
},
{
"cell_type": "markdown",
"id": "73fNoLe3Nyui",
"metadata": {
"execution": {}
},
"source": [
"**Labels**:\n",
"\n",
"Let's have a look at the `train_labels` array too:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "wbfDO9YcN8IK",
"metadata": {
"execution": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0 1 4 ... 6 2 11]\n",
"(1032,)\n"
]
}
],
"source": [
"print(train_labels)\n",
"print(train_labels.shape)"
]
},
{
"cell_type": "markdown",
"id": "PuIGf6WdNHAn",
"metadata": {
"execution": {}
},
"source": [
"The labels are numbers, and there are 1032 of them, so that matches the number of movements in the data set. There are text versions too in the array called `label_names`. Let's have a look. There are supposed to be 14 movement classes.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1z9OibQROznT",
"metadata": {
"execution": {}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13]\n",
"0: crawling\n",
"1: throw/catch\n",
"2: walking\n",
"3: running_in_spot\n",
"4: cross_legged_sitting\n",
"5: hand_clapping\n",
"6: scratching_head\n",
"7: kicking\n",
"8: phone_talking\n",
"9: sitting_down\n",
"10: checking_watch\n",
"11: pointing\n",
"12: hand_waving\n",
"13: taking_photo\n"
]
}
],
"source": [
"# let's check the values of the train_labels array:\n",
"label_numbers = np.unique(train_labels)\n",
"print(label_numbers)\n",
"\n",
"# and use them as indices into the label_names array:\n",
"for label_no in label_numbers:\n",
" print(f\"{label_no}: {label_names[label_no]}\")"
]
},
{
"cell_type": "markdown",
"id": "z3fCrClWP85Z",
"metadata": {
"execution": {}
},
"source": [
"The test data set has similar data, but fewer movements. That's ok. What's important is that both the training and test datasets have an even spread of movement types, i.e. we want them to be balanced. Let's see how balanced the data is:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "barhwy9EQ5uI",
"metadata": {
"execution": {}
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({0: 74,\n",
" 1: 74,\n",
" 4: 73,\n",
" 5: 73,\n",
" 6: 74,\n",
" 7: 74,\n",
" 8: 74,\n",
" 9: 74,\n",
" 10: 74,\n",
" 11: 74,\n",
" 12: 74,\n",
" 13: 74,\n",
" 3: 73,\n",
" 2: 73})"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Counter(train_labels)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "CCJystGFRO3H",
"metadata": {
"execution": {}
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({2: 13,\n",
" 3: 13,\n",
" 5: 13,\n",
" 4: 13,\n",
" 6: 12,\n",
" 7: 12,\n",
" 8: 12,\n",
" 9: 12,\n",
" 11: 12,\n",
" 10: 12,\n",
" 12: 12,\n",
" 13: 12,\n",
" 1: 12,\n",
" 0: 12})"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Counter(test_labels)"
]
},
{
"cell_type": "markdown",
"id": "uGHra3KuRlBh",
"metadata": {
"execution": {}
},
"source": [
"So that looks more or less OK. Movements 2, 3, 4 and 5 occur once more in the training data than the other movements, and one time fewer in the test data. Not perfect, but probably doesn't matter that much."
]
},
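{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we will feed these arrays to PyTorch later, here is a minimal sketch of how they could be wrapped in a `Dataset` and served by a `DataLoader`. The class and variable names are our own choices for illustration, not part of the MoVi data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: wrap the numpy arrays as a PyTorch Dataset so that a\n",
"# DataLoader can serve shuffled (movement, label) mini-batches.\n",
"class MovementDataset(Dataset):\n",
"  def __init__(self, moves, labels):\n",
"    self.moves = torch.from_numpy(moves).float()   # (N, 48, 75)\n",
"    self.labels = torch.from_numpy(labels).long()  # (N,)\n",
"\n",
"  def __len__(self):\n",
"    return len(self.labels)\n",
"\n",
"  def __getitem__(self, idx):\n",
"    return self.moves[idx], self.labels[idx]\n",
"\n",
"train_loader = DataLoader(MovementDataset(train_moves, train_labels),\n",
"                          batch_size=32, shuffle=True)\n",
"moves_batch, labels_batch = next(iter(train_loader))\n",
"print(moves_batch.shape, labels_batch.shape)  # (32, 48, 75) and (32,)"
]
},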
{
"cell_type": "markdown",
"id": "rHpiIl1eOupO",
"metadata": {
"execution": {}
},
"source": [
"## Model ingredients\n",
"\n",
"**\"Mechanisms\"**:\n",
"\n",
"* Feature engineering? --> Do we need anything else aside from angular time courses? For now we choose to only use the angular time courses (exponential maps), as our ultimate goal is to see how many joints we need for accurate movement classification so that we can decrease the number of measurements or devices for later work.\n",
"\n",
"* Feature selection? --> Which joint movements are most informative? These are related to our research questions and hypotheses, so this project will explicitly investigate which joints are most informative.\n",
"\n",
"* Feature grouping? --> Instead of trying all possible combinations of joints (very many) we could focus on limbs, by grouping joints. We could also try the model on individual joints.\n",
"\n",
"* Classifier? --> For our classifier we would like to keep it as simple as possible, but we will decide later.\n",
"\n",
"* Input? --> The training data (movements and labels) will be used to train the classifier.\n",
"\n",
"* Output? --> The test data will be used as input for the trained model and we will see if the predicted labels are the same as the actual labels."
]
},
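{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the feature grouping concrete, here is a sketch of how we might group the 24 joints into limbs, torso/head, and single arms. The indices follow the joint list printed above; the groupings themselves, and the assumption that each joint occupies 2 consecutive channels, are our own choices for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical joint groupings for feature grouping / selection.\n",
"# Indices follow the joint list printed earlier; the groupings are\n",
"# our own choice, not something defined by the dataset.\n",
"joint_groups = {\n",
"  'limbs': [1, 2, 4, 5, 7, 8, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23],\n",
"  'torso_head': [0, 3, 6, 9, 12, 13, 14, 15],\n",
"  'left_arm': [13, 16, 18, 20, 22],\n",
"  'right_arm': [14, 17, 19, 21, 23],\n",
"}\n",
"\n",
"def select_joints(moves, joint_idx):\n",
"  # assumes channel order [joint0_angle0, joint0_angle1, joint1_angle0, ...],\n",
"  # i.e., 2 consecutive channels per joint\n",
"  channels = sorted([2 * j for j in joint_idx] + [2 * j + 1 for j in joint_idx])\n",
"  return moves[:, channels, :]\n",
"\n",
"right_arm_moves = select_joints(train_moves, joint_groups['right_arm'])\n",
"print(right_arm_moves.shape)  # expected: (1032, 10, 75)"
]
},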
{
"cell_type": "markdown",
"id": "single-server",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Step 4: hypotheses\n",
"Since humans can easily distinguish different movement types from video data and also more abstract \"stick figures\", a DL model should also be able to do so. Therefore, our hypotheses are more detailed with respect to parameters influencing model performance (and not just whether it will work or not).\n",
"\n",
"Remember, we're interested in seeing how many joints are needed for classification. So we could hypothezise (Hypothesis 1) that arm and leg motions are sufficient for classification (meaning: head and torso data is not needed).\n",
"\n",
"* Hypothesis 1: The performance of a model with four limbs plus torso and head is not higher than the performance of a model with only limbs.\n",
"\n",
"We could also hypothesize that data from only one side of the body is sufficient (Hypothesis 2), e.g. the right side, since our participants are right handed.\n",
"\n",
"* Hypothesis 2: A model using only joints in the right arm will outperform a model using only the joints in the left arm.\n",
"\n",
"Writing those in mathematical terms:\n",
"* Hypothesis 1: $\\mathbb{E}(perf_{limbs})>\\mathbb{E}(perf_{torso})$\n",
"* Hypothesis 2: $\\mathbb{E}(perf_{right arm})>\\mathbb{E}(perf_{left arm})$"
]
},
{
"cell_type": "markdown",
"id": "fantastic-egypt",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Step 5: toolkit selection\n",
"We need a toolkit that can deal with time-varying data as input (e.g. 1d convnet, LSTM, transformer...). We want to keep it as simple as possible to start with. So let's run with a 1d convnet. It allows us to answer our question, it will be able to speak to our hypotheses, and hopefully we can achieve our goal to see if automatic movement classification based on (sparse) body movement data is possible."
]
},
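{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what we mean by a 1D convnet, here is a minimal sketch. The layer sizes and kernel width are placeholder choices for illustration, not our final model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal 1D convnet sketch (placeholder layer sizes, not the final model).\n",
"# Input: (batch, 48 channels, 75 timepoints) -> output: 14 class scores.\n",
"class MoveClassifier(nn.Module):\n",
"  def __init__(self, n_channels=48, n_classes=14):\n",
"    super().__init__()\n",
"    self.conv = nn.Conv1d(n_channels, 32, kernel_size=5)  # -> (batch, 32, 71)\n",
"    self.pool = MaxPool1d(kernel_size=2)                  # -> (batch, 32, 35)\n",
"    self.fc = nn.Linear(32 * 35, n_classes)\n",
"\n",
"  def forward(self, x):\n",
"    x = torch.relu(self.conv(x))\n",
"    x = self.pool(x)\n",
"    x = x.flatten(start_dim=1)\n",
"    return self.fc(x)\n",
"\n",
"# quick shape check on random input:\n",
"dummy = torch.randn(8, 48, 75)\n",
"print(MoveClassifier()(dummy).shape)  # expected: torch.Size([8, 14])"
]
},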
{
"cell_type": "markdown",
"id": "parental-compensation",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Step 6: model drafting\n",
"Here is our sketch of the model we wanted to build...\n",
"\n",
"