
Tutorial 1: Deep Learning Thinking 2: Architectures and Multimodal DL Thinking

Week 3, Day 2: DL Thinking 2

By Neuromatch Academy

Content creators: Konrad Kording, Lyle Ungar

Content reviewers: Kelson Shilling-Scrivo

Content editors: Kelson Shilling-Scrivo

Production editors: Gagana B, Spiros Chavlis


Tutorial Objectives

In this tutorial, you will practice thinking like a deep learning practitioner and figure out how to design architectures for different scenarios.

By the end of this tutorial, you will be better able to:

  • Proceed strategically when you are low on data

  • Draw on a toolbox of approaches for non-standard situations

We will also continue to see how to get relevant information out of domain experts, arguably the central skill of DL, and how to convert domain insights into the logic of actual approaches.


Setup

⚠ Experimental LLM-enhanced tutorial ⚠

This notebook includes Neuromatch’s experimental Chatify 🤖 functionality. The Chatify notebook extension adds support for a large language model-based “coding tutor” to the materials. The tutor provides automatically generated text to help explain any code cell in this notebook.

Note that using Chatify may cause breaking changes and/or provide incorrect or misleading information. If you wish to proceed by installing and enabling the Chatify extension, you should run the next two code blocks (hidden by default). If you do not want to use this experimental version of the Neuromatch materials, please use the stable materials instead.

To use the Chatify helper, insert the %%explain magic command at the start of any code cell and then run it (shift + enter) to access an interface for receiving LLM-based assistance. You can then select different options from the dropdown menus depending on what sort of assistance you want. To disable Chatify and run the code block as usual, simply delete the %%explain command and re-run the cell.

Note that, by default, all of Chatify’s responses are generated locally. This often takes several minutes per response. Once you click the “Submit request” button, just be patient; stuff is happening even if you can’t see it right away!

Thanks for giving Chatify a try! Love it? Hate it? Either way, we’d love to hear from you about your Chatify experience! Please consider filling out our brief survey to provide feedback and help us make Chatify more awesome!

Run the next two cells to install and configure Chatify…

%pip install -q davos
import davos
davos.config.suppress_stdout = True
smuggle chatify      # pip: git+https://github.com/ContextLab/chatify.git
%load_ext chatify

Install and import feedback gadget

# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "neuromatch_dl",
            "user_key": "f379rz8y",
        },
    ).render()


feedback_prefix = "W3D2_T1"

Section 1: Intro to Deep Learning Thinking 2

Time estimate: ~4 mins

Video 1: Intro to DL Thinking 2

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Intro_to_DL_Thinking_2_Video")

Like Deep Learning Thinking 1 last week, this tutorial is a bit different from the others: there will be no coding! Instead, you will watch a series of vignettes about various scenarios where you want to use a neural network. This tutorial focuses on architectures and multimodal thinking.

Each section below will start with a vignette where either Lyle or Konrad is trying to figure out how to set up a neural network for a specific problem. Try to think of questions you want to ask them as you watch, then pay attention to what questions Lyle and Konrad are asking. Were they what you would have asked? How do their questions help quickly clarify the situation?


Section 2: Getting More Data

Time estimate: ~15 mins

Video 2: Getting More Data Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Getting_More_Vignette_Video")

Konrad wants to build a neural network that classifies images based on the objects contained within them. He needs more data to help him train an accurate network, but buying more images is costly. He needs a different solution.

Think! 1: Designing a strategy to get more data

Given everything you know, how would you design a strategy to get some more data (pairs of images and the label of the object they are of) for the image classification neural network that Konrad is training? Be specific & write down a procedure.

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Click here for hint 1

Look at a few photos of dogs (use an image search engine). How are they similar? How are they different? What makes them all dogs?

Click here for hint 2

We don’t need to obtain any new images in order to give more examples of each object to our neural network.

Click here for hint 3

Think about color, orientation, flipping, pixel noise, color noise, shearing, contrast, brightness, and scaling.

Click here for hint 4

Discuss where each of these ideas will break down. Can you have too much of a good thing?

Click here for solution

Instead of collecting new data, we can create multiple training examples from each of our existing images: flipping them horizontally, shifting them horizontally or vertically by some number of pixels, scaling them to be larger or smaller (and cropping), rotating them, and changing their contrast and brightness.

This is called data augmentation, and it is a commonly used and important strategy for training neural networks.

Importantly, we need to be careful about how much we change each image: we still want the results to be useful training examples! Say you have a photo of a dog, and you scale it to be 1000x bigger and crop out the middle. You’d have just an image of fur, which would not be very useful for learning to classify dogs. So we want to vary the images, but not so much that they are no longer recognizable as the original object.
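To make this concrete, here is a minimal sketch of such an augmentation pipeline, assuming torchvision is available; the transforms and parameter values are illustrative choices, not a recipe prescribed in the video.

import torchvision.transforms as T

# Each pass through this pipeline produces a different random variant,
# so one photo yields many distinct training examples.
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                 # mirror left/right
    T.RandomAffine(degrees=15,                     # small rotations
                   translate=(0.1, 0.1),           # shift up to 10% per axis
                   scale=(0.9, 1.1)),              # mild zoom in/out
    T.ColorJitter(brightness=0.2, contrast=0.2),   # lighting changes
    T.ToTensor(),
])

# augmented = augment(pil_image)  # apply to a PIL image during training

Keeping the transforms moderate reflects the caution above: flip a dog photo and it is still recognizably a dog; zoom in 1000x and it is just fur.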

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Getting_More_Data_Discussion")

Video 3: Getting More Data Wrap-up

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Getting_More_Data_WrapUp_Video")

Check out the paper mentioned in the above video:

  • Balestriero, R., Bottou, L., LeCun, Y. (2022). The Effects of Regularization and Data Augmentation are Class Dependent. arXiv: 2204.03632

(Bonus) Think!: Class-based strategies

Discuss how you may want to vary these strategies based on the class of the object/images.

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_ClassBased_strategies_Bonus_Discussion")

Section 3: Detecting Tumors - What to do if there still isn’t enough data

Time estimate: ~15 mins

Video 4: Detecting Tumors Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_Vignette_Video")

Video 5: Detecting Tumors Set-up

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_SetUp_Video")

Konrad works for a hospital and wants to train a neural network to detect tumors in brain scans automatically. This type of tumor is pretty rare, which is great for humanity but means we only have a few thousand training examples for our neural network. This isn’t enough.

Even after adding in images of other types of tumors, we don’t have enough data. We do have a lot of images of other things in ImageNet, like cats and dogs, though! Maybe we can use that?

Think! 2: Designing a strategy for detecting tumors

Given everything you know, how would you design a strategy to be able to train an accurate tumor-detecting neural network? Be specific & write down a procedure.

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Click here for hint 1

Data augmentation is always something to consider.

Click here for hint 2

A human learning to detect tumors is not learning how to see from scratch just based on the tumor images.

Click here for hint 3

You could use another dataset to help. What properties should such a dataset have?

Click here for hint 4

Even though the images in ImageNet are not of tumors, natural images carry information about properties of visual objects that tumors share (they’re coherent, locally smooth, etc.).

If you train a neural network on ImageNet first so that it learns general vision and embeddings of images, what might you want to change when training on the tumor images dataset?

Click here for solution

Humans don’t learn to see when they learn a new classification task. We already have a trained visual system that is good at processing and learning embeddings for natural images.

We can replicate this in neural networks! First, we can train our neural network on ImageNet alone to do object classification. This gives us a neural network that has already learned how to process and embed images.

Then, we want to take this neural network and continue to train it on just the tumor classification dataset. We can chop off the existing final layer (that outputs the probabilities of all the ImageNet classes) and train a new one that outputs the probability of there being a tumor in the image.

We could keep all the weights in the convolutional layers fixed after the ImageNet training, or we could fine-tune them. People take both strategies!

This whole process is called pre-training. We have pre-trained the neural network on ImageNet before training on our actual task, the detection of tumors.

We should mention here that there are many ways of doing this: train the whole network after training on a first task, train only the top layers after training the bottom layers, or first do the latter and then the former. Pre-training can be done in many ways; what matters is looking for it as an opportunity.
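As a concrete sketch of the “chop off the final layer and retrain” idea, here is what this could look like with torchvision’s ResNet-18 (assuming a recent torchvision); the model choice and the single-logit tumor head are illustrative assumptions, not the specific setup from the video.

import torch.nn as nn
from torchvision import models

# Start from a network whose weights were already trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Strategy 1: freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Chop off the 1000-class ImageNet head and replace it with a new,
# trainable single-logit head for tumor vs. no tumor.
model.fc = nn.Linear(model.fc.in_features, 1)

# Strategy 2 (fine-tuning): skip the freezing loop, or unfreeze later
# and continue training everything with a small learning rate.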

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_Discussion")

Video 6: Detecting Tumors Wrap-up

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Detecting_Tumors_WrapUp_Video")

Check out the paper mentioned in the above video:

  • Tschandl, P., Rinner, C., Apalla, Z. et al. (2020). Human–computer collaboration for skin cancer recognition. Nat Med 26: 1229–1234. doi: 10.1038/s41591-020-0942-0


Section 4: Brains on Forrest Gump

Time estimate: ~17 mins

Video 7: Brains on Forrest Gump Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Brains_on_Forrest_Gump_Vignette_Video")

Video 8: Brains on Forrest Gump Set-up

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Brains_on_Forrest_Gump_SetUp_Video")

Konrad has a great dataset: brain imaging (MRI) data recorded from someone over the whole time they watch the movie Forrest Gump. So, basically, he has the video stream over time and the brain data over time. He wants to figure out what those two data streams have in common; in other words, he wants to pull the shared information from two data modalities.

Think! 3: Designing a strategy for pulling shared info about brain data and Forrest Gump

Given everything you know, how would you design a strategy to get a shared embedding for the brain and video data? Be specific & write down a procedure.

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Click here for hint 1

We want the two datasets to share something. What does that mean?

Click here for hint 2

Where could the vectors \(\bar{X}_1\) and \(\bar{X}_2\) come from? How could they relate to the brain data and video data?

Click here for hint 3

You may want to use more than one neural network!

Click here for hint 4

What do we want our neural network solution to do here? Is there anything you want it to maximize or minimize?

Click here for hint 5

What happens if we multiply all activities by 2? We need a scale-invariant solution.

Click here for solution

The first thing to note is that we want two embeddings, one for the brain data and a second for the video data.

The second thing to note is that we want these embeddings to capture shared information between the two.

The key is to realize that if both embeddings contain the same information, they should be correlated.

Looking at the formula for Pearson correlation:


\[\begin{equation} \rho = \frac{\text{cov}(X_1, X_2)}{\sqrt{\text{var}(X_1) \cdot \text{var}(X_2)}} \end{equation}\]

where \(X_1\) and \(X_2\) are our two embeddings. To find the correlation between them, we take their covariance and normalize it by the product of their standard deviations (the square root of the product of their variances), giving us a scale-invariant quantity to optimize.

Imagine the extreme case where there was no noise, and both embeddings extracted the same information. Both embeddings would be perfectly correlated with each other. Conversely, if the two embeddings had no shared information, there would be little to no correlation between them. Therefore, by maximizing the correlation between the two embedding spaces, we’re maximizing the shared information between the two embeddings.

Another way to think about it is that by maximizing the correlation, we’re attempting to have one common embedding between the brain data and the video data. If both networks extract the same information, this will be possible.

The two embeddings will be slightly different if they extract slightly different information. Therefore, the more similar (and thus, more correlated) the embeddings are, the more similar the information extracted.
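Here is a minimal sketch of this idea for one-dimensional embeddings, assuming PyTorch; the input sizes (10,000 voxels and 4,096 video features) are made-up placeholders, and Deep CCA (see the paper below) generalizes this to multi-dimensional embeddings.

import torch
import torch.nn as nn

# Two separate encoders: one for brain data, one for video frames.
brain_net = nn.Sequential(nn.Linear(10000, 256), nn.ReLU(), nn.Linear(256, 1))
video_net = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 1))

def neg_correlation(z1, z2, eps=1e-8):
    """Negative Pearson correlation; minimizing it maximizes shared information."""
    z1 = z1 - z1.mean()
    z2 = z2 - z2.mean()
    cov = (z1 * z2).mean()
    return -cov / (z1.std() * z2.std() + eps)

# loss = neg_correlation(brain_net(brain_batch).squeeze(-1),
#                        video_net(video_batch).squeeze(-1))
# loss.backward()  # trains both encoders to produce correlated embeddings

Because both the numerator and the denominator scale with the activities, multiplying all activities by 2 leaves this loss unchanged, which is exactly the scale invariance that hint 5 asked for.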

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Brains_on_Forrest_Gump_Discussion")

Video 9: Brains on Forrest Gump Wrap-up

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Brains_on_Forrest_Gump_WrapUp_Video")

Check out the paper mentioned in the above video:

  • Andrew, G., Arora, R., Bilmes, J., Livescu, K. (2013). Deep Canonical Correlation Analysis. Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1247-1255. url: proceedings.mlr.press/v28/andrew13


Summary

Time estimate: ~2 mins

Video 10: Wrap up of DL thinking

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_WrapUp_of_DL_thinking_Video")

In this tutorial, we saw several tricks for doing well when data is very limited:

  • Data augmentation

  • Pretraining

  • Canonical Correlation Analysis (CCA)

All three can be used in cases where limited data is available. All three also teach us that the relevant information may be quite clear once we think about it, and show how ideas about the world translate into approaches in deep learning.


Daily survey

Don’t forget to complete your reflections and content check in the daily survey! Please be patient after logging in as there is a small delay before you will be redirected to the survey.
