
Tutorial 2: Deep Learning Thinking 3

Week 3, Day 5: Reinforcement Learning for Games & DL Thinking 3

By Neuromatch Academy

Content creators: Konrad Kording, Lyle Ungar

Content reviewers: Ella Batty, Shaonan Wang, Gunnar Blohm

Content editors: Ella Batty, Shaonan Wang

Production editors: Ella Batty, Spiros Chavlis


Tutorial Objectives


Setup

⚠ Experimental LLM-enhanced tutorial ⚠

This notebook includes Neuromatch’s experimental Chatify 🤖 functionality. The Chatify notebook extension adds support for a large language model-based “coding tutor” to the materials. The tutor provides automatically generated text to help explain any code cell in this notebook.

Note that using Chatify may cause breaking changes and/or provide incorrect or misleading information. If you wish to proceed by installing and enabling the Chatify extension, you should run the next two code blocks (hidden by default). If you do not want to use this experimental version of the Neuromatch materials, please use the stable materials instead.

To use the Chatify helper, insert the %%explain magic command at the start of any code cell and then run it (shift + enter) to access an interface for receiving LLM-based assistance. You can then select different options from the dropdown menus depending on what sort of assistance you want. To disable Chatify and run the code block as usual, simply delete the %%explain command and re-run the cell.

Note that, by default, all of Chatify’s responses are generated locally. This often takes several minutes per response. Once you click the “Submit request” button, just be patient: stuff is happening even if you can’t see it right away!

Thanks for giving Chatify a try! Love it? Hate it? Either way, we’d love to hear from you about your Chatify experience! Please consider filling out our brief survey to provide feedback and help us make Chatify more awesome!

Run the next two cells to install and configure Chatify…

%pip install -q davos
import davos
davos.config.suppress_stdout = True
Note: you may need to restart the kernel to use updated packages.
smuggle chatify      # pip: git+https://github.com/ContextLab/chatify.git
%load_ext chatify
Using default configuration!
Downloading the 'cache' file.

Install and import feedback gadget

# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "neuromatch_dl",
            "user_key": "f379rz8y",
        },
    ).render()


feedback_prefix = "W3D5_T2"

Section 1: Intro to Deep Learning Thinking 3

Time estimate: ~3 mins

This tutorial is the third installment of our deep learning thinking series. Like the others, there will be no coding! Instead, you will watch a series of vignettes about various scenarios where you want to use a neural network.

Each section below will start with a vignette where either Lyle or Konrad is trying to figure out how to set up a neural network for a specific problem. Try to think of questions you want to ask them as you watch, then pay attention to what questions Lyle and Konrad are asking. Were they what you would have asked? How do their questions help quickly clarify the situation?

Video 1: Intro to DL Thinking 3

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Intro_to_DL_Thinking_3_Video")

Section 2: The Future

Time estimate: ~15 mins

Video 2: The Future Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_The_Future_Video")

Think! 1: The Future

Things change over time! How might we address this problem in machine learning?

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time to help guide your discussion. Please spend some time discussing before uncovering the next hint though!

Click here for hint 1

Can you think of a scenario in your life where data is i.i.d.?

Let’s use an example. Konrad hears questions x about a topic during a class he’s teaching, and tries to learn answers y to each of them. Will the questions next year be the same as the questions this year?

Once we know that the questions next year will be different, what should Konrad do to be ready for the questions next year?

More generally, why does it matter that the future is different from the past? How does that relate to i.i.d.?

Click here for hint 2

So the future will have a different distribution p(X) than the past. It will also have a different distribution of good answers p(Y). After all, deep learning is evolving. And training in deep learning is also evolving. What could be a good model, in the context of deep learning, of p(X)? How could it be built into a deep learning system? Would it even be necessary to build it in? What about p(Y)? Would that need to be built in? Arguably, DL is about how to find p(Y|X). Why do we need to care about X changing? Why about Y changing?

In this context, let us talk about curiosity. If p(X) and p(Y) are both changing, what would we want to learn about? We probably want to focus on things that will still be true in the future. What are the things that are true today and will be true in the future? Which things change?

Click here for hint 3

The objects in a room change. Their affordances do not: you will still be able to sit on a chair tomorrow. The causal relations in the world do not change. For example, fire will still burn wood in the future. The constituents of things do not change either. For example, mammals still consist of blood, flesh, and bones. Other things change all the time. Who is in this room? Who am I talking with? There are also things that change but that I do not care about, e.g. somewhere far away there is a causal system I do not understand. So we do not just want to have good models in the future, we want to have good models to answer the kinds of questions we will actually be asked.

Click here for solution

Here is a paper talking about these phenomena: https://arxiv.org/abs/2201.07372.

The paper above is worth a full read! In brief, it outlines an argument that most AI focuses on retrospective learning, where it uses past experiences to learn and make predictions in the future, assuming the future mimics the past. The authors argue that true intelligence is prospective learning, where AI learns for an uncertain future by updating internal models to be useful for future novel tasks. They articulate four relevant factors that jointly define prospective learning: “Continual learning enables intelligences to remember those aspects of the past which it believes will be most useful in the future. Prospective constraints (including biases and priors) facilitate the intelligence finding general solutions that will be applicable to future problems. Curiosity motivates taking actions that inform future decision making, including in previously unmet situations. Causal estimation enables learning the structure of relations that guide choosing actions for specific outcomes, even when the specific action-outcome contingencies have never been observed before.”
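To make the "future mimics the past" assumption concrete, here is a minimal sketch (not taken from the paper) of what happens when p(X) shifts while p(Y|X) stays fixed. The quadratic target, the input ranges, and the helper names `fit_line`, `mse`, and `target` are all assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b in one dimension."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the line a * x + b on the given data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def target(x):
    # The rule linking x to y is the same in the past and in the future.
    return x ** 2


past_x = [i / 20 for i in range(21)]        # past inputs lie in [0, 1]
future_x = [2 + i / 20 for i in range(21)]  # future inputs lie in [2, 3]

# Fit only on the past, then evaluate on both past and future inputs.
a, b = fit_line(past_x, [target(x) for x in past_x])
past_error = mse(a, b, past_x, [target(x) for x in past_x])
future_error = mse(a, b, future_x, [target(x) for x in future_x])
print(f"past MSE: {past_error:.4f}, future MSE: {future_error:.4f}")
```

The line fits the past inputs well but extrapolates poorly, even though the underlying rule never changed. Only the distribution of inputs moved, which is one reason a purely retrospective learner can fail.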

Discussion point: in the light of this paper, how would you design DL algorithms differently? Which problems are easy vs hard to overcome?

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_The_Future_Discussion")

Section 3: In-context Learning

Time estimate: ~15 mins

Video 3: In-context Learning Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_In_context_learning_Vignette_Video")

Think! 2: In-context Learning

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time to guide your discussion. Please spend some time discussing before uncovering the next hint though!

Click here for hint 1

What is in-context learning? The context is basically the words that LLMs have in their context right now. To predict what comes next, the model arguably has to solve a machine learning problem. How would you write this as an equation?

See if you can get ChatGPT to solve in-context learning problems.

Click here for hint 2

Let us talk about the nature of the problem. How do we train systems and how may that relate to how we do in-context learning?

Meta-learning is defined as a system that learns how to learn. Is in-context learning a version of meta-learning?

Click here for hint 3

How could the transformer architecture be helpful for solving in-context learning problems? DL does do gradient descent. Do you think there is gradient descent learning in-context?

Click here for solution

Here are two papers about these topics:

  1. arxiv:2211.15561

  2. arxiv:2212.10559

Here is a summary of the above papers:

Paper 1: Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. The authors investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.

Paper 2: Large pretrained language models possess impressive In-Context Learning (ICL) capabilities, enabling them to predict labels for unseen inputs with just a few demonstration input-label pairs and without additional parameter updates. However, the underlying mechanism of ICL remains unknown. This paper views language models as meta-optimizers and conceptualizes ICL as a form of implicit finetuning, uncovering that Transformer attention follows a dual form of gradient descent optimization. Experimental comparisons confirm that ICL performs similarly to explicit finetuning in terms of prediction, representation, and attention behavior, and the introduction of momentum-based attention, inspired by meta-optimization, demonstrates its potential for future model design.
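The link between attention and gradient descent can be checked numerically in the simplest linear case. The sketch below is an illustration under strong assumptions (unnormalized linear attention, a linear model initialized at zero, a single gradient step, an arbitrary learning rate `lr`), not a reproduction of either paper's experiments.

```python
import random

random.seed(0)
n, d_in, d_out = 6, 3, 2
lr = 0.1  # learning rate of the implicit gradient step (assumed value)

# In-context demonstrations (x_i, y_i) and one query input.
X = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]
Y = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(n)]
query = [random.gauss(0, 1) for _ in range(d_in)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# (1) One gradient step on 0.5 * sum_i ||W x_i - y_i||^2, starting from W = 0.
# At W = 0 the gradient is -sum_i y_i x_i^T, so W1 = lr * sum_i y_i x_i^T.
W1 = [[lr * sum(Y[i][r] * X[i][c] for i in range(n)) for c in range(d_in)]
      for r in range(d_out)]
pred_gd = [dot(row, query) for row in W1]

# (2) Unnormalized linear attention: keys x_i, values y_i, scores x_i . query.
scores = [dot(x, query) for x in X]
pred_attn = [lr * sum(scores[i] * Y[i][r] for i in range(n))
             for r in range(d_out)]

max_gap = max(abs(p - q) for p, q in zip(pred_gd, pred_attn))
print(f"largest difference between the two predictions: {max_gap:.2e}")
```

The two predictions agree up to floating-point error because both compute sum_i y_i (x_i . query), scaled by `lr`. Softmax attention and multi-layer transformers break this exact identity, which is part of why the theory remains open.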

Discussion point: In the light of these papers, how can we build more solid theories of ICL? What is currently understood? What kind of theorems do you think we may be able to prove about ICL? Can we expect to improve ICL through architecture engineering?

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_In_context_learning_Discussion")

Section 4: Memories

Time estimate: ~15 mins

Video 4: Memories Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Memories_Vignette_Video")

Think! 3: Memories

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Click here for hint 1

How would you define episodic memory?

How would you define procedural memory?

How would you define semantic memory?

Click here for hint 2

The main interesting thing here is episodic memory. For example, you and your best friend may both know that it is a fact that you had dinner together last night. How would you build episodic memory into a deep learning system? Can you gradient descent into an episodic memory?

Click here for hint 3

Episodic memory is after all something about absolute truth. So it is not entirely clear how it relates to gradient descent. It is easy to see how we may use gradient descent for reading though. So what if you instead just had a big blackboard. What would you write on it? When/how would you erase it?

Click here for solution

Here are a few fun papers. https://arxiv.org/abs/1410.5401, https://arxiv.org/abs/1805.07603, https://arxiv.org/abs/1703.03129

Here is a summary of the above papers:

Paper 1: While modern machine learning has been successful in modeling complex data, it has largely overlooked the use of logical flow control and external memory, despite their importance in computer programs. Recurrent neural networks (RNNs) have the ability to carry out intricate data transformations over extended periods and are Turing-Complete, capable of simulating arbitrary procedures. To simplify the solution of algorithmic tasks, the authors introduce the Neural Turing Machine (NTM), which enriches standard recurrent networks with a large, addressable memory, resembling human working memory and utilizing an attentional process for selective reading and writing. The NTM can be trained through gradient descent, making it a practical mechanism for learning programs.

Paper 2: Deep reinforcement learning (RL) algorithms have seen great advancements by utilizing deep neural networks (DNNs), but they suffer from sample inefficiency. To address this, the authors propose Episodic Memory Deep Q-Networks (EMDQN), a biologically inspired RL algorithm that uses episodic memory for training supervision. Experimental results demonstrate that EMDQN achieves better sample efficiency, outperforming regular DQN and other episodic memory based RL algorithms, requiring only 1/5 of the interactions of DQN to achieve state-of-the-art performance on Atari games.

Paper 3: Existing memory-augmented deep neural networks face limitations in lifelong and one-shot learning, particularly in remembering rare events. To address this, the authors introduce a large-scale lifelong memory module that utilizes fast nearest-neighbor algorithms for efficiency and scalability. The module is fully differentiable, trained end-to-end without additional supervision, and can be seamlessly integrated into various neural network architectures. Experimental results demonstrate the module’s ability to achieve state-of-the-art performance in one-shot learning tasks on the Omniglot dataset and enable lifelong one-shot learning in recurrent neural networks for large-scale machine translation.
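The "blackboard" idea from the hints and the attention-based and nearest-neighbor reads described in these summaries can be sketched in a few lines. This is a toy under assumptions, not any of the papers' architectures: the class name `Blackboard`, the drop-the-oldest erase rule, and the `sharpness` parameter are all hypothetical choices.

```python
import math
from collections import deque


class Blackboard:
    """Toy external memory: a bounded list of (key, value) pairs."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry when full,
        # which is one simple (assumed) rule for erasing.
        self.slots = deque(maxlen=capacity)

    def write(self, key, value):
        self.slots.append((list(key), value))

    def _similarities(self, query):
        return [sum(q * k for q, k in zip(query, key)) for key, _ in self.slots]

    def read_nearest(self, query):
        """Hard read: return the value whose key best matches the query."""
        sims = self._similarities(query)
        best = max(range(len(sims)), key=sims.__getitem__)
        return self.slots[best][1]

    def read_weights(self, query, sharpness=5.0):
        """Soft read: softmax attention weights over the slots."""
        sims = [sharpness * s for s in self._similarities(query)]
        top = max(sims)
        exps = [math.exp(s - top) for s in sims]
        total = sum(exps)
        return [e / total for e in exps]


memory = Blackboard(capacity=2)
memory.write([1.0, 0.0], "dinner with a friend")
memory.write([0.0, 1.0], "lecture on RL")

first_recall = memory.read_nearest([0.9, 0.1])
weights = memory.read_weights([0.9, 0.1])
print(first_recall, [round(w, 3) for w in weights])

memory.write([1.0, 1.0], "walk in the park")  # the oldest episode is erased
second_recall = memory.read_nearest([0.9, 0.1])
print(second_recall)
```

The soft read is the part gradient descent could shape, since the weights vary smoothly with the query. The write and erase rules here are fixed by hand, which leaves the question from the hints open: what should be written, and when should it be erased?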

Discussion point: In the light of these papers, which desirable aspects of memory can we already build? How do they differ from the way humans do it? How close are we to the agility of memory exhibited by humans?

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Memories_Discussion")

Section 5: Multiple Information Sources

Time estimate: ~15 mins

Video 5: Multiple Information Sources Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Multiple_Information_Sources_Vignette_Video")

Think! 4: Multiple Information Sources

Please discuss as a group. If you get stuck, you can uncover the hints below one at a time. Please spend some time discussing before uncovering the next hint, though! You are being real deep learning scientists now, and the answers won’t be easy.

Click here for hint 1

Think of a few tasks. If you wanted to solve them, which webpages would you use? How would you combine them? How could you allow a DL system to do the same thing?

Click here for hint 2

If you have multiple sources of information, how synergistic are they? How could you incorporate information from multiple sources? Could there be ways of summarizing the things you could get out of a given model?

Click here for hint 3

English language and querying can be a way of interacting with such services. Think about webpages as something that you can ask questions to.

Click here for solution

Here is a list of recent relevant projects:

  1. Auto GPT

  2. arxiv:2302.14045

  3. PaLM-E

Here is a summary of the paper listed above (arxiv:2302.14045):

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Experimental results show that the proposed model achieves impressive performance on language understanding, generation, OCR-free NLP, perception-language tasks, and vision tasks, as well as that MLLMs can benefit from cross-modal transfer.

Discussion points: How should information be combined across modalities? Why does it help to combine them? What does that mean about the future of science?
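One classical answer to why combining sources helps is inverse-variance weighting. The sketch below is not how multimodal language models fuse inputs. It assumes two independent, unbiased sources that each report a scalar estimate with a known variance, and the function name `combine` and the numbers are illustrative.

```python
def combine(estimates):
    """Fuse independent (mean, variance) estimates by inverse-variance weighting."""
    precisions = [1.0 / var for _, var in estimates]
    total_precision = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, estimates)) / total_precision
    return mean, 1.0 / total_precision


vision = (10.0, 4.0)  # (estimate, variance) from a noisy source
text = (12.0, 1.0)    # (estimate, variance) from a more reliable source

fused_mean, fused_var = combine([vision, text])
print(f"fused estimate: {fused_mean:.2f}, fused variance: {fused_var:.2f}")
```

The fused estimate sits closer to the more reliable source, and its variance is lower than that of either source alone. Under these assumptions, even a weak second source adds information.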

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Multiple_Information_Sources_Discussion")

Section 6: Language for Robotics

Time estimate: ~15 mins

Video 6: Language for Robotics Vignette

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Language_for_Robotics_Video")

Think! 5: Language for Robotics

Click here for hint 1

Think about a robotics problem, e.g. emptying the dishwasher. Is there a way of dividing it into subproblems? How many of them? How complex are they?

Click here for hint 2

If you look at the subproblems, how complex are they? What could be a good way of describing each subproblem? If a robot-controlling AI system had such a description, how useful would that be?

Click here for hint 3

If we were to use a large language model to interact with the RL systems, how useful would that be? Why is language so useful for this task?

Click here for solution

Have a look at this great collection of papers: Awesome-LLM-Robotics

Here is a summary of the papers listed on GitHub: These papers cover topics such as reasoning, planning, manipulation, instructions and navigation, and simulation frameworks. Some of the papers use pre-trained models such as GPT-3, BERT, or CLIP to perform tasks like mapping natural language instructions to robotic actions, generating situated robot task plans, or grounding language in robotic affordances. Some of the papers propose new models or methods that combine language, vision, and action for embodied reasoning, navigation, or control. For example, RT-1 is a robotics transformer that can learn from large-scale data and generalize to new environments. PaLM-E is an embodied multimodal language model that can interact with objects and agents in a 3D world. LLM+P is a method that empowers large language models with optimal planning proficiency. Some of the papers also provide code, websites, or colabs for reproducing or testing their results. For example, you can try out Code-As-Policies, which uses language model programs for embodied control, or Socratic, which composes zero-shot multimodal reasoning with language.
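The dishwasher example from the hints can be turned into a toy sketch of language as the interface between a planner and low-level skills. Everything here is hypothetical: the skill names, the state dictionary, and the `plan` function, which is a hard-coded stand-in for a language model and not an API from any of the listed papers.

```python
def open_door(state):
    return {**state, "door_open": True}


def move_dish(state):
    # Only acts when the door is open and a dish is left in the rack.
    if not state["door_open"] or state["dishes_in_rack"] == 0:
        return state
    return {**state,
            "dishes_in_rack": state["dishes_in_rack"] - 1,
            "dishes_in_cupboard": state["dishes_in_cupboard"] + 1}


def close_door(state):
    return {**state, "door_open": False}


# Each subproblem is named in plain English and mapped to a controller.
SKILLS = {
    "open the dishwasher door": open_door,
    "move one dish to the cupboard": move_dish,
    "close the dishwasher door": close_door,
}


def plan(instruction, state):
    """Hard-coded stand-in for a language model that outputs skill names."""
    if instruction != "empty the dishwasher":
        raise ValueError(f"no plan known for: {instruction!r}")
    return (["open the dishwasher door"]
            + ["move one dish to the cupboard"] * state["dishes_in_rack"]
            + ["close the dishwasher door"])


state = {"door_open": False, "dishes_in_rack": 3, "dishes_in_cupboard": 0}
steps = plan("empty the dishwasher", state)
for step in steps:
    state = SKILLS[step](state)
print(steps)
print(state)
```

The plan is a list of short English phrases, so a person can read, check, and edit it, while each phrase hides a controller that could itself be learned with RL. In a real system, both the planner and the skills would be far harder to build than this sketch suggests.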

Discussion points: How will the coding of robots change over time? What are the big outstanding problems?

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Language_for_Robotics_Discussion")

Summary

Time estimate: ~2 mins


Daily survey

Don’t forget to complete your reflections and content check in the daily survey! Please be patient after logging in as there is a small delay before you will be redirected to the survey.
