Open In Colab   Open in Kaggle

Tutorial 2: Regularization techniques part 2

Week 2, Day 1: Regularization

By Neuromatch Academy

Content creators: Ravi Teja Konkimalla, Mohitrajhu Lingan Kumaraian, Kevin Machado Gamboa, Kelson Shilling-Scrivo, Lyle Ungar

Content reviewers: Piyush Chauhan, Siwei Bai, Kelson Shilling-Scrivo

Content editors: Roberto Guidotti, Spiros Chavlis

Production editors: Saeed Salehi, Gagana B, Spiros Chavlis


Tutorial Objectives

  1. Regularization as shrinkage of overparameterized models: L1 and L2

  2. Regularization by Dropout

  3. Regularization by Data Augmentation

  4. Perils of Hyper-Parameter Tuning

  5. Rethinking generalization


Setup

Note that some of the code for today can take up to an hour to run. We have therefore “hidden” that code and shown the resulting outputs.

⚠ Experimental LLM-enhanced tutorial ⚠

This notebook includes Neuromatch’s experimental Chatify 🤖 functionality. The Chatify notebook extension adds support for a large language model-based “coding tutor” to the materials. The tutor provides automatically generated text to help explain any code cell in this notebook.

Note that using Chatify may cause breaking changes and/or provide incorrect or misleading information. If you wish to proceed by installing and enabling the Chatify extension, you should run the next two code blocks (hidden by default). If you do not want to use this experimental version of the Neuromatch materials, please use the stable materials instead.

To use the Chatify helper, insert the %%explain magic command at the start of any code cell and then run it (shift + enter) to access an interface for receiving LLM-based assistance. You can then select different options from the dropdown menus depending on what sort of assistance you want. To disable Chatify and run the code block as usual, simply delete the %%explain command and re-run the cell.

Note that, by default, all of Chatify’s responses are generated locally. This often takes several minutes per response. Once you click the “Submit request” button, just be patient; stuff is happening even if you can’t see it right away!

Thanks for giving Chatify a try! Love it? Hate it? Either way, we’d love to hear from you about your Chatify experience! Please consider filling out our brief survey to provide feedback and help us make Chatify more awesome!

Run the next two cells to install and configure Chatify…

%pip install -q davos
import davos
davos.config.suppress_stdout = True
Note: you may need to restart the kernel to use updated packages.
smuggle chatify      # pip: git+https://github.com/ContextLab/chatify.git
%load_ext chatify
Using default configuration!
Downloading the 'cache' file.

Install dependencies

WARNING: There may be errors and/or warnings reported during the installation. However, they should be ignored.

# @title Install dependencies

# @markdown **WARNING**: There may be *errors* and/or *warnings* reported during the installation. However, they should be ignored.

!pip install imageio --quiet
!pip install imageio-ffmpeg --quiet

Install and import feedback gadget

# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "neuromatch_dl",
            "user_key": "f379rz8y",
        },
    ).render()


feedback_prefix = "W2D1_T2"
# Imports
import copy
import torch
import random
import pathlib

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from torchvision import transforms
from torchvision.datasets import ImageFolder

from tqdm.auto import tqdm
from IPython.display import HTML, display

Figure Settings

# @title Figure Settings
import logging
logging.getLogger('matplotlib.font_manager').disabled = True

import ipywidgets as widgets
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/content-creation/main/nma.mplstyle")

Loading Animal Faces Data

# @title Loading Animal Faces Data
import requests, os
from zipfile import ZipFile

print("Start downloading and unzipping `AnimalFaces` dataset...")
name = 'afhq'
fname = f"{name}.zip"
url = f"https://osf.io/kgfvj/download"

if not os.path.exists(fname):
  r = requests.get(url, allow_redirects=True)
  with open(fname, 'wb') as fh:
    fh.write(r.content)

  if os.path.exists(fname):
    with ZipFile(fname, 'r') as zfile:
      zfile.extractall(f".")
      os.remove(fname)

print("Download completed.")
Start downloading and unzipping `AnimalFaces` dataset...
Download completed.

Loading Animal Faces Randomized data

# @title Loading Animal Faces Randomized data

print("Start downloading and unzipping `Randomized AnimalFaces` dataset...")

names = ['afhq_random_32x32', 'afhq_10_32x32']
urls = ["https://osf.io/9sj7p/download",
        "https://osf.io/wvgkq/download"]


for i, name in enumerate(names):
  url = urls[i]
  fname = f"{name}.zip"

  if not os.path.exists(fname):
    r = requests.get(url, allow_redirects=True)
    with open(fname, 'wb') as fh:
      fh.write(r.content)

    if os.path.exists(fname):
      with ZipFile(fname, 'r') as zfile:
        zfile.extractall(f".")
        os.remove(fname)

print("Download completed.")
Start downloading and unzipping `Randomized AnimalFaces` dataset...
Download completed.

Plotting functions

# @title Plotting functions

def imshow(img):
  """
  Display unnormalized image

  Args:
    img: np.ndarray
      Datapoint to visualize

  Returns:
    Nothing
  """
  img = img / 2 + 0.5  # Unnormalize
  npimg = img.numpy()
  plt.imshow(np.transpose(npimg, (1, 2, 0)))
  plt.axis(False)
  plt.show()


def plot_weights(norm, labels, ws, title='Weight Size Measurement'):
  """
  Plot of weight size measurement [norm value vs layer]

  Args:
    norm: float
      Norm values
    labels: list
      Targets
    ws: list
      Weights
    title: string
      Title of plot

  Returns:
    Nothing
  """
  plt.figure(figsize=[8, 6])
  plt.title(title)
  plt.ylabel('Frobenius Norm Value')
  plt.xlabel('Model Layers')
  plt.bar(labels, ws)
  plt.axhline(y=norm,
              linewidth=1,
              color='r',
              ls='--',
              label='Total Model F-Norm')
  plt.legend()
  plt.show()


def visualize_data(dataloader):
  """
  Helper function to visualize data

  Args:
    dataloader: torch.tensor
      Dataloader to visualize

  Returns:
    Nothing
  """
  for idx, (data,label) in enumerate(dataloader):
    plt.figure(idx)
    # Choose the datapoint you would like to visualize
    index = 22

    # Choose that datapoint using index and permute the dimensions
    # and bring the pixel values between [0,1]
    data = data[index].permute(1, 2, 0) * \
           torch.tensor([0.5, 0.5, 0.5]) + \
           torch.tensor([0.5, 0.5, 0.5])

    # Convert the torch tensor into numpy
    data = data.numpy()

    plt.imshow(data)
    plt.axis(False)
    image_class = classes[label[index].item()]
    print(f'The image belongs to : {image_class}')

  plt.show()

Helper functions

# @title Helper functions

class AnimalNet(nn.Module):
  """
  Network Class - Animal Faces with the following structure:
  nn.Linear(3 * 32 * 32, 128) # Fully connected layer 1
  nn.Linear(128, 32) # Fully connected layer 2
  nn.Linear(32, 3) # Fully connected layer 3
  """

  def __init__(self):
    """
    Initialize parameters of AnimalNet

    Args:
      None

    Returns:
      Nothing
    """
    super(AnimalNet, self).__init__()
    self.fc1 = nn.Linear(3 * 32 * 32, 128)
    self.fc2 = nn.Linear(128, 32)
    self.fc3 = nn.Linear(32, 3)

  def forward(self, x):
    """
    Forward Pass of AnimalNet

    Args:
      x: torch.tensor
        Input features

    Returns:
      output: torch.tensor
        Outputs/Predictions
    """
    x = x.view(x.shape[0], -1)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    output = F.log_softmax(x, dim=1)
    return output


class Net(nn.Module):
  """
  Network Class - 2D with the following structure:
  nn.Linear(1, 300) + leaky_relu(self.fc1(x)) # First fully connected layer
  nn.Linear(300, 500) + leaky_relu(self.fc2(x)) # Second fully connected layer
  nn.Linear(500, 1) # Final fully connected layer
  """

  def __init__(self):
    """
    Initialize parameters of Net

    Args:
      None

    Returns:
      Nothing
    """
    super(Net, self).__init__()

    self.fc1 = nn.Linear(1, 300)
    self.fc2 = nn.Linear(300, 500)
    self.fc3 = nn.Linear(500, 1)

  def forward(self, x):
    """
    Forward pass of Net

    Args:
      x: torch.tensor
        Input features

    Returns:
      x: torch.tensor
        Output/Predictions
    """
    x = F.leaky_relu(self.fc1(x))
    x = F.leaky_relu(self.fc2(x))
    output = self.fc3(x)
    return output


class BigAnimalNet(nn.Module):
  """
  Network Class - Animal Faces with the following structure:
  nn.Linear(3*32*32, 124) + leaky_relu(self.fc1(x)) # First fully connected layer
  nn.Linear(124, 64) + leaky_relu(self.fc2(x)) # Second fully connected layer
  nn.Linear(64, 3) # Final fully connected layer
  """

  def __init__(self):
    """
    Initialize parameters for BigAnimalNet

    Args:
      None

    Returns:
      Nothing
    """
    super(BigAnimalNet, self).__init__()
    self.fc1 = nn.Linear(3*32*32, 124)
    self.fc2 = nn.Linear(124, 64)
    self.fc3 = nn.Linear(64, 3)

  def forward(self, x):
    """
    Forward pass of BigAnimalNet

    Args:
      x: torch.tensor
        Input features

    Returns:
      x: torch.tensor
        Output/Predictions
    """
    x = x.view(x.shape[0],-1)
    x = F.leaky_relu(self.fc1(x))
    x = F.leaky_relu(self.fc2(x))
    x = self.fc3(x)
    output = F.log_softmax(x, dim=1)
    return output


def train(args, model, train_loader, optimizer, epoch,
          reg_function1=None, reg_function2=None, criterion=F.nll_loss):
  """
  Trains the current input model using the data
  from Train_loader and Updates parameters for a single pass

  Args:
    args: dictionary
      Dictionary of hyperparameters (epochs, lr, momentum, device)
    model: nn.module
      Neural network instance
    train_loader: torch.loader
      Input dataset
    optimizer: function
      Optimizer
    reg_function1: function
      Regularisation function [default: None]
    reg_function2: function
      Regularisation function [default: None]
    criterion: function
      Specifies loss function [default: nll_loss]

  Returns:
    model: nn.module
      Neural network instance post training
  """
  device = args['device']
  model.train()
  for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    # L1 regularization
    if reg_function2 is None and reg_function1 is not None:
      loss = criterion(output, target) + args['lambda1']*reg_function1(model)
    # L2 regularization
    elif reg_function1 is None and reg_function2 is not None:
      loss = criterion(output, target) + args['lambda2']*reg_function2(model)
    # No regularization
    elif reg_function1 is None and reg_function2 is None:
      loss = criterion(output, target)
    # Both L1 and L2 regularizations
    else:
      loss = criterion(output, target) + args['lambda1']*reg_function1(model) + args['lambda2']*reg_function2(model)
    loss.backward()
    optimizer.step()

  return model


def test(model, test_loader, loader='Test', criterion=F.nll_loss,
         device='cpu'):
  """
  Tests the current model

  Args:
    model: nn.module
      Neural network instance
    device: string
      GPU/CUDA if available, CPU otherwise
    test_loader: torch.loader
      Test dataset
    criterion: function
      Specifies loss function [default: nll_loss]

  Returns:
    test_loss: float
      Test loss
  """
  model.eval()
  test_loss = 0
  correct = 0
  with torch.no_grad():
    for data, target in test_loader:
      data, target = data.to(device), target.to(device)
      output = model(data)
      test_loss += criterion(output, target, reduction='sum').item()  # sum up batch loss
      pred = output.argmax(dim=1, keepdim=True)  # Get the index of the max log-probability
      correct += pred.eq(target.view_as(pred)).sum().item()

  test_loss /= len(test_loader.dataset)
  return 100. * correct / len(test_loader.dataset)


def main(args, model, train_loader, val_loader, test_data,
         reg_function1=None, reg_function2=None, criterion=F.nll_loss):
  """
  Trains the model with train_loader and
  tests the learned model using val_loader

  Args:
    args: dictionary
      Dictionary of hyperparameters (epochs, lr, momentum, device)
    model: nn.module
      Neural network instance
    train_loader: torch.loader
      Train dataset
    val_loader: torch.loader
      Validation set
    reg_function1: function
      Regularisation function [default: None]
    reg_function2: function
      Regularisation function [default: None]

  Returns:
    val_acc_list: list
      Log of validation accuracy
    train_acc_list: list
      Log of training accuracy
    param_norm_list: list
      Log of Frobenius norm
    trained_model: nn.module
      Trained model/model post training
  """
  device = args['device']

  model = model.to(device)
  optimizer = optim.SGD(model.parameters(), lr=args['lr'], momentum=args['momentum'])

  val_acc_list, train_acc_list,param_norm_list = [], [], []
  for epoch in tqdm(range(args['epochs'])):
    trained_model = train(args, model, train_loader, optimizer, epoch,
                          reg_function1=reg_function1,
                          reg_function2=reg_function2)
    train_acc = test(trained_model, train_loader, loader='Train', device=device)
    val_acc = test(trained_model, val_loader, loader='Val', device=device)
    param_norm = calculate_frobenius_norm(trained_model)
    train_acc_list.append(train_acc)
    val_acc_list.append(val_acc)
    param_norm_list.append(param_norm)

  return val_acc_list, train_acc_list, param_norm_list, model


def calculate_frobenius_norm(model):
  """
  Function to calculate the Frobenius norm of all model parameters

  Args:
    model: nn.module
      Neural network instance

  Returns:
    norm: float
      Frobenius norm
  """
  norm = 0.0
  # Sum the squared norm of every parameter tensor
  for param in model.parameters():
    norm += torch.norm(param).data**2
  # Return the square root of the sum of squares of all the parameters
  return norm**0.5


def early_stopping_main(args, model, train_loader, val_loader, test_data):
  """
  Function to simulate early stopping

  Args:
    args: dictionary
      Dictionary of hyperparameters (epochs, lr, momentum, device)
    model: nn.module
      Neural network instance
    train_loader: torch.loader
      Train dataset
    val_loader: torch.loader
      Validation set

  Returns:
    val_acc_list: list
      Val accuracy log until early stop point
    train_acc_list: list
      Training accuracy log until early stop point
    best_model: nn.module
      Model performing best with early stopping
    best_epoch: int
      Epoch at which early stopping occurs
  """
  device = args['device']

  model = model.to(device)
  optimizer = optim.SGD(model.parameters(), lr=args['lr'], momentum=args['momentum'])

  best_acc  = 0.0
  best_epoch = 0

  # Number of successive epochs to wait before stopping the training process
  patience = 20

  # Keeps track of the number of epochs during which val_acc was below best_acc
  wait = 0

  val_acc_list, train_acc_list = [], []
  for epoch in tqdm(range(args['epochs'])):
    trained_model = train(args, model, train_loader, optimizer, epoch)
    train_acc = test(trained_model, train_loader, loader='Train', device=device)
    val_acc = test(trained_model, val_loader, loader='Val', device=device)
    if (val_acc > best_acc):
      best_acc = val_acc
      best_epoch = epoch
      best_model = copy.deepcopy(trained_model)
      wait = 0
    else:
      wait += 1
    if (wait > patience):
      print(f'Early stopped on epoch: {epoch}')
      break
    train_acc_list.append(train_acc)
    val_acc_list.append(val_acc)

  return val_acc_list, train_acc_list, best_model, best_epoch

Set random seed

Executing set_seed(seed=seed) sets the random seed

# @title Set random seed
# @markdown Executing `set_seed(seed=seed)` sets the random seed

# For DL it's critical to set the random seed so that students have a
# baseline to compare their results to expected results.
# Read more here: https://pytorch.org/docs/stable/notes/randomness.html

# Call `set_seed` function in the exercises to ensure reproducibility.
import random
import torch

def set_seed(seed=None, seed_torch=True):
  """
  Function that controls randomness. NumPy and random modules must be imported.

  Args:
    seed : Integer
      A non-negative integer that defines the random state. Default is `None`.
    seed_torch : Boolean
      If `True` sets the random seed for pytorch tensors, so pytorch module
      must be imported. Default is `True`.

  Returns:
    Nothing.
  """
  if seed is None:
    seed = np.random.choice(2 ** 32)
  random.seed(seed)
  np.random.seed(seed)
  if seed_torch:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

  print(f'Random seed {seed} has been set.')


# In case that `DataLoader` is used
def seed_worker(worker_id):
  """
  Reseed a DataLoader worker so that multi-process data loading
  is reproducible.

  Args:
    worker_id: integer
      ID of the DataLoader worker subprocess being seeded.
      See https://pytorch.org/docs/stable/data.html#data-loading-randomness
      for more details

  Returns:
    Nothing
  """
  worker_seed = torch.initial_seed() % 2**32
  np.random.seed(worker_seed)
  random.seed(worker_seed)

Set device (GPU or CPU). Execute set_device()

# @title Set device (GPU or CPU). Execute `set_device()`
# especially if torch modules are used.

# Inform the user if the notebook uses GPU or CPU.

def set_device():
  """
  Set the device. CUDA if available, CPU otherwise

  Args:
    None

  Returns:
    Nothing
  """
  device = "cuda" if torch.cuda.is_available() else "cpu"
  if device != "cuda":
    print("WARNING: For this notebook to perform best, "
        "if possible, in the menu under `Runtime` -> "
        "`Change runtime type.`  select `GPU` ")
  else:
    print("GPU is enabled in this notebook.")

  return device
SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()
Random seed 2021 has been set.
WARNING: For this notebook to perform best, if possible, in the menu under `Runtime` -> `Change runtime type`, select `GPU`.

Dataloaders for the Dataset

# @title Dataloaders for the Dataset
## Dataloaders for the Dataset
batch_size = 128
classes = ('cat', 'dog', 'wild')

train_transform = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
     ])
data_path = pathlib.Path('.')/'afhq' # Using pathlib to be compatible with all OS's
img_dataset = ImageFolder(data_path/'train', transform=train_transform)


####################################################
g_seed = torch.Generator()
g_seed.manual_seed(SEED)


## Dataloaders for the  Original Dataset
img_train_data, img_val_data,_ = torch.utils.data.random_split(img_dataset,
                                                               [100, 100, 14430])

# Creating train_loader and Val_loader
train_loader = torch.utils.data.DataLoader(img_train_data,
                                           batch_size=batch_size,
                                           worker_init_fn=seed_worker,
                                           num_workers=2,
                                           generator=g_seed)
val_loader = torch.utils.data.DataLoader(img_val_data,
                                         batch_size=1000,
                                         num_workers=2,
                                         worker_init_fn=seed_worker,
                                         generator=g_seed)

# Creating test dataset
test_transform = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
     ])
img_test_dataset = ImageFolder(data_path/'val', transform=test_transform)


####################################################

## Dataloaders for the  Random Dataset

# Splitting randomized data into training and validation data
data_path = pathlib.Path('.')/'afhq_random_32x32/afhq_random' # using pathlib to be compatible with all OS's
img_dataset = ImageFolder(data_path/'train', transform=train_transform)
random_img_train_data, random_img_val_data,_ = torch.utils.data.random_split(img_dataset, [100,100,14430])

# Randomized train and validation dataloader
rand_train_loader = torch.utils.data.DataLoader(random_img_train_data,
                                                batch_size=batch_size,
                                                num_workers=2,
                                                worker_init_fn=seed_worker,
                                                generator=g_seed)
rand_val_loader = torch.utils.data.DataLoader(random_img_val_data,
                                              batch_size=1000,
                                              num_workers=2,
                                              worker_init_fn=seed_worker,
                                              generator=g_seed)

####################################################

## Dataloaders for the Partially Random Dataset

# Splitting data between training and validation dataset for partially randomized data
data_path = pathlib.Path('.')/'afhq_10_32x32/afhq_10' # using pathlib to be compatible with all OS's
img_dataset = ImageFolder(data_path/'train', transform=train_transform)
partially_random_train_data, partially_random_val_data, _ = torch.utils.data.random_split(img_dataset, [100,100,14430])

# Training and Validation loader for partially randomized data
partial_rand_train_loader = torch.utils.data.DataLoader(partially_random_train_data,
                                                        batch_size=batch_size,
                                                        num_workers=2,
                                                        worker_init_fn=seed_worker,
                                                        generator=g_seed)
partial_rand_val_loader = torch.utils.data.DataLoader(partially_random_val_data,
                                                      batch_size=1000,
                                                      num_workers=2,
                                                      worker_init_fn=seed_worker,
                                                      generator=g_seed)

Section 1: L1 and L2 Regularization

Time estimate: ~30 mins

Video 1: L1 and L2 regularization

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_L1_and_L2_regularization_Video")

You may have come across L1 and L2 regularization in other courses; they are the most common types of regularization. Both modify the general cost function by adding an extra term known as the regularization term:


\begin{equation}
\text{Cost function} = \text{Loss (e.g., binary cross-entropy)} + \text{Regularization term}
\end{equation}

This regularization term shrinks the parameters, yielding simpler models that overfit less.

Discuss with your teammates: is the above assumption good or bad?
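
To make this concrete before we benchmark anything, here is a minimal sketch of how a penalized cost is assembled in PyTorch, mirroring what the `train` helper defined above does. The toy model, data, and penalty strength lam are illustrative placeholders; the penalty shown is simply the overall parameter norm (as in the `calculate_frobenius_norm` helper), and the L1 and L2 penalties you will implement below slot into the same place.

import torch
import torch.nn as nn

torch.manual_seed(0)
toy_model = nn.Linear(4, 2)  # illustrative stand-in for AnimalNet
x = torch.randn(8, 4)  # toy batch of inputs
y = torch.randint(0, 2, (8,))  # toy labels

lam = 1e-3  # illustrative penalty strength (a hyperparameter to tune)
penalty = torch.sqrt(sum(torch.norm(p)**2 for p in toy_model.parameters()))
loss = nn.functional.cross_entropy(toy_model(x), y) + lam * penalty
loss.backward()  # gradients now include the shrinkage term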

Section 1.1: Unregularized Model

Dataloaders for Regularization

# @markdown #### Dataloaders for Regularization
data_path = pathlib.Path('.')/'afhq' # Using pathlib to be compatible with all OS's
img_dataset = ImageFolder(data_path/'train', transform=train_transform)

# Splitting dataset
reg_train_data, reg_val_data,_ = torch.utils.data.random_split(img_dataset,
                                                               [30, 100, 14500])
g_seed = torch.Generator()
g_seed.manual_seed(SEED)

# Creating train_loader and Val_loader
reg_train_loader = torch.utils.data.DataLoader(reg_train_data,
                                               batch_size=batch_size,
                                               worker_init_fn=seed_worker,
                                               num_workers=2,
                                               generator=g_seed)
reg_val_loader = torch.utils.data.DataLoader(reg_val_data,
                                             batch_size=1000,
                                             worker_init_fn=seed_worker,
                                             num_workers=2,
                                             generator=g_seed)

Now let’s train a model without regularization and keep it aside as our benchmark for this section.

# Set the arguments
args = {
    'epochs': 150,
    'lr': 5e-3,
    'momentum': 0.99,
    'device': DEVICE,
}

# Initialize the model
set_seed(seed=SEED)
model = AnimalNet()

# Train the model
val_acc_unreg, train_acc_unreg, param_norm_unreg, _ = main(args,
                                                           model,
                                                           reg_train_loader,
                                                           reg_val_loader,
                                                           img_test_dataset)

# Train and Test accuracy plot
plt.figure()
plt.plot(val_acc_unreg, label='Val Accuracy', c='red', ls='dashed')
plt.plot(train_acc_unreg, label='Train Accuracy', c='red', ls='solid')
plt.axhline(y=max(val_acc_unreg), c='green', ls='dashed')
plt.title('Unregularized Model')
plt.ylabel('Accuracy (%)')
plt.xlabel('Epoch')
plt.legend()
plt.show()
print(f"Maximum Validation Accuracy reached: {max(val_acc_unreg)}")
Random seed 2021 has been set.

Section 1.2: L1 Regularization

L1 Regularization (or LASSO\(^{\ddagger}\)) uses a penalty equal to the sum of the absolute values of all the weights in the deep learning architecture, resulting in the following loss function (\(L\) is the usual Cross-Entropy loss):

\begin{equation}
L_R = L + \lambda \sum \left| w^{(r)}_{ij} \right|
\end{equation}

where \(r\) denotes the layer, and \(ij\) the specific weight in that layer.

At a high level, L1 Regularization is similar to L2 Regularization since it leads to smaller weights (you will see the analogy in the next subsection). It results in the following weight update equation when using Stochastic Gradient Descent:

\begin{equation}
w^{(r)}_{ij} \leftarrow w^{(r)}_{ij} - \eta \lambda \, \text{sgn}\left(w^{(r)}_{ij}\right) - \eta \frac{\partial L}{\partial w^{(r)}_{ij}}
\end{equation}

where \(\text{sgn}(\cdot)\) is the sign function, such that

\begin{equation}
\text{sgn}(w) =
\begin{cases}
+1 & \text{if } w > 0 \\
-1 & \text{if } w < 0 \\
0 & \text{if } w = 0
\end{cases}
\end{equation}

\(^{\ddagger}\)LASSO: Least Absolute Shrinkage and Selection Operator
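
Before implementing the penalty itself, here is a quick numeric sanity check of the update rule above: one plain SGD step on the penalized loss should match the sign-based update written out by hand. The quadratic stand-in for the data loss and the values of eta and lam are illustrative assumptions, not tuned settings.

import torch

eta, lam = 0.1, 0.5  # illustrative learning rate and penalty strength
w = torch.tensor([1.5, -0.3], requires_grad=True)

# Use a simple quadratic as the data loss L; the full loss adds lam * |w|
loss = (w**2).sum() + lam * w.abs().sum()
loss.backward()  # w.grad is now 2*w + lam*sgn(w)

with torch.no_grad():
  w_sgd = w - eta * w.grad  # one SGD step on the penalized loss
  w_manual = w - eta * lam * torch.sign(w) - eta * (2 * w)  # update rule by hand
print(torch.allclose(w_sgd, w_manual))  # True: the two updates agree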

Coding Exercise 1.1: L1 Regularization

Write a function that calculates the L1 norm of all the tensors of a PyTorch model.

def l1_reg(model):
  """
  This function calculates the L1 norm of all the tensors in the model

  Args:
    model: nn.module
      Neural network instance

  Returns:
    l1: float
      L1 norm of all the tensors in the model
  """
  l1 = 0.0
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("Complete the l1_reg function")
  ####################################################################
  for param in model.parameters():
    l1 += ...

  return l1


set_seed(seed=SEED)
## uncomment to test
# net = nn.Linear(20, 20)
# print(f"L1 norm of the model: {l1_reg(net)}")
Random seed 2021 has been set.
L1 norm of the model: 48.445133209228516

Click for solution

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_L1_regularization_Exercise")

Now, let’s train a classifier that uses L1 regularization. Tune the hyperparameter lambda1 such that the validation accuracy is higher than that of the unregularized model.

# Set the arguments
args1 = {
    'test_batch_size': 1000,
    'epochs': 150,
    'lr': 5e-3,
    'momentum': 0.99,
    'device': DEVICE,
    'lambda1': 0.001  # <<<<<<<< Tune the hyperparameter lambda1
}

# Initialize the model
set_seed(seed=SEED)
model = AnimalNet()

# Train the model
val_acc_l1reg, train_acc_l1reg, param_norm_l1reg, _ = main(args1,
                                                           model,
                                                           reg_train_loader,
                                                           reg_val_loader,
                                                           img_test_dataset,
                                                           reg_function1=l1_reg)

# Train and Test accuracy plot
plt.figure()
plt.plot(val_acc_l1reg, label='Val Accuracy L1 Regularized',
         c='red', ls='dashed')
plt.plot(train_acc_l1reg, label='Train Accuracy L1 regularized',
         c='red', ls='solid')
plt.axhline(y=max(val_acc_l1reg), c='green', ls='dashed')
plt.title('L1 regularized model')
plt.ylabel('Accuracy (%)')
plt.xlabel('Epoch')
plt.legend()
plt.show()
print(f"Maximum Validation Accuracy Reached: {max(val_acc_l1reg)}")
Random seed 2021 has been set.

What value of the lambda1 hyperparameter worked for L1 Regularization?

Note that the \(\lambda\) in the equations is written as lambda1 in the code for clarity.

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Tune_lambda1_Exercise")

Section 1.3: L2 / Ridge Regularization

L2 Regularization (or Ridge), also referred to as “Weight Decay”, is widely used. It works by adding a quadratic penalty term to the Cross-Entropy Loss Function \(L\), which results in a new Loss Function \(L_R\) given by:

\begin{equation}
L_R = L + \lambda \sum \left( w^{(r)}_{ij} \right)^2
\end{equation}

where, again, the superscript \(r\) denotes the layer, and \(ij\) the specific weight in that layer.

To get further insight into L2 Regularization, we investigate its effect on the Gradient Descent based update equations for the weight and bias parameters. Taking the derivative on both sides of the above equation, we obtain

\begin{equation}
\frac{\partial L_R}{\partial w^{(r)}_{ij}} = \frac{\partial L}{\partial w^{(r)}_{ij}} + 2\lambda w^{(r)}_{ij}
\end{equation}

Thus the weight update rule becomes:

\begin{equation}
w^{(r)}_{ij} \leftarrow w^{(r)}_{ij} - \eta \frac{\partial L}{\partial w^{(r)}_{ij}} - 2\eta\lambda w^{(r)}_{ij} = (1 - 2\eta\lambda) w^{(r)}_{ij} - \eta \frac{\partial L}{\partial w^{(r)}_{ij}}
\end{equation}

where \(\eta\) is the learning rate.
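
This last form shows why L2 regularization is called weight decay: every step first shrinks each weight by the factor \((1 - 2\eta\lambda)\) and then applies the usual gradient step. As a hedged sketch (toy network, toy data, and illustrative values of lam and eta), the check below verifies that adding the penalty explicitly to the loss matches the built-in weight_decay option of optim.SGD. Note that weight_decay=d adds \(d \cdot w\) to the gradient, so it corresponds to \(\lambda = d/2\) in our notation.

import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
lam, eta = 1e-2, 0.1  # illustrative values
net_a = nn.Linear(3, 1)
net_b = copy.deepcopy(net_a)  # identical starting weights
x, y = torch.randn(5, 3), torch.randn(5, 1)

# (a) L2 penalty added explicitly to the loss
opt_a = optim.SGD(net_a.parameters(), lr=eta)
loss_a = nn.functional.mse_loss(net_a(x), y) \
         + lam * sum((p**2).sum() for p in net_a.parameters())
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

# (b) the equivalent built-in weight decay (factor of 2 from the derivative)
opt_b = optim.SGD(net_b.parameters(), lr=eta, weight_decay=2 * lam)
loss_b = nn.functional.mse_loss(net_b(x), y)
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

print(all(torch.allclose(pa, pb)
          for pa, pb in zip(net_a.parameters(), net_b.parameters())))  # True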

Coding Exercise 1.2: L2 Regularization

Write a function that calculates the L2 norm of all the tensors of a PyTorch model. (What did we call this before?)

def l2_reg(model):
  """
  This function calculates the L2 norm of all the tensors in the model

  Args:
    model: nn.module
      Neural network instance

  Returns:
    l2: float
      L2 norm of all the tensors in the model
  """

  l2 = 0.0
  ####################################################################
  # Fill in all missing code below (...),
  # then remove or comment the line below to test your function
  raise NotImplementedError("Complete the l2_reg function")
  ####################################################################
  for param in model.parameters():
    l2 += ...

  return l2


set_seed(SEED)
## uncomment to test
# net = nn.Linear(20, 20)
# print(f"L2 norm of the model: {l2_reg(net)}")
Random seed 2021 has been set.
L2 norm of the model: 7.328375816345215

Click for solution

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_L2_Ridge_Regularization_Exercise")

Now we’ll train a classifier that uses L2 regularization. Tune the hyperparameter lambda2 such that the validation accuracy is higher than that of the unregularized model.

# Set the arguments
args2 = {
    'test_batch_size': 1000,
    'epochs': 150,
    'lr': 5e-3,
    'momentum': 0.99,
    'device': DEVICE,
    'lambda2': 0.001  # <<<<<<<< Tune the hyperparameter lambda2
}

# Initialize the model
set_seed(seed=SEED)
model = AnimalNet()

# Train the model
val_acc_l2reg, train_acc_l2reg, param_norm_l2reg, model = main(args2,
                                                               model,
                                                               train_loader,
                                                               val_loader,
                                                               img_test_dataset,
                                                               reg_function2=l2_reg)

## Train and Test accuracy plot
plt.figure()
plt.plot(val_acc_l2reg, label='Val Accuracy L2 regularized',
         c='red', ls='dashed')
plt.plot(train_acc_l2reg, label='Train Accuracy L2 regularized',
         c='red', ls='solid')
plt.axhline(y=max(val_acc_l2reg), c='green', ls='dashed')
plt.title('L2 Regularized Model')
plt.ylabel('Accuracy (%)')
plt.xlabel('Epoch')
plt.legend()
plt.show()
print(f"Maximum Validation Accuracy reached: {max(val_acc_l2reg)}")
Random seed 2021 has been set.

What value of the lambda2 hyperparameter worked for L2 Regularization?

Note that the \(\lambda\) in the equations is written as lambda2 in the code for clarity.

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Tune_lambda2_Exercise")

Now, let’s run a model with both L1 and L2 regularization terms.

# @markdown Visualize all of them together (Run Me!)

# @markdown `lambda1=0.001` and `lambda2=0.001`

args3 = {
    'test_batch_size': 1000,
    'epochs': 150,
    'lr': 5e-3,
    'momentum': 0.99,
    'device': DEVICE,
    'lambda1': 0.001,
    'lambda2': 0.001
}

# Initialize the model
set_seed(seed=SEED)
model = AnimalNet()
val_acc_l1l2reg, train_acc_l1l2reg, param_norm_l1l2reg, _ = main(args3,
                                                                 model,
                                                                 train_loader,
                                                                 val_loader,
                                                                 img_test_dataset,
                                                                 reg_function1=l1_reg,
                                                                 reg_function2=l2_reg)

plt.figure()

plt.plot(val_acc_l2reg, c='red', ls='dashed')
plt.plot(train_acc_l2reg,
         label=f"L2 regularized, $\lambda_2$={args2['lambda2']}",
         c='red', ls='solid')
plt.axhline(y=max(val_acc_l2reg), c='red', ls='dashed')

plt.plot(val_acc_l1reg, c='green', ls = 'dashed')
plt.plot(train_acc_l1reg,
         label=f"L1 regularized, $\lambda_1$={args1['lambda1']}",
         c='green', ls='solid')
plt.axhline(y=max(val_acc_l1reg), c='green', ls='dashed')

plt.plot(val_acc_unreg, c='blue', ls = 'dashed')
plt.plot(train_acc_unreg,
         label='Unregularized', c='blue', ls='solid')
plt.axhline(y=max(val_acc_unreg), c='blue', ls='dashed')

plt.plot(val_acc_l1l2reg, c='orange', ls='dashed')
plt.plot(train_acc_l1l2reg,
         label=f"L1+L2 regularized, $\lambda_1$={args3['lambda1']}, $\lambda_2$={args3['lambda2']}",
         c='orange', ls='solid')
plt.axhline(y=max(val_acc_l1l2reg), c='orange', ls = 'dashed')

plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.show()
Random seed 2021 has been set.

Now, let’s visualize what these different regularizations do to the model’s parameters. We observe the effect by computing the size of the parameters (technically, the Frobenius norm of all the model’s weights).
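For reference, here is a minimal sketch of such a computation; the tutorial’s own calculate_frobenius_norm helper is defined in an earlier (hidden) cell, so this version is purely illustrative.

import torch

def frobenius_norm_sketch(model):
  """Square root of the sum of squared entries of all model parameters."""
  squared_sum = 0.0
  with torch.no_grad():
    for param in model.parameters():
      squared_sum += torch.sum(param ** 2).item()  # accumulate per-parameter sums
  return squared_sum ** 0.5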


Visualize Norm of the Models (Train Me!)

# @markdown #### Visualize Norm of the Models (Train Me!)
plt.figure()
plt.plot([i.cpu().numpy() for i in param_norm_unreg],
         label='Unregularized', c='blue')
plt.plot([i.cpu().numpy() for i in param_norm_l1reg],
         label='L1 Regularized', c='green')
plt.plot([i.cpu().numpy() for i in param_norm_l2reg],
         label='L2 Regularized', c='red')
plt.plot([i.cpu().numpy() for i in param_norm_l1l2reg],
         label='L1+L2 Regularized', c='orange')
plt.xlabel('Epoch')
plt.ylabel('Parameter Norms')
plt.legend()
plt.show()

In the above plots, you should have seen that the validation accuracies fluctuate even after the model achieves 100% train accuracy. Thus, the model is still trying to learn something. Why would this be the case?


Section 2: Dropout

Time estimate: ~25 mins

Video 2: Dropout

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Dropout_Video")

With dropout, we literally drop out (zero out) some neurons during training. On each training iteration, standard dropout zeroes out some fraction (usually 50%) of the nodes in each layer before computing the subsequent layer. Randomly selecting a different subset to drop on every iteration injects noise into the process and reduces overfitting.
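As a quick illustration of the mechanics (this mini-demo is not part of the original tutorial), PyTorch’s nn.Dropout zeroes each entry with probability p during training and rescales the survivors by 1/(1-p), so the expected activation is unchanged; in evaluation mode it acts as the identity.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()    # training mode: random entries zeroed, survivors scaled by 1/(1-p) = 2
print(drop(x))  # e.g., tensor([2., 0., 2., 2., 0., 0., 2., 0.])

drop.eval()     # evaluation mode: dropout is the identity
print(drop(x))  # tensor([1., 1., 1., 1., 1., 1., 1., 1.])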


Dropout

Now let’s revisit the toy dataset we generated above to visualize how dropout stabilizes training on a noisy dataset. We will slightly modify the architecture used above by adding dropout layers.

class NetDropout(nn.Module):
  """
  Network Class - 2D with the following structure:
  nn.Linear(1, 300) + leaky_relu(self.dropout1(self.fc1(x))) # First fully connected layer with 0.4 dropout
  nn.Linear(300, 500) + leaky_relu(self.dropout2(self.fc2(x))) # Second fully connected layer with 0.2 dropout
  nn.Linear(500, 1) # Final fully connected layer
  """

  def __init__(self):
    """
    Initialize parameters of NetDropout

    Args:
      None

    Returns:
      Nothing
    """
    super(NetDropout, self).__init__()

    self.fc1 = nn.Linear(1, 300)
    self.fc2 = nn.Linear(300, 500)
    self.fc3 = nn.Linear(500, 1)
    # We add two dropout layers
    self.dropout1 = nn.Dropout(0.4)
    self.dropout2 = nn.Dropout(0.2)

  def forward(self, x):
    """
    Forward pass of NetDropout

    Args:
      x: torch.tensor
        Input features

    Returns:
      output: torch.tensor
        Output/Predictions
    """
    x = F.leaky_relu(self.dropout1(self.fc1(x)))
    x = F.leaky_relu(self.dropout2(self.fc2(x)))
    output = self.fc3(x)
    return output

Run to train the default network

# @markdown #### Run to train the default network
set_seed(seed=SEED)

# Creating train data
X = torch.rand((10, 1))
X.sort(dim = 0)
Y = 2*X + 2*torch.empty((X.shape[0], 1)).normal_(mean=0, std=1)  # adding small error in the data

X = X.unsqueeze_(1)
Y = Y.unsqueeze_(1)

# Creating test dataset
X_test = torch.linspace(0, 1, 40)
X_test = X_test.reshape((40, 1, 1))

# Train the network on toy dataset
model = Net()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
max_epochs = 10000
iters = 0

running_predictions = np.empty((40, int(max_epochs / 500) + 1))

train_loss = []
test_loss = []
model_norm = []

for epoch in tqdm(range(max_epochs)):

  # Training
  model_norm.append(calculate_frobenius_norm(model))
  model.train()
  optimizer.zero_grad()
  predictions = model(X)
  loss = criterion(predictions,Y)
  loss.backward()
  optimizer.step()

  train_loss.append(loss.data)
  model.eval()
  Y_test = model(X_test)
  loss = criterion(Y_test, 2*X_test)
  test_loss.append(loss.data)

  if (epoch % 500 == 0 or epoch == max_epochs - 1):
    running_predictions[:, iters] = Y_test[:, 0, 0].detach().numpy()
    iters += 1
Random seed 2021 has been set.
# Train the network on toy dataset

# Initialize the model
set_seed(seed=SEED)
model = NetDropout()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
max_epochs = 10000
iters = 0

running_predictions_dp = np.empty((40, int(max_epochs / 500)))

train_loss_dp = []
test_loss_dp = []
model_norm_dp = []

for epoch in tqdm(range(max_epochs)):

  # Training
  model_norm_dp.append(calculate_frobenius_norm(model))
  model.train()
  optimizer.zero_grad()
  predictions = model(X)
  loss = criterion(predictions, Y)
  loss.backward()
  optimizer.step()

  train_loss_dp.append(loss.data)
  model.eval()
  Y_test = model(X_test)
  loss = criterion(Y_test, 2*X_test)
  test_loss_dp.append(loss.data)

  if epoch % 500 == 0:  # epoch == max_epochs is never reached inside range(max_epochs)
    running_predictions_dp[:, iters] = Y_test[:, 0, 0].detach().numpy()
    iters += 1
Random seed 2021 has been set.

Now that we have finished the training, let’s see how the model has evolved over the training process.

Animation! (Run Me!)

# @markdown Animation! (Run Me!)
set_seed(seed=SEED)

fig = plt.figure(figsize=(8, 6))
ax = plt.axes()

def frame(i):
  ax.clear()
  ax.scatter(X[:, 0, :].numpy(), Y[:, 0, :].numpy())
  plot = ax.plot(X_test[:, 0, :].detach().numpy(),
                 running_predictions_dp[:, i])
  title = f"Epoch: {i*500}"
  plt.title(title)
  ax.set_xlabel("X axis")
  ax.set_ylabel("Y axis")
  return plot


anim = animation.FuncAnimation(fig, frame, frames=range(20),
                               blit=False, repeat=False,
                               repeat_delay=10000)
html_anim = HTML(anim.to_html5_video());
plt.close()
display(html_anim)
Random seed 2021 has been set.

Plot the train and test losses with epoch

# @markdown Plot the train and test losses with epoch

plt.figure()
plt.plot(test_loss_dp, label='Test loss dropout', c='blue', ls='dashed')
plt.plot(test_loss, label='Test loss', c='red', ls='dashed')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.title('Dropout vs Without dropout')
plt.legend()
plt.show()

Plot the train and test losses with epoch

# @markdown Plot the train and test losses with epoch

plt.figure()
plt.plot(train_loss_dp, label='Train loss dropout', c='blue', ls='dashed')
plt.plot(train_loss, label='Train loss', c='red', ls='dashed')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.title('Dropout vs Without dropout')
plt.legend()
plt.show()

Plot model weights with epoch

# @markdown Plot model weights with epoch
plt.figure()
plt.plot(model_norm_dp, label='Dropout')
plt.plot(model_norm, label='No dropout')
plt.ylabel('Norm of the model')
plt.xlabel('Epochs')
plt.legend()
plt.title('Size of the model vs Epochs')
plt.show()

Think 2.1!: Dropout

Do you think this (with dropout) performed better than the initial model (without dropout)?

Click for solution

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Dropout_Discussion")

Section 2.1: Dropout Implementation Caveats

  • Dropout is used only during training; the complete network is used during testing. It is therefore vital to call the model.eval() method before testing the model (see the sketch after this list).

  • Dropout reduces the effective capacity of the model during training, and hence, as a general practice, wider networks are used with dropout. If you use a dropout probability of 0.5 in a layer, you might want to double the number of hidden neurons in that layer.
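Here is a minimal sketch of the first caveat, using the NetDropout class defined above: in train mode the dropout masks are re-sampled on every forward pass, so two predictions on the same input usually disagree, whereas after model.eval() they are deterministic.

set_seed(seed=SEED)
net = NetDropout()
x = torch.rand((1, 1, 1))

net.train()                             # dropout active: stochastic outputs
print(torch.allclose(net(x), net(x)))   # usually False, masks differ per call

net.eval()                              # dropout disabled: deterministic outputs
print(torch.allclose(net(x), net(x)))   # True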

Now, let’s see how dropout fares on the “Animal Faces” dataset. We first modify the existing model to include dropout and then train it.

class AnimalNetDropout(nn.Module):
  """
  Network Class - Animal Faces with following structure
  nn.Linear(3*32*32, 248) + leaky_relu(self.dropout1(self.fc1(x))) # First fully connected layer with 0.5 dropout
  nn.Linear(248, 210) + leaky_relu(self.dropout2(self.fc2(x))) # Second fully connected layer with 0.3 dropout
  nn.Linear(210, 3) # Final fully connected layer
  """

  def __init__(self):
    """
    Initialize parameters of AnimalNetDropout

    Args:
      None

    Returns:
      Nothing
    """
    super(AnimalNetDropout, self).__init__()
    self.fc1 = nn.Linear(3*32*32, 248)
    self.fc2 = nn.Linear(248, 210)
    self.fc3 = nn.Linear(210, 3)
    self.dropout1 = nn.Dropout(p=0.5)
    self.dropout2 = nn.Dropout(p=0.3)

  def forward(self, x):
    """
    Forward pass of AnimalNetDropout

    Args:
      x: torch.tensor
        Input features

    Returns:
      x: torch.tensor
        Output/Predictions
    """
    x = x.view(x.shape[0], -1)
    x = F.leaky_relu(self.dropout1(self.fc1(x)))
    x = F.leaky_relu(self.dropout2(self.fc2(x)))
    x = self.fc3(x)
    output = F.log_softmax(x, dim=1)
    return output
# Set the arguments
args = {
    'test_batch_size': 1000,
    'epochs': 200,
    'lr': 5e-3,
    'batch_size': 32,
    'momentum': 0.9,
    'device': DEVICE,
    'log_interval': 100
}

# Initialize the model
set_seed(seed=SEED)
model = AnimalNetDropout()

# Train the model with Dropout
val_acc_dropout, train_acc_dropout, _, model_dp = main(args,
                                                       model,
                                                       train_loader,
                                                       val_loader,
                                                       img_test_dataset)

# Initialize the BigAnimalNet model
set_seed(seed=SEED)
model = BigAnimalNet()

# Train the model
val_acc_big, train_acc_big, _, model_big = main(args,
                                                model,
                                                train_loader,
                                                val_loader,
                                                img_test_dataset)


# Train and Test accuracy plot
plt.figure()
plt.plot(val_acc_big, label='Val - Big', c='blue', ls='dashed')
plt.plot(train_acc_big, label='Train - Big', c='blue', ls='solid')
plt.plot(val_acc_dropout, label='Val - DP', c='magenta', ls='dashed')
plt.plot(train_acc_dropout, label='Train - DP', c='magenta', ls='solid')
plt.title('Dropout')
plt.ylabel('Accuracy (%)')
plt.xlabel('Epoch')
plt.legend()
plt.show()
Random seed 2021 has been set.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File ~/opt/anaconda3/envs/nma-course/lib/python3.9/site-packages/torch/utils/data/dataloader.py:1132, in _MultiProcessingDataLoaderIter._try_get_data(self, timeout)
   1131 try:
-> 1132     data = self._data_queue.get(timeout=timeout)
   1133     return (True, data)

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/multiprocessing/queues.py:113, in Queue.get(self, block, timeout)
    112 timeout = deadline - time.monotonic()
--> 113 if not self._poll(timeout):
    114     raise Empty

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/multiprocessing/connection.py:257, in _ConnectionBase.poll(self, timeout)
    256 self._check_readable()
--> 257 return self._poll(timeout)

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/multiprocessing/connection.py:424, in Connection._poll(self, timeout)
    423 def _poll(self, timeout):
--> 424     r = wait([self], timeout)
    425     return bool(r)

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/multiprocessing/connection.py:931, in wait(object_list, timeout)
    930 while True:
--> 931     ready = selector.select(timeout)
    932     if ready:

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/selectors.py:416, in _PollLikeSelector.select(self, timeout)
    415 try:
--> 416     fd_event_list = self._selector.poll(timeout)
    417 except InterruptedError:

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/site-packages/torch/utils/data/_utils/signal_handling.py:66, in _set_SIGCHLD_handler.<locals>.handler(signum, frame)
     63 def handler(signum, frame):
     64     # This following call uses `waitid` with WNOHANG from C side. Therefore,
     65     # Python can still get and update the process status successfully.
---> 66     _error_if_any_worker_fails()
     67     if previous_handler is not None:

RuntimeError: DataLoader worker (pid 80155) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[42], line 17
     14 model = AnimalNetDropout()
     16 # Train the model with Dropout
---> 17 val_acc_dropout, train_acc_dropout, _, model_dp = main(args,
     18                                                        model,
     19                                                        train_loader,
     20                                                        val_loader,
     21                                                        img_test_dataset)
     23 # Initialize the BigAnimalNet model
     24 set_seed(seed=SEED)

Cell In[11], line 252, in main(args, model, train_loader, val_loader, test_data, reg_function1, reg_function2, criterion)
    250 val_acc_list, train_acc_list,param_norm_list = [], [], []
    251 for epoch in tqdm(range(args['epochs'])):
--> 252   trained_model = train(args, model, train_loader, optimizer, epoch,
    253                         reg_function1=reg_function1,
    254                         reg_function2=reg_function2)
    255   train_acc = test(trained_model, train_loader, loader='Train', device=device)
    256   val_acc = test(trained_model, val_loader, loader='Val', device=device)

Cell In[11], line 159, in train(args, model, train_loader, optimizer, epoch, reg_function1, reg_function2, criterion)
    157 device = args['device']
    158 model.train()
--> 159 for batch_idx, (data, target) in enumerate(train_loader):
    160   data, target = data.to(device), target.to(device)
    161   optimizer.zero_grad()

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/site-packages/torch/utils/data/dataloader.py:633, in _BaseDataLoaderIter.__next__(self)
    630 if self._sampler_iter is None:
    631     # TODO(https://github.com/pytorch/pytorch/issues/76750)
    632     self._reset()  # type: ignore[call-arg]
--> 633 data = self._next_data()
    634 self._num_yielded += 1
    635 if self._dataset_kind == _DatasetKind.Iterable and \
    636         self._IterableDataset_len_called is not None and \
    637         self._num_yielded > self._IterableDataset_len_called:

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/site-packages/torch/utils/data/dataloader.py:1328, in _MultiProcessingDataLoaderIter._next_data(self)
   1325     return self._process_data(data)
   1327 assert not self._shutdown and self._tasks_outstanding > 0
-> 1328 idx, data = self._get_data()
   1329 self._tasks_outstanding -= 1
   1330 if self._dataset_kind == _DatasetKind.Iterable:
   1331     # Check for _IterableDatasetStopIteration

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/site-packages/torch/utils/data/dataloader.py:1294, in _MultiProcessingDataLoaderIter._get_data(self)
   1290     # In this case, `self._data_queue` is a `queue.Queue`,. But we don't
   1291     # need to call `.task_done()` because we don't use `.join()`.
   1292 else:
   1293     while True:
-> 1294         success, data = self._try_get_data()
   1295         if success:
   1296             return data

File ~/opt/anaconda3/envs/nma-course/lib/python3.9/site-packages/torch/utils/data/dataloader.py:1145, in _MultiProcessingDataLoaderIter._try_get_data(self, timeout)
   1143 if len(failed_workers) > 0:
   1144     pids_str = ', '.join(str(w.pid) for w in failed_workers)
-> 1145     raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
   1146 if isinstance(e, queue.Empty):
   1147     return (False, None)

RuntimeError: DataLoader worker (pid(s) 80155, 80157) exited unexpectedly

Think 2.2!: Dropout caveats

When do you think dropout can perform badly, and do you think its placement within a model matters?

Click for solution

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Dropout_Caveats_Discussion")

Section 3: Data Augmentation

Time estimate: ~15 mins

Video 3: Data Augmentation

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Data_Augmentation_Video")

Data augmentation is often used to increase the number of training samples. Now we will explore the effect of data augmentation as a regularizer: regularization is achieved here by applying random transformations to the training data in every epoch, which effectively injects noise into training.

PyTorch’s torchvision module provides several built-in data augmentation techniques that we can use on image datasets. Some of the techniques we most frequently use are listed below (a small illustration follows the list):

  • Random Crop

  • Random Rotate

  • Vertical Flip

  • Horizontal Flip
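As a small illustration (the parameter values below are assumptions for demonstration; the tutorial’s own pipeline further down uses flips instead), a random crop and a random rotation can be composed like this:

# Illustrative compose with assumed parameters; not used in the training below
crop_and_rotate = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # pad, then crop a random 32x32 patch
    transforms.RandomRotation(degrees=15),  # rotate by a random angle up to ±15 degrees
    transforms.ToTensor(),
])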

Data Loader without Data Augmentation

# @markdown ####  Data Loader without Data Augmentation

# For reproducibility
g_seed = torch.Generator()
g_seed.manual_seed(SEED)


train_transform = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
     ])
data_path = pathlib.Path('.')/'afhq' # Using pathlib to be compatible with all OS's
img_dataset = ImageFolder(data_path/'train', transform=train_transform)

# Splitting dataset
img_train_data, img_val_data, _ = torch.utils.data.random_split(img_dataset, [250, 100, 14280])

# Creating train_loader and Val_loader
train_loader = torch.utils.data.DataLoader(img_train_data,
                                           batch_size=batch_size,
                                           num_workers=2,
                                           worker_init_fn=seed_worker,
                                           generator=g_seed)
val_loader = torch.utils.data.DataLoader(img_val_data,
                                         batch_size=1000,
                                         num_workers=2,
                                         worker_init_fn=seed_worker,
                                         generator=g_seed)

Define a DataLoader using torchvision.transforms, which randomly augments the data for us. For more info, see here.

# Data Augmentation using transforms
new_transforms = transforms.Compose([
                                     transforms.RandomHorizontalFlip(p=0.1),
                                     transforms.RandomVerticalFlip(p=0.1),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5),
                                                          (0.5, 0.5, 0.5))
                                     ])

data_path = pathlib.Path('.')/'afhq'  # Using pathlib to be compatible with all OS's
img_dataset = ImageFolder(data_path/'train', transform=new_transforms)
# Splitting dataset
new_train_data, _,_ = torch.utils.data.random_split(img_dataset,
                                                    [250, 100, 14280])

# For reproducibility
g_seed = torch.Generator()
g_seed.manual_seed(SEED)

# Creating train_loader and Val_loader
new_train_loader = torch.utils.data.DataLoader(new_train_data,
                                               batch_size=batch_size,
                                               worker_init_fn=seed_worker,
                                               generator=g_seed)
# Set the arguments
args = {
    'epochs': 250,
    'lr': 1e-3,
    'momentum': 0.99,
    'device': DEVICE,
}

# Initialize the model
set_seed(seed=SEED)
model_aug = AnimalNet()

# Train the model
val_acc_dataaug, train_acc_dataaug, param_norm_dataaug, _ = main(args,
                                                                 model_aug,
                                                                 new_train_loader,
                                                                 val_loader,
                                                                 img_test_dataset)
# Initialize the model
set_seed(seed=SEED)
model_pure = AnimalNet()

val_acc_pure, train_acc_pure, param_norm_pure, _, = main(args,
                                                         model_pure,
                                                         train_loader,
                                                         val_loader,
                                                         img_test_dataset)


# Train and Test accuracy plot
plt.figure()
plt.plot(val_acc_pure, label='Val Accuracy Pure',
         c='red', ls='dashed')
plt.plot(train_acc_pure, label='Train Accuracy Pure',
         c='red', ls='solid')
plt.plot(val_acc_dataaug, label='Val Accuracy data augment',
         c='blue', ls='dashed')
plt.plot(train_acc_dataaug, label='Train Accuracy data augment',
         c='blue', ls='solid')
plt.axhline(y=max(val_acc_pure), c='red', ls='dashed')
plt.axhline(y=max(val_acc_dataaug), c='blue', ls='dashed')
plt.title('Data Augmentation')
plt.ylabel('Accuracy (%)')
plt.xlabel('Epoch')
plt.legend()
plt.show()
Random seed 2021 has been set.
# Plot together: without and with augmentation
plt.figure()
plt.plot([i.cpu().numpy().item() for i in param_norm_pure],
         c='red', label='Without Augmentation')
plt.plot([i.cpu().numpy().item() for i in param_norm_dataaug],
         c='blue', label='With Augmentation')
plt.title('Norm of parameters as a function of training epoch')
plt.xlabel('Epoch')
plt.ylabel('Norm of model parameters')
plt.legend()
plt.show()

Think 3.1!: Data Augmentation

Can you think of more ways of augmenting the training data? (Think of other problems beyond object recognition.)

Click for solution

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Data_Augmentation_Discussuion")

Think 3.2!: Overparameterized vs. Small NN

Why is it better to regularize an overparameterized ANN than to start with a smaller one? Think about the regularization methods you know. Each group should have a 10 min discussion.

Click for solution

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Overparameterized_vs_Small_NN_Discussuion")

Section 4: Stochastic Gradient Descent

Time estimate: ~20 mins

Video 4: SGD

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_SGD_Video")

Section 4.1: Learning Rate

In this section, we will see how the learning rate can act as a regularizer while training a neural network. In summary:

  • Smaller learning rates regularize less and converge slowly to deep, narrow minima.

  • Larger learning rates regularize more by skipping over narrow local minima and converging to broader, flatter minima, which often generalize better.

But beware: a very large learning rate may result in overshooting or settling in a bad local minimum.

In the block below, we will train the AnimalNet model with different learning rates and see how that affects the regularization.

Generating Data Loaders

# @markdown #### Generating Data Loaders

# For reproducibility
g_seed = torch.Generator()
g_seed.manual_seed(SEED)

batch_size = 128
train_transform = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
     ])

data_path = pathlib.Path('.')/'afhq' # Using pathlib to be compatible with all OS's
img_dataset = ImageFolder(data_path/'train', transform=train_transform)
img_train_data, img_val_data = torch.utils.data.random_split(img_dataset, [11700, 2930])

full_train_loader = torch.utils.data.DataLoader(img_train_data,
                                                batch_size=batch_size,
                                                num_workers=2,
                                                worker_init_fn=seed_worker,
                                                generator=g_seed)
full_val_loader = torch.utils.data.DataLoader(img_val_data,
                                              batch_size=1000,
                                              num_workers=2,
                                              worker_init_fn=seed_worker,
                                              generator=g_seed)

test_transform = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
     ])
img_test_dataset = ImageFolder(data_path/'val', transform=test_transform)
# With dataloaders: img_test_loader = DataLoader(img_test_dataset, batch_size=batch_size,shuffle=False, num_workers=1)
classes = ('cat', 'dog', 'wild')
# Set the arguments
args = {
    'test_batch_size': 1000,
    'epochs': 20,
    'batch_size': 32,
    'momentum': 0.99,
    'device': DEVICE
}

learning_rates = [5e-4, 1e-3, 5e-3]
acc_dict = {}

for i, lr in enumerate(learning_rates):
  # Initialize the model
  set_seed(seed=SEED)
  model = AnimalNet()
  # Learning rate
  args['lr'] = lr
  # Train the model
  val_acc, train_acc, param_norm, _ = main(args,
                                           model,
                                           train_loader,
                                           val_loader,
                                           img_test_dataset)
  # Store the outputs
  acc_dict[f'val_{i}'] = val_acc
  acc_dict[f'train_{i}'] = train_acc
  acc_dict[f'param_norm_{i}'] = param_norm
Random seed 2021 has been set.

Plot Train and Validation accuracy (Run me)

# @markdown Plot Train and Validation accuracy (Run me)
plt.figure()
for i, lr in enumerate(learning_rates):
  plt.plot(acc_dict[f'val_{i}'], linestyle='dashed',
          label=f'lr={lr:0.1e} - validation')
  plt.plot(acc_dict[f'train_{i}'], label=f'{lr:0.1e} - train')

  print(f"Maximum Test Accuracy obtained with lr={lr:0.1e}: {max(acc_dict[f'val_{i}'])}")

plt.title('Optimal Learning Rate')
plt.ylabel('Accuracy (%)')
plt.xlabel('Epoch')
plt.legend()
plt.show()

Plot parametric norms (Run me)

# @markdown Plot parametric norms (Run me)
plt.figure()
for i, lr in enumerate(learning_rates):
  plt.plot([i.cpu().numpy().item() for i in acc_dict[f'param_norm_{i}']],
           label=f'lr={lr:0.2e}')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Parameter norms')
plt.show()

In the model above, we observe something different from what we expected. Why do you think this is happening?


Section 5: Hyperparameter Tuning

Time estimate: ~5 mins

Video 5: Hyperparameter tuning

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Hyperparameter_tuning_Video")

Hyperparameter tuning is often tricky and time-consuming, but it is a vital part of training any deep learning model that generalizes well. There are a few techniques that we can use to guide us during the search.

  • Grid Search: Try all possible combinations of hyperparameters

  • Random Search: Randomly try different combinations of hyperparameters

  • Coordinate-wise Gradient Descent: Start at one set of hyperparameters and try changing them one at a time; accept any change that reduces your validation error

  • Bayesian Optimization / Auto ML: Start from a set of hyperparameters that have worked well on a similar problem, and then do some sort of local exploration (e.g., gradient descent) from there.

There are many choices to make, like what range to explore over, which parameter to optimize first, etc. Some hyperparameters don’t matter much (people use a dropout of either 0.5 or 0.2, but not much else). Others can matter a lot more (e.g., the size and depth of the neural net). The key is to see what has worked on similar problems. A minimal random-search sketch follows.
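The sketch below is illustrative only: train_and_validate stands in for a full training-and-evaluation run (such as the tutorial’s main function), and the search ranges are assumptions.

import random

random.seed(SEED)
search_space = {
    'lr': [5e-4, 1e-3, 5e-3, 1e-2],
    'dropout': [0.2, 0.5],
}

best_val, best_config = -1.0, None
for trial in range(5):
  # Sample one value per hyperparameter
  config = {name: random.choice(values) for name, values in search_space.items()}
  # val_acc = train_and_validate(config)  # hypothetical training-and-evaluation call
  val_acc = random.random()               # stand-in score so the sketch runs
  if val_acc > best_val:
    best_val, best_config = val_acc, config

print(f"Best config: {best_config} (val acc: {best_val:.3f})")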

One can automate the process of tuning the network architecture using the so-called Neural Architecture Search (NAS). NAS designs new architectures from a few building blocks (linear layers, convolutional layers, etc.) and optimizes the design for performance using a wide range of techniques such as grid search, reinforcement learning, gradient descent, evolutionary algorithms, etc. This obviously requires very high computing power. Read this article to learn more about NAS.

Think 5!: Overview of regularization techniques

Which of the regularization techniques covered today do you think had the most significant effect on the network? Why do you think so? Can you apply all of the regularization methods to the same network?

Click for solution

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Overview_of_regularization_techniques_Discussion")

Summary

Congratulations! You have finished the first day of the second week of NMA-DL!

In this tutorial, you learned about more regularization techniques: L1 and L2 regularization, dropout, and data augmentation. Finally, you saw that the learning rate of SGD can also act as a regularizer. An interesting paper can be found here.

Continue to the Bonus material on Adversarial Attacks if you have time left!


Daily survey

Don’t forget to complete your reflections and content check in the daily survey! Please be patient after logging in as there is a small delay before you will be redirected to the survey.

button link to survey


Bonus: Adversarial Attacks

Time estimate: ~15 mins

Video 6: Adversarial Attacks

Submit your feedback

# @title Submit your feedback
content_review(f"{feedback_prefix}_Adversarial_Attacks_Bonus_Video")

Designing perturbations of the input data that trick a machine learning model is called an “adversarial attack”. These attacks are an inevitable consequence of learning in high-dimensional spaces with complex decision boundaries. Depending on the application, such attacks can be very dangerous.


https://raw.githubusercontent.com/NeuromatchAcademy/course-content-dl/main/tutorials/static/AdversarialAttacks_w1d5t2.png

Hence, we need to build models that can defend against such attacks. One possible way to do this is by regularizing the networks, which smooths their decision boundaries. A few ways of building models robust to such attacks are listed below (a sketch of a simple attack follows the list):

  • Defensive Distillation: Models trained via distillation are less prone to such attacks because they are trained on soft labels, which smooths the learned decision boundaries.

  • Feature Squeezing: Detects adversarial inputs to a deployed classifier by comparing the model’s predictions before and after squeezing (e.g., reducing the color depth of) the input.

  • SGD: You can also pick weights that minimize what the adversary is trying to maximize, i.e., adversarial training via SGD.
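To make “designing perturbations” concrete, here is a minimal sketch of one classic attack, the Fast Gradient Sign Method (FGSM; Goodfellow et al., 2015), which nudges the input in the direction of the sign of the loss gradient. The model, criterion, and epsilon arguments are placeholders for any trained classifier, its loss function, and the perturbation size.

def fgsm_attack(model, criterion, x, target, epsilon=0.01):
  """Return an adversarially perturbed copy of input x (FGSM)."""
  x_adv = x.clone().detach().requires_grad_(True)
  loss = criterion(model(x_adv), target)
  loss.backward()
  # Step in the input direction that increases the loss the most
  return (x_adv + epsilon * x_adv.grad.sign()).detach()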


Read more about adversarial attacks here.