Lecture 12 - Pytorch datasets¶

ECE364 - Programming Methods for Machine Learning¶

Nickvash Kani¶

Slides based off prior lectures by Alex Schwing, Aigou Han, Farzad Kamalabadi, Corey Snyder. All mistakes are my own!¶

Things covered in today's lecture:

  • Training vs. validation vs. testing sets.
  • PyTorch Dataset class
  • PyTorch Dataloader class
  • Rewriting past models using torch.nn

Experimental Setup for Machine Learning Problems¶

The purpose of any machine learning model is to apply the trained model to new, unseen data. In many cases, a machine learning model may perform nearly perfectly, i.e. close to 100% classification accuracy, on the data it is trained on. However, we need some way to evaluate the model's ability to generalize to new data. The most common approach to training and testing a model is to partition a dataset into a training set, a validation set, and a test set.

For a dataset $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^{N}$, each of these sets are defined as:

  • Training set: $\mathcal{D}_{\textrm{train}}=\{(x_i, y_i)\}_{i=1}^{N_\textrm{train}}$ is the collection of data on which we train the model.
  • Validation set: $\mathcal{D}_{\textrm{val}}=\{(x_i, y_i)\}_{i=1}^{N_\textrm{val}}$ is the collection of data which the model does not train on, but we use to evaluate how the model generalizes to new data. We then use the validation set to tune any hyperparameters of the model or learning algorithm, e.g. learning rate, how long we train, choice of weight decay, etc.
  • Test set: $\mathcal{D}_{\textrm{test}}=\{(x_i, y_i)\}_{i=1}^{N_\textrm{test}}$ is the collection of data which the model does not train on and which we do not use to modify hyperparameters. Thus, the test set is the final evaluation of model generalization and should be the primary method for comparing model performance.

Note that these sets are pairwise disjoint and thus share no data points. The size of each partition is a choice that may depend on the application, but in general we usually reserve at least half of the data for training and roughly equal amounts of the remainder for validation and testing.
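A minimal sketch of such a split (assuming a generic dataset of 1000 labeled points and a 60%/20%/20% partition; only the shuffled indices matter here):

```python
import numpy as np

# hypothetical dataset size; we split by index so the splits stay disjoint
N = 1000
rng = np.random.default_rng(0)
indices = rng.permutation(N)

# 60% train, 20% validation, 20% test
n_train, n_val = int(0.6 * N), int(0.2 * N)
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```

Because the index array is shuffled once and then sliced, the three splits are disjoint and together cover the whole dataset.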

PyTorch Datasets¶

PyTorch offers an abstract base class for creating datasets that simplifies the process of building, manipulating, and sampling datasets, e.g. into training, validation, and testing sets. The torch.utils.data.Dataset class requires any new class that inherits this base class to implement three methods:

  • __init__: The __init__ method is the constructor for the new dataset. Unlike the nn.Module class, the base class constructor does not need to be called, i.e. we do not need to call super().__init__(). The constructor is most commonly used to establish the data for the dataset or the necessary information to assign attributes that will assist the data retrieval process in the __getitem__ method.

  • __len__: The __len__ method overrides the len() function in Python to determine the length of the dataset. In other words, for a dataset named my_dataset, the implemented __len__ method allows len(my_dataset) to return the length of the dataset.

  • __getitem__: The __getitem__ method overloads the use of brackets to index items in a dataset. For example, a dataset named my_dataset will call the __getitem__ method when we use my_dataset[i] and the index i is an input to the __getitem__ method.

Let's take a look at an example dataset by implementing the toy dataset from the previous lecture.

In [34]:
import torch
import matplotlib.pyplot as plt
from torch.utils.data import Dataset

class TwoClassDataset(Dataset):
    # don't forget the self identifier!
    def __init__(self, N, sigma):
        self.N = N # number of data points per class
        self.sigma = sigma # standard deviation of each class cluster
        self.plus_class = self.sigma*torch.randn(N, 2) + torch.tensor([-2, 2])
        self.negative_class = self.sigma*torch.randn(N, 2) + torch.tensor([2, -2])
        self.data = torch.cat((self.plus_class, self.negative_class), dim=0)
        self.labels = torch.cat((torch.ones(self.N), torch.zeros(self.N)))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = self.labels[idx]
        return x, y # return input and output pair

N = 300
sigma = 2.2
dataset = TwoClassDataset(N, sigma)

plus_data = dataset.plus_class
negative_data = dataset.negative_class
print('Dataset has {} points'.format(len(dataset)))
idx = 2
x, y = dataset[idx]
print('Dataset point with index {} is at x={} and label y={}'.format(idx, x, y))
plt.figure(figsize=(8, 6))
plt.scatter(plus_data[:, 0].numpy(), plus_data[:, 1].numpy(), color='tomato', s=50, edgecolor='black')
plt.scatter(negative_data[:, 0].numpy(), negative_data[:, 1].numpy(), color='cornflowerblue', s=50, edgecolor='black')
plt.tight_layout()
Dataset has 600 points
Dataset point with index 2 is at x=tensor([-6.1094,  2.8263]) and label y=1.0

Aside from making custom datasets, PyTorch and torchvision have many pre-loaded datasets implemented within the same Dataset interface.
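For instance, torch.utils.data.TensorDataset wraps tensors you already have in the same Dataset interface; a minimal sketch with made-up data:

```python
import torch
from torch.utils.data import TensorDataset

# wrap existing tensors in the Dataset interface (made-up data)
x = torch.randn(10, 2)  # 10 two-dimensional inputs
y = torch.zeros(10)     # 10 labels
wrapped = TensorDataset(x, y)

print(len(wrapped))     # 10, via __len__
xi, yi = wrapped[3]     # the pair (x[3], y[3]), via __getitem__
print(xi.shape)         # torch.Size([2])
```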

PyTorch Dataloaders¶

With a PyTorch Dataset class in hand, we may take advantage of the torch.utils.data.DataLoader interface that will simplify the process of sampling batches of data; shuffling the dataset; partitioning into training set, validation set, testing set; and more! A DataLoader does not need to be implemented like a Dataset or nn.Module class. Instead, we only need to provide a Dataset object as input alongside several optional inputs:

  • batch_size: number of examples in each batch or call to the dataloader

  • shuffle: Boolean option to shuffle dataset each pass or epoch through the dataset

  • sampler: Sampler object that specifies how data will be extracted from the dataset. For example, the SubsetRandomSampler allows us to specify indices within the larger dataset to sample at random. This is an easy way to create training, validation, and testing sets!

  • Plenty of other options that may be explored in the torch.utils.data.DataLoader documentation

In [51]:
import numpy as np
from torch.utils.data import DataLoader
from torch.utils.data import SubsetRandomSampler

# create indices for each split of dataset
N_train = 400
N_val = 100
N_test = 100
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]

# create dataloader for each split
batch_size = 8
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))

# data loaders are iterable
for x_batch, y_batch in val_loader:
    print(x_batch, y_batch)

The New and Improved Training Loop¶

Now, let's combine these datasets and dataloaders to further simplify the training loop we used for the toy logistic regression problem in the previous lecture.

In [52]:
# code from previous lecture
import torch.nn as nn

class LogisticRegression(nn.Module):
    def __init__(self, N):
        super().__init__()
        self.w = nn.Parameter(torch.ones(N))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return 1/(1+torch.exp(-(self.w@x+self.b)))

# compute classification accuracy
def model_accuracy(model, input_data, labels):
    predictions = model(input_data.unsqueeze(-1)).squeeze(-1)
    positive_preds = predictions >= 0.5
    negative_preds = predictions < 0.5
    n_correct = torch.sum(positive_preds*labels)+torch.sum(negative_preds*(1-labels))
    return n_correct
In [53]:
import numpy as np
from torch.utils.data import DataLoader
from torch.utils.data import SubsetRandomSampler

# # create indices for each split of dataset
# N_train = 200
# N_val = 50
# N_test = 50
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]

# create dataloader for each split
batch_size = 16
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))

# training setup
criterion = nn.BCELoss(reduction='mean') # binary cross-entropy loss, use mean loss
lr = 1e-3 # learning rate
logreg_model = LogisticRegression(2) # initialize model
optimizer = torch.optim.SGD(logreg_model.parameters(), lr=lr, momentum=0.99, weight_decay=1e-3) # initialize optimizer

n_epoch = 20 # number of passes through the training dataset
loss_values, train_accuracies, val_accuracies = [], [], []
for n in range(n_epoch):
    epoch_loss, epoch_acc = 0, 0
    for x_batch, y_batch in train_loader:
        # zero out gradients
        optimizer.zero_grad()
        # pass batch to model
        predictions = logreg_model(x_batch.unsqueeze(-1)).squeeze(-1) # make dimensions match for loss function
        # calculate loss
        loss = criterion(predictions, y_batch)
        # backpropagate and update
        loss.backward()
        optimizer.step()
        # logging
        epoch_loss += loss.item()
        epoch_acc += model_accuracy(logreg_model, x_batch, y_batch)
    loss_values.append(epoch_loss/len(train_loader))
    train_accuracies.append(epoch_acc/N_train)
    # validation performance
    val_acc = 0
    for x_batch, y_batch in val_loader:
        # don't compute gradients since we are only evaluating the model
        with torch.no_grad():
            val_acc += model_accuracy(logreg_model, x_batch, y_batch)
    val_accuracies.append(val_acc/N_val)

plt.figure(figsize=(12,6))
plt.subplot(131)
plt.semilogy(loss_values)
plt.grid(True)
plt.title('Loss values')
plt.xlabel('Epoch')
plt.subplot(132)
plt.plot(train_accuracies)
plt.grid(True)
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.subplot(133)
plt.plot(val_accuracies)
plt.grid(True)
plt.title('Validation Accuracy')
plt.xlabel('Epoch')

Multi-class Logistic Regression¶

Recall from our lecture on multi-class logistic regression that we may perform multi-class classification using logistic regression where each class has its own weight vector and bias term. More formally, for class $k$, we have weight vector $w_k\in\mathbb{R}^n$ and bias $b_k\in\mathbb{R}$. Thus, an input $x\in\mathbb{R}^n$ receives a "score" $z_k$ for class $k$ via $$ z_k = w_k^\top x + b_k. $$

Larger scores should correspond to larger probabilities for a particular class while smaller (possibly negative) scores give smaller probabilities. For a collection of scores $z=\{z_1, z_2, \ldots, z_M\}$ across $M$ classes, we can use the softmax function to normalize these scores to a probability distribution.

$$ \textrm{softmax}(z)_k=\mathbf{Pr}\{\textrm{Class }y=k|x\} = \frac{e^{z_k}}{\sum_{j=1}^{M}e^{z_j}}. $$
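As a quick numeric check (with arbitrarily chosen scores), torch.softmax implements exactly this normalization:

```python
import torch

z = torch.tensor([2.0, 1.0, 0.0, -1.0])  # arbitrary scores for M=4 classes
p = torch.softmax(z, dim=0)

# matches e^{z_k} / sum_j e^{z_j}, and the probabilities sum to 1
manual = torch.exp(z) / torch.exp(z).sum()
print(torch.allclose(p, manual))  # True
print(p.sum())
```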

Instead of computing each score one-by-one, we can put all of our parameters into a weight matrix $A$ with a bias vector $b$. Thus,

$$ \begin{align} z &= Ax+b\\ &= \begin{bmatrix} \rule[.6ex]{4ex}{0.75pt} & w_1^\top & \rule[.6ex]{4ex}{0.75pt}\\ \rule[.6ex]{4ex}{0.75pt} & w_2^\top & \rule[.6ex]{4ex}{0.75pt}\\ & \vdots & \\ \rule[.6ex]{4ex}{0.75pt} & w_M^\top & \rule[.6ex]{4ex}{0.75pt}\\ \end{bmatrix}\begin{bmatrix} \rule[-1ex]{0.5pt}{4ex}\\ x\\ \rule[1ex]{0.5pt}{4ex}\\ \end{bmatrix} +\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_M \end{bmatrix}\\ &= \begin{bmatrix} z_1\\ z_2\\ \vdots\\ z_M \end{bmatrix} \end{align} $$
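The matrix form can be checked numerically against the per-class scores $w_k^\top x + b_k$; a small sketch with random parameters:

```python
import torch

M, n = 4, 2            # 4 classes, 2 input dimensions (made-up sizes)
A = torch.randn(M, n)  # row k of A is w_k^T
b = torch.randn(M)
x = torch.randn(n)

z = A @ x + b          # all M scores in one matrix-vector product
z_per_class = torch.stack([A[k] @ x + b[k] for k in range(M)])
print(torch.allclose(z, z_per_class))  # True
```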

In PyTorch, we can efficiently implement the multi-class logistic regression model using the nn.Linear class which implements parameter matrices including bias terms. For the below implementation, we will also not apply the softmax function ourselves since the nn.CrossEntropyLoss class expects logits or scores instead of the final probabilities.

In [54]:
class MulticlassLogisticRegression(nn.Module):
    def __init__(self, N, M):
        super().__init__()
        self.N = N # input dimension
        self.M = M # number of classes
        self.weight_matrix = nn.Linear(N, M, bias=True) # N input dimensions, M output dimensions

    def forward(self, x):
        return self.weight_matrix(x)

And that's it! Again, we could compute the softmax of these logits/scores but the PyTorch implementation of cross-entropy loss asks for logits instead of probabilities. Finally, recall for model $f_\theta(x)=z$ that cross-entropy loss is given by $$ \ell_{ce}(f_\theta(x), y) = -\log\left(\textrm{softmax}(f_\theta(x))_y\right)= -\log\left(\frac{e^{z_y}}{\sum_{j=1}^{M}e^{z_j}}\right) $$ for input $x$ with label $y$ (assume class number $y$ also identifies the appropriate index in $z$, for simplicity).
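A quick sanity check (with arbitrary logits) that nn.CrossEntropyLoss computes exactly this quantity from the raw scores:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0, 0.0]])  # one example, M=4 classes
y = torch.tensor([2])                           # true class index

loss = nn.CrossEntropyLoss()(logits, y)
manual = -torch.log(torch.softmax(logits, dim=1))[0, 2]
print(torch.allclose(loss, manual))  # True
```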

Example: Toy Multi-class Logistic Regression¶

Below, we provide a toy 4-class dataset with $N=100$ samples per class and $\sigma=0.6$.

In [55]:
import torch

from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler

class FourClassDataset(Dataset):
    def __init__(self, N, sigma):
        self.N = N # number of data points per class
        self.sigma = sigma # standard deviation of each class cluster
        self.class_zero = self.sigma*torch.randn(N, 2) + torch.tensor([1, 1])
        self.class_one = self.sigma*torch.randn(N, 2) + torch.tensor([-1, 1])
        self.class_two = self.sigma*torch.randn(N, 2) + torch.tensor([-1, -1])
        self.class_three = self.sigma*torch.randn(N, 2) + torch.tensor([1, -1])
        self.data = torch.cat((self.class_zero, self.class_one, self.class_two, self.class_three), dim=0)
        self.labels = torch.cat((torch.zeros(self.N), torch.ones(self.N),
                                 2*torch.ones(self.N), 3*torch.ones(self.N))).long()

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = self.labels[idx]
        return x, y # return input and output pair

# visualize dataset
N = 100
sigma = 0.6
dataset = FourClassDataset(N, sigma)
class_zero = dataset.class_zero
class_one = dataset.class_one
class_two = dataset.class_two
class_three = dataset.class_three
plt.figure(figsize=(8, 6))
plt.scatter(class_zero[:, 0].numpy(), class_zero[:, 1].numpy(), color='tomato', s=50, edgecolor='black', label='Class 0')
plt.scatter(class_one[:, 0].numpy(), class_one[:, 1].numpy(), color='cornflowerblue', s=50, edgecolor='black', label='Class 1')
plt.scatter(class_two[:, 0].numpy(), class_two[:, 1].numpy(), color='seagreen', s=50, edgecolor='black', label='Class 2')
plt.scatter(class_three[:, 0].numpy(), class_three[:, 1].numpy(), color='violet', s=50, edgecolor='black', label='Class 3')
plt.grid(True)
plt.legend()
plt.tight_layout()

a) Create training, validation, and testing dataloaders with a 60%:20%:20% training:validation:testing split and batch size 16.

b) Fill in the training loop for training the MulticlassLogisticRegression model using the nn.CrossEntropyLoss function. We have provided the helper function for tracking model accuracy and comments below to help. Note: we do not need to worry about the squeezing and unsqueezing of the last dimension when passing data to the model now that we are using the nn.Linear class for our parameters.

c) Experiment with training parameters, e.g. learning rate, number of epochs, batch size, momentum, weight decay, and plot the training loss, training accuracy, and validation accuracy.

d) Observe the performance of this multi-class logistic regression model with noisier clusters, i.e. bigger values of $\sigma$.
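As the note in part b) suggests, nn.Linear applies the same affine map to every row of a batched input, so no manual squeezing or unsqueezing is needed; a quick sketch:

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 4)       # N=2 input dimensions, M=4 classes
x_batch = torch.randn(16, 2)  # a batch of 16 two-dimensional points
scores = layer(x_batch)       # batching is handled automatically
print(scores.shape)           # torch.Size([16, 4])
```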

In [56]:
def multiclass_model_accuracy(model, input_data, labels):
    predictions = model(input_data) # no need to squeeze/unsqueeze dimensions now!
    predicted_classes = torch.argmax(predictions, dim=1) # find highest scoring class along the columns
    n_correct = torch.sum(torch.eq(predicted_classes, labels))
    return n_correct
    
# Part a) create DataLoaders
N_train = 240
N_val = 80
N_test = 80
batch_size = 16
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))

# Part b) training loop
# initialize MulticlassLogisticRegression model
N = 2
M = 4
model = MulticlassLogisticRegression(N, M)

# initialize loss function and optimizer
criterion = nn.CrossEntropyLoss()
lr = 1e-4
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.99)

# logging info
loss_values, train_accuracies, val_accuracies = [], [], []
n_epoch = 300 # set this value
for n in range(n_epoch):
    epoch_loss, epoch_acc = 0, 0
    for x_batch, y_batch in train_loader:
        # zero out gradients
        optimizer.zero_grad()
        # pass batch to model, no need to worry about using squeeze/unsqueeze now
        predictions = model(x_batch)
        # calculate loss
        loss = criterion(predictions, y_batch)
        # backpropagate and update
        loss.backward() # backprop
        optimizer.step()
        # logging to update epoch_loss (add loss value) and epoch_acc (add current batch accuracy)
        epoch_loss += loss.item()
        epoch_acc += multiclass_model_accuracy(model, x_batch, y_batch)

    loss_values.append(epoch_loss/len(train_loader))
    train_accuracies.append(epoch_acc/N_train)
    # validation performance
    val_acc = 0
    for x_batch, y_batch in val_loader:
        # don't compute gradients since we are only evaluating the model
        with torch.no_grad():
            # validation batch accuracy
            val_acc += multiclass_model_accuracy(model, x_batch, y_batch)
    val_accuracies.append(val_acc/N_val)

plt.figure(figsize=(12,6))
plt.subplot(131)
plt.semilogy(loss_values)
plt.grid(True)
plt.title('Loss values')
plt.xlabel('Epoch')
plt.subplot(132)
plt.plot(train_accuracies)
plt.grid(True)
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.subplot(133)
plt.plot(val_accuracies)
plt.grid(True)
plt.title('Validation Accuracy')
plt.xlabel('Epoch')
Out[56]:
Text(0.5, 0, 'Epoch')
In [58]:
# visualize dataset + model predictions in background

# put model in eval mode
model.eval()

# create a dense grid over the input space
x_min, x_max = dataset.data[:,0].min()-1, dataset.data[:,0].max()+1
y_min, y_max = dataset.data[:,1].min()-1, dataset.data[:,1].max()+1

xx, yy = torch.meshgrid(
    torch.linspace(x_min, x_max, 300),
    torch.linspace(y_min, y_max, 300),
    indexing='ij'
)

# flatten grid to pass through model
grid_points = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=1)

# get predictions over entire grid
with torch.no_grad():
    logits = model(grid_points)
    preds = torch.argmax(logits, dim=1)

# reshape predictions back to grid shape
Z = preds.reshape(xx.shape)

# convert to numpy for plotting
Z = Z.numpy()
xx = xx.numpy()
yy = yy.numpy()

plt.figure(figsize=(8,6))

# plot decision regions
plt.contourf(xx, yy, Z, alpha=0.3, levels=4, cmap='Pastel1')

# overlay original data
plt.scatter(dataset.class_zero[:, 0].numpy(), dataset.class_zero[:, 1].numpy(),
            color='tomato', s=50, edgecolor='black', label='Class 0')

plt.scatter(dataset.class_one[:, 0].numpy(), dataset.class_one[:, 1].numpy(),
            color='seagreen', s=50, edgecolor='black', label='Class 1')

plt.scatter(dataset.class_two[:, 0].numpy(), dataset.class_two[:, 1].numpy(),
            color='violet', s=50, edgecolor='black', label='Class 2')

plt.scatter(dataset.class_three[:, 0].numpy(), dataset.class_three[:, 1].numpy(),
            color='cornflowerblue', s=50, edgecolor='black', label='Class 3')

plt.grid(True)
plt.legend()
plt.title("Learned Decision Regions")
plt.tight_layout()
plt.show()

Note: Why do we have to reset the gradient?¶

Remember how, in our previous code, we constantly had to reset the gradient?

  • w_gd.grad = None
  • optimizer.zero_grad()

Why doesn't the gradient get zeroed automatically every time we compute it?


It is because we may want to accumulate the gradient over several batches. Datasets can be huge, and most of the time we cannot store the entire dataset in memory. Accumulating the gradient so that we update the parameters only every couple of batches can help stabilize training.

This is something you just have to play around with. There are lots of strategies and lots of published results either way; there are no firm rules on how you should train a model, and it all depends on the dataset.
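The accumulation behavior is easy to see directly: each backward() call adds into .grad until we reset it (a minimal sketch):

```python
import torch

w = torch.ones(1, requires_grad=True)

(3 * w).sum().backward()
print(w.grad)             # tensor([3.])

(3 * w).sum().backward()  # gradients accumulate across backward() calls
print(w.grad)             # tensor([6.])

w.grad = None             # reset, as optimizer.zero_grad() effectively does
```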

Example: Polynomial Regression Model¶

Below, we provide toy code for the third-degree polynomial regression model we have seen in previous lectures.

In [60]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.utils.data import SubsetRandomSampler

class ThirdOrderPolynomial(nn.Module):
    def __init__(self):
        '''
        Specify the learnable parameters: a, b, c, d
        '''
        super().__init__() # call nn.Module constructor first
        self.a = nn.Parameter(torch.rand(1))
        self.b = nn.Parameter(torch.rand(1))
        self.c = nn.Parameter(torch.rand(1))
        self.d = nn.Parameter(torch.rand(1))

    def forward(self, x):
        '''
        Implement f(x).
        '''
        f_x = self.a*x**3 + self.b*x**2 + self.c*x + self.d
        return f_x

class ThirdOrderPolynomialDataset(Dataset):
    # don't forget the self identifier!
    def __init__(self, N, sigma):
        self.N = N # number of data points per class
        self.sigma = sigma # standard deviation of each class cluster
        self.x = torch.linspace(-10.0,10.0,self.N)
        self.t = self.f(self.x)*(1+self.sigma*torch.randn(N))

    def f(self, x): 
        return 0.25*x**3 - 0.5*x**2 - 10*x + 20 

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        x = self.x[idx]
        y = self.t[idx]
        return x, y # return input and output pair
 

def model_regression_loss(model, data_loader, criterion):
    reg_loss=0
    for x_batch, y_batch in data_loader:
        predictions = model(x_batch.unsqueeze(-1)).squeeze(-1)
        reg_loss += criterion(predictions, y_batch)
    return reg_loss

        
N = 1000
sigma = 0.2
dataset = ThirdOrderPolynomialDataset(N, sigma)
print(dataset.x.shape)
print(dataset.t.shape)

# create indices for each split of dataset
N_train = int(N*0.6)
N_val = int(N*0.2)
N_test = int(N*0.2)
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]

# create dataloader for each split
batch_size = 100
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))

# training setup
criterion = nn.MSELoss(reduction='mean')
lr = 1e-6 # learning rate
model = ThirdOrderPolynomial() # initialize model
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.99) # initialize optimizer


a_vals = []
b_vals = []
c_vals = []
d_vals = []
n_epoch = 1000 # number of passes through the training dataset
loss_values, train_loss, val_loss = [], [], []
for n in range(n_epoch):
    epoch_loss, epoch_acc = 0, 0
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        predictions = model(x_batch.unsqueeze(-1)).squeeze(-1) # make dimensions match for loss function
        loss = criterion(predictions, y_batch)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    loss_values.append(epoch_loss/len(train_loader))
    with torch.no_grad():
        train_loss.append(model_regression_loss(model, train_loader, criterion))
        val_loss.append(model_regression_loss(model, val_loader, criterion))
        a_vals.append(model.a.data.item())
        b_vals.append(model.b.data.item())
        c_vals.append(model.c.data.item())
        d_vals.append(model.d.data.item())
        
print('Final regression model:')
for name, param in model.named_parameters():
    print(f"parameter: {name}, Value: {param.data.item()}")    

xx = np.arange(-10,10,0.1)    
curr_fn = np.zeros_like(xx)
plt.figure(figsize=(12,6))
plt.subplot(131)
x, t = dataset[train_indices]
plt.scatter(x.detach().numpy(), t, color='orange')
w_vals = [d_vals[-1], c_vals[-1], b_vals[-1], a_vals[-1]]
for k in range(xx.shape[0]):
    curr_fn[k] = sum(w_vals[a] * xx[k]**a for a in np.arange(3+1))
plt.plot(xx, curr_fn, color='blue')
plt.grid(True)
plt.title('f(x)')
plt.xlabel('x')

plt.subplot(132)
plt.plot(train_loss)
plt.grid(True)
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.subplot(133)
plt.plot(val_loss)
plt.grid(True)
plt.title('Validation loss')
plt.xlabel('Epoch')

hist_size = len(a_vals)
iter_num = np.array([0, int(0.01*hist_size), int(0.05*hist_size), int(0.10*hist_size), hist_size-1]).astype(int)
plt.figure(figsize=(20, 5))
for j, i in enumerate(iter_num):
    plt.subplot(1, 5, j+1)
    curr_fn = np.zeros_like(xx)
    w_vals = [d_vals[i], c_vals[i], b_vals[i], a_vals[i]]
    for k in range(xx.shape[0]):
        curr_fn[k] = sum(w_vals[a] * xx[k]**a for a in np.arange(3+1))
    plt.plot(xx, curr_fn, color='blue')
    x, t = dataset[train_indices]
    plt.scatter(x.detach().numpy(), t, color='orange')
    plt.grid(True)
    plt.title('Regressed function: Iteration {}'.format(i))
torch.Size([1000])
torch.Size([1000])
Final regression model:
parameter: a, Value: 0.25297772884368896
parameter: b, Value: -0.3366248309612274
parameter: c, Value: -10.17745304107666
parameter: d, Value: 8.852567672729492

That's it for today¶

  • Enjoy your Tuesday!
  • We'll have a review next lecture.