Things covered in today's lecture:
Dataset class
DataLoader class
The purpose of any machine learning model is to apply the trained model to new, unseen data. In many cases, a machine learning model may perform close to perfectly, i.e. nearly 100% classification accuracy, on the data it is trained on. However, we need some way to evaluate the model's ability to generalize to new data. The most common approach to training and testing a model is to partition a dataset into a training set, a validation set, and a test set.
For a dataset $\mathcal{D}=\{(x_i, y_i)\}_{i=1}^{N}$, each of these sets is defined as:
Training set: the data used to fit the model's parameters during training.
Validation set: held-out data used to monitor generalization during training and to tune hyperparameters.
Test set: held-out data used only at the end to report the performance of the final model.
Note that these sets are disjoint and thus share no data points. The size of each partition is a choice that may depend on the application, but in general we usually reserve at least half of the data for training and roughly equal amounts of the remainder for validation and testing.
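As a concrete sketch of this rule of thumb (the dataset size here is hypothetical), a 60%/20%/20% split can be computed as:

```python
# split sizes for a hypothetical dataset of 600 points
N = 600                       # total number of data points (made-up value)
n_train = int(0.6 * N)        # reserve at least half of the data for training
n_val = (N - n_train) // 2    # split the remainder roughly evenly...
n_test = N - n_train - n_val  # ...so the three sizes always sum to N
print(n_train, n_val, n_test)  # 360 120 120
```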
PyTorch offers an abstract base class for creating datasets that simplifies the process of building, manipulating, and sampling datasets, e.g. into training, validation, and testing sets. A new class that inherits from the torch.utils.data.Dataset base class should implement the following methods:
__init__: The __init__ method is the constructor for the new dataset. Unlike with the nn.Module class, the base class constructor does not need to be called, i.e. we do not need to call super().__init__(). The constructor is most commonly used to load or generate the data, or to assign any attributes needed for data retrieval in the __getitem__ method.
__len__: The __len__ method overrides Python's built-in len() function to return the length of the dataset. In other words, for a dataset named my_dataset, the implemented __len__ method allows len(my_dataset) to return the number of data points.
__getitem__: The __getitem__ method overloads the use of brackets to index items in a dataset. For example, a dataset named my_dataset will call the __getitem__ method when we use my_dataset[i] and the index i is an input to the __getitem__ method.
Let's take a look at an example dataset by implementing the toy dataset from the previous lecture.
import torch
import matplotlib.pyplot as plt
from torch.utils.data import Dataset
class TwoClassDataset(Dataset):
    # don't forget the self identifier!
    def __init__(self, N, sigma):
        self.N = N # number of data points per class
        self.sigma = sigma # standard deviation of each class cluster
        self.plus_class = self.sigma*torch.randn(N, 2) + torch.tensor([-2, 2])
        self.negative_class = self.sigma*torch.randn(N, 2) + torch.tensor([2, -2])
        self.data = torch.cat((self.plus_class, self.negative_class), dim=0)
        self.labels = torch.cat((torch.ones(self.N), torch.zeros(self.N)))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = self.labels[idx]
        return x, y # return input and output pair
N = 300
sigma = 2.2
dataset = TwoClassDataset(N, sigma)
plus_data = dataset.plus_class
negative_data = dataset.negative_class
print('Dataset has {} points'.format(len(dataset)))
idx = 2
x, y = dataset[idx]
print('Dataset point with index {} is at x={} and label y={}'.format(idx, x, y))
plt.figure(figsize=(8, 6))
plt.scatter(plus_data[:, 0].numpy(), plus_data[:, 1].numpy(), color='tomato', s=50, edgecolor='black')
plt.scatter(negative_data[:, 0].numpy(), negative_data[:, 1].numpy(), color='cornflowerblue', s=50, edgecolor='black')
plt.tight_layout()
Dataset has 600 points
Dataset point with index 2 is at x=tensor([-6.1094, 2.8263]) and label y=1.0
Aside from making custom datasets, PyTorch and torchvision have many pre-loaded datasets implemented within the same Dataset interface.
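For simple in-memory tensors, PyTorch also provides torch.utils.data.TensorDataset, which implements __len__ and __getitem__ for us; a minimal sketch (the tensors here are made up):

```python
import torch
from torch.utils.data import TensorDataset

# wrap existing tensors in a ready-made Dataset (no custom class needed)
data = torch.randn(10, 2)   # 10 two-dimensional points
labels = torch.zeros(10)    # placeholder labels
ds = TensorDataset(data, labels)

print(len(ds))   # __len__ is implemented for us: 10
x, y = ds[3]     # __getitem__ returns one (input, label) pair
print(x.shape)   # torch.Size([2])
```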
With a PyTorch Dataset class in hand, we may take advantage of the torch.utils.data.DataLoader interface that will simplify the process of sampling batches of data; shuffling the dataset; partitioning into training set, validation set, testing set; and more! A DataLoader does not need to be implemented like a Dataset or nn.Module class. Instead, we only need to provide a Dataset object as input alongside several optional inputs:
batch_size: number of examples in each batch or call to the dataloader
shuffle: Boolean option to reshuffle the dataset on each pass, or epoch, through the data
sampler: Sampler object that specifies how data will be extracted from the dataset. For example, the SubsetRandomSampler allows us to specify indices within the larger dataset to sample at random. This is an easy way to create training, validation, and testing sets!
Plenty of other options may be explored in the torch.utils.data documentation
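As an aside, torch.utils.data.random_split offers another way to partition a Dataset without managing indices by hand; a sketch using a stand-in TensorDataset of 600 points (a custom Dataset like the one above would work the same way):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# a stand-in dataset of 600 points
ds = TensorDataset(torch.randn(600, 2), torch.zeros(600))

# partition into disjoint 400/100/100 subsets at random
train_set, val_set, test_set = random_split(ds, [400, 100, 100])

train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
print(len(train_set), len(val_set), len(test_set))  # 400 100 100
```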
import numpy as np
from torch.utils.data import DataLoader
from torch.utils.data import SubsetRandomSampler
# create indices for each split of dataset
N_train = 400
N_val = 100
N_test = 100
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]
# create dataloader for each split
batch_size = 8
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))
# data loaders are iterable
for x_batch, y_batch in val_loader:
    print(x_batch, y_batch)
Now, let's combine these datasets and dataloaders to further simplify the training loop we used to perform the toy logistic regression problem in the previous lecture.
# code from previous lecture
import torch.nn as nn
class LogisticRegression(nn.Module):
    def __init__(self, N):
        super().__init__()
        self.w = nn.Parameter(torch.ones(N))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return 1/(1+torch.exp(-(self.w@x+self.b)))

# compute number of correct classifications
def model_accuracy(model, input_data, labels):
    predictions = model(input_data.unsqueeze(-1)).squeeze(-1)
    positive_preds = predictions >= 0.5
    negative_preds = predictions < 0.5
    n_correct = torch.sum(positive_preds*labels)+torch.sum(negative_preds*(1-labels))
    return n_correct
import numpy as np
from torch.utils.data import DataLoader
from torch.utils.data import SubsetRandomSampler
# create indices for each split of dataset (reusing N_train = 400, N_val = 100, N_test = 100 from above)
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]
# create dataloader for each split
batch_size = 16
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))
# training setup
criterion = nn.BCELoss(reduction='mean') # binary cross-entropy loss, use mean loss
lr = 1e-3 # learning rate
logreg_model = LogisticRegression(2) # initialize model
optimizer = torch.optim.SGD(logreg_model.parameters(), lr=lr, momentum=0.99, weight_decay=1e-3) # initialize optimizer
n_epoch = 20 # number of passes through the training dataset
loss_values, train_accuracies, val_accuracies = [], [], []
for n in range(n_epoch):
    epoch_loss, epoch_acc = 0, 0
    for x_batch, y_batch in train_loader:
        # zero out gradients
        optimizer.zero_grad()
        # pass batch to model
        predictions = logreg_model(x_batch.unsqueeze(-1)).squeeze(-1) # make dimensions match for loss function
        # calculate loss
        loss = criterion(predictions, y_batch)
        # backpropagate and update
        loss.backward()
        optimizer.step()
        # logging
        epoch_loss += loss.item()
        epoch_acc += model_accuracy(logreg_model, x_batch, y_batch)
    loss_values.append(epoch_loss/len(train_loader))
    train_accuracies.append(epoch_acc/N_train)
    # validation performance
    val_acc = 0
    for x_batch, y_batch in val_loader:
        # don't compute gradients since we are only evaluating the model
        with torch.no_grad():
            val_acc += model_accuracy(logreg_model, x_batch, y_batch)
    val_accuracies.append(val_acc/N_val)
plt.figure(figsize=(12,6))
plt.subplot(131)
plt.semilogy(loss_values)
plt.grid(True)
plt.title('Loss values')
plt.xlabel('Epoch')
plt.subplot(132)
plt.plot(train_accuracies)
plt.grid(True)
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.subplot(133)
plt.plot(val_accuracies)
plt.grid(True)
plt.title('Validation Accuracy')
plt.xlabel('Epoch')
Recall from our lecture on multi-class logistic regression that we may perform multi-class classification using logistic regression where each class has its own weight vector and bias term. More formally, for class $k$, we have weight vector $w_k\in\mathbb{R}^n$ and bias $b_k\in\mathbb{R}$. Thus, an input $x\in\mathbb{R}^n$ receives a "score" $z_k$ for class $k$ via $$ z_k = w_k^\top x + b_k. $$
Larger scores should correspond to larger probabilities for a particular class while smaller (possibly negative) scores give smaller probabilities. For a collection of scores $z=\{z_1, z_2, \ldots, z_M\}$ across $M$ classes, we can use the softmax function to normalize these scores to a probability distribution.
$$ \textrm{softmax}(z)_k=\mathbf{Pr}\{\textrm{Class }y=k|x\} = \frac{e^{z_k}}{\sum_{j=1}^{M}e^{z_j}}. $$Instead of computing each score one-by-one, we can put all of our parameters into a weight matrix $A$ with a bias vector $b$. Thus,
$$ \begin{align} z &= Ax+b\\ &= \begin{bmatrix} \rule[.6ex]{4ex}{0.75pt} & w_1^\top & \rule[.6ex]{4ex}{0.75pt}\\ \rule[.6ex]{4ex}{0.75pt} & w_2^\top & \rule[.6ex]{4ex}{0.75pt}\\ & \vdots & \\ \rule[.6ex]{4ex}{0.75pt} & w_M^\top & \rule[.6ex]{4ex}{0.75pt}\\ \end{bmatrix}\begin{bmatrix} \rule[-1ex]{0.5pt}{4ex}\\ x\\ \rule[1ex]{0.5pt}{4ex}\\ \end{bmatrix} +\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_M \end{bmatrix}\\ &= \begin{bmatrix} z_1\\ z_2\\ \vdots\\ z_M \end{bmatrix} \end{align} $$In PyTorch, we can efficiently implement the multi-class logistic regression model using the nn.Linear class which implements parameter matrices including bias terms. For the below implementation, we will also not apply the softmax function ourselves since the nn.CrossEntropyLoss class expects logits or scores instead of the final probabilities.
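As a quick numerical sanity check (not part of the lecture's model), an nn.Linear layer stores exactly this stacked weight matrix $A$ and bias vector $b$, so its output matches the matrix-vector product computed by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(2, 4)  # N = 2 input dimensions, M = 4 classes (made-up sizes)
x = torch.randn(2)     # one input point

# the same scores computed from the stacked weight matrix A and bias vector b
z_manual = lin.weight @ x + lin.bias
print(torch.allclose(lin(x), z_manual))  # True
```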
class MulticlassLogisticRegression(nn.Module):
    def __init__(self, N, M):
        super().__init__()
        self.N = N # input dimension
        self.M = M # number of classes
        self.weight_matrix = nn.Linear(N, M, bias=True) # N input dimensions, M output dimensions

    def forward(self, x):
        return self.weight_matrix(x)
And that's it! Again, we could compute the softmax of these logits/scores but the PyTorch implementation of cross-entropy loss asks for logits instead of probabilities. Finally, recall for model $f_\theta(x)=z$ that cross-entropy loss is given by $$ \ell_{ce}(f_\theta(x), y) = -\log\left(\textrm{softmax}(f_\theta(x))_y\right)= -\log\left(\frac{e^{z_y}}{\sum_{j=1}^{M}e^{z_j}}\right) $$ for input $x$ with label $y$ (assume class number $y$ also identifies the appropriate index in $z$, for simplicity).
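We can verify this numerically: nn.CrossEntropyLoss applied to raw logits matches $-\log(\textrm{softmax}(z)_y)$ computed by hand (the scores below are made up):

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0, 0.0]])  # scores z for one example, M = 4 classes
label = torch.tensor([0])                        # true class y = 0

# PyTorch's cross-entropy, computed directly from logits
ce = nn.CrossEntropyLoss()(logits, label)

# the same quantity by hand: -log(softmax(z)_y)
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[0, 0])

print(torch.allclose(ce, manual))  # True
```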
Below, we provide a toy 4-class dataset with $N=100$ samples per class and $\sigma=0.6$.
import torch
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler
class FourClassDataset(Dataset):
    def __init__(self, N, sigma):
        self.N = N # number of data points per class
        self.sigma = sigma # standard deviation of each class cluster
        self.class_zero = self.sigma*torch.randn(N, 2) + torch.tensor([1, 1])
        self.class_one = self.sigma*torch.randn(N, 2) + torch.tensor([-1, 1])
        self.class_two = self.sigma*torch.randn(N, 2) + torch.tensor([-1, -1])
        self.class_three = self.sigma*torch.randn(N, 2) + torch.tensor([1, -1])
        self.data = torch.cat((self.class_zero, self.class_one, self.class_two, self.class_three), dim=0)
        self.labels = torch.cat((torch.zeros(self.N), torch.ones(self.N),
                                 2*torch.ones(self.N), 3*torch.ones(self.N))).long()

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = self.data[idx]
        y = self.labels[idx]
        return x, y # return input and output pair
# visualize dataset
N = 100
sigma = 0.6
dataset = FourClassDataset(N, sigma)
class_zero = dataset.class_zero
class_one = dataset.class_one
class_two = dataset.class_two
class_three = dataset.class_three
plt.figure(figsize=(8, 6))
plt.scatter(class_zero[:, 0].numpy(), class_zero[:, 1].numpy(), color='tomato', s=50, edgecolor='black', label='Class 0')
plt.scatter(class_one[:, 0].numpy(), class_one[:, 1].numpy(), color='cornflowerblue', s=50, edgecolor='black', label='Class 1')
plt.scatter(class_two[:, 0].numpy(), class_two[:, 1].numpy(), color='seagreen', s=50, edgecolor='black', label='Class 2')
plt.scatter(class_three[:, 0].numpy(), class_three[:, 1].numpy(), color='violet', s=50, edgecolor='black', label='Class 3')
plt.grid(True)
plt.legend()
plt.tight_layout()
a) Create training, validation, and testing dataloaders with a 60%:20%:20% training:validation:testing split and batch size 16.
b) Fill in the training loop for training the MulticlassLogisticRegression model using the nn.CrossEntropyLoss function. We have provided the helper function for tracking model accuracy and comments below to help. Note: we do not need to worry about the squeezing and unsqueezing of the last dimension when passing data to the model now that we are using the nn.Linear class for our parameters.
c) Experiment with training parameters, e.g. learning rate, number of epochs, batch size, momentum, weight decay, and plot the training loss, training accuracy, and validation accuracy.
d) Observe the performance of this multi-class logistic regression model with noisier clusters, i.e. bigger values of $\sigma$.
def multiclass_model_accuracy(model, input_data, labels):
    predictions = model(input_data) # no need to squeeze/unsqueeze dimensions now!
    predicted_classes = torch.argmax(predictions, dim=1) # find highest scoring class along the columns
    n_correct = torch.sum(torch.eq(predicted_classes, labels))
    return n_correct
# Part a) create DataLoaders
N_train = 240
N_val = 80
N_test = 80
batch_size = 16
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))
# Part b) training loop
# initialize MulticlassLogisticRegression model
N = 2
M = 4
model = MulticlassLogisticRegression(N, M)
# initialize loss function and optimizer
criterion = nn.CrossEntropyLoss()
lr = 1e-4
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.99)
# logging info
loss_values, train_accuracies, val_accuracies = [], [], []
n_epoch = 300 # set this value
for n in range(n_epoch):
    epoch_loss, epoch_acc = 0, 0
    for x_batch, y_batch in train_loader:
        # zero out gradients
        optimizer.zero_grad()
        # pass batch to model, no need to worry about using squeeze/unsqueeze now
        predictions = model(x_batch)
        # calculate loss
        loss = criterion(predictions, y_batch)
        # backpropagate and update
        loss.backward() # backprop
        optimizer.step()
        # logging to update epoch_loss (add loss value) and epoch_acc (add current batch accuracy)
        epoch_loss += loss.item()
        epoch_acc += multiclass_model_accuracy(model, x_batch, y_batch)
    loss_values.append(epoch_loss/len(train_loader))
    train_accuracies.append(epoch_acc/N_train)
    # validation performance
    val_acc = 0
    for x_batch, y_batch in val_loader:
        # don't compute gradients since we are only evaluating the model
        with torch.no_grad():
            # validation batch accuracy
            val_acc += multiclass_model_accuracy(model, x_batch, y_batch)
    val_accuracies.append(val_acc/N_val)
plt.figure(figsize=(12,6))
plt.subplot(131)
plt.semilogy(loss_values)
plt.grid(True)
plt.title('Loss values')
plt.xlabel('Epoch')
plt.subplot(132)
plt.plot(train_accuracies)
plt.grid(True)
plt.title('Training Accuracy')
plt.xlabel('Epoch')
plt.subplot(133)
plt.plot(val_accuracies)
plt.grid(True)
plt.title('Validation Accuracy')
plt.xlabel('Epoch')
# visualize dataset + model predictions in background
# put model in eval mode
model.eval()
# create a dense grid over the input space
x_min, x_max = dataset.data[:,0].min()-1, dataset.data[:,0].max()+1
y_min, y_max = dataset.data[:,1].min()-1, dataset.data[:,1].max()+1
xx, yy = torch.meshgrid(
    torch.linspace(x_min, x_max, 300),
    torch.linspace(y_min, y_max, 300),
    indexing='ij'
)
# flatten grid to pass through model
grid_points = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=1)
# get predictions over entire grid
with torch.no_grad():
    logits = model(grid_points)
    preds = torch.argmax(logits, dim=1)
# reshape predictions back to grid shape
Z = preds.reshape(xx.shape)
# convert to numpy for plotting
Z = Z.numpy()
xx = xx.numpy()
yy = yy.numpy()
plt.figure(figsize=(8,6))
# plot decision regions
plt.contourf(xx, yy, Z, alpha=0.3, levels=4, cmap='Pastel1')
# overlay original data
plt.scatter(dataset.class_zero[:, 0].numpy(), dataset.class_zero[:, 1].numpy(),
            color='tomato', s=50, edgecolor='black', label='Class 0')
plt.scatter(dataset.class_one[:, 0].numpy(), dataset.class_one[:, 1].numpy(),
            color='cornflowerblue', s=50, edgecolor='black', label='Class 1')
plt.scatter(dataset.class_two[:, 0].numpy(), dataset.class_two[:, 1].numpy(),
            color='seagreen', s=50, edgecolor='black', label='Class 2')
plt.scatter(dataset.class_three[:, 0].numpy(), dataset.class_three[:, 1].numpy(),
            color='violet', s=50, edgecolor='black', label='Class 3')
plt.grid(True)
plt.legend()
plt.title("Learned Decision Regions")
plt.tight_layout()
plt.show()
Remember in our previous code, we had to constantly reset the gradient?

w_gd.grad = 0
optimizer.zero_grad()

Why doesn't the gradient get zeroed every time we calculate it? It is because we might want to accumulate the gradient over several batches. Datasets can be huge, and most of the time we cannot store the entire dataset in memory. Accumulating the gradient so that we update the parameters only every couple of batches can help stabilize training.

This is something you just have to play around with. There are lots of strategies and lots of published results either way; there are no firm rules on how you should train a model, and everything depends on the dataset.
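A minimal sketch of gradient accumulation (the model and data here are hypothetical): gradients from successive backward() calls add up in each parameter's .grad, so we can call step() only every few batches:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # toy stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.MSELoss()

accum_steps = 4   # apply a parameter update once every 4 batches
n_updates = 0
optimizer.zero_grad()
for step in range(8):
    x_batch = torch.randn(16, 2)
    y_batch = torch.randn(16, 1)
    # scale the loss so the accumulated gradient averages over the batches
    loss = criterion(model(x_batch), y_batch) / accum_steps
    loss.backward()  # adds this batch's gradient into .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one update from 4 batches' worth of gradient
        optimizer.zero_grad()  # reset only after the update
        n_updates += 1
print(n_updates)  # 2
```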
Below, we provide toy code for the third-degree polynomial regression model we have seen in previous lectures.
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.utils.data import SubsetRandomSampler
class ThirdOrderPolynomial(nn.Module):
    def __init__(self):
        '''
        Specify the learnable parameters: a, b, c, d
        '''
        super().__init__() # call nn.Module constructor first
        self.a = nn.Parameter(torch.rand(1))
        self.b = nn.Parameter(torch.rand(1))
        self.c = nn.Parameter(torch.rand(1))
        self.d = nn.Parameter(torch.rand(1))

    def forward(self, x):
        '''
        Implement f(x).
        '''
        f_x = self.a*x**3 + self.b*x**2 + self.c*x + self.d
        return f_x
class ThirdOrderPolynomialDataset(Dataset):
    # don't forget the self identifier!
    def __init__(self, N, sigma):
        self.N = N # number of data points
        self.sigma = sigma # relative noise level on the targets
        self.x = torch.linspace(-10.0, 10.0, self.N)
        self.t = self.f(self.x)*(1+self.sigma*torch.randn(N))

    def f(self, x):
        return 0.25*x**3 - 0.5*x**2 - 10*x + 20

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        x = self.x[idx]
        y = self.t[idx]
        return x, y # return input and output pair

def model_regression_loss(model, data_loader, criterion):
    reg_loss = 0
    for x_batch, y_batch in data_loader:
        predictions = model(x_batch.unsqueeze(-1)).squeeze(-1)
        reg_loss += criterion(predictions, y_batch)
    return reg_loss
N = 1000
sigma = 0.2
dataset = ThirdOrderPolynomialDataset(N, sigma)
print(dataset.x.shape)
print(dataset.t.shape)
# create indices for each split of dataset
N_train = int(N*0.6)
N_val = int(N*0.2)
N_test = int(N*0.2)
indices = np.arange(len(dataset))
np.random.shuffle(indices)
train_indices = indices[:N_train]
val_indices = indices[N_train:N_train+N_val]
test_indices = indices[N_train+N_val:]
# create dataloader for each split
batch_size = 100
train_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(train_indices))
val_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(val_indices))
test_loader = DataLoader(dataset, batch_size=batch_size, sampler=SubsetRandomSampler(test_indices))
# training setup
criterion = nn.MSELoss(reduction='mean')
lr = 1e-6 # learning rate
model = ThirdOrderPolynomial() # initialize model
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.99) # initialize optimizer
a_vals = []
b_vals = []
c_vals = []
d_vals = []
n_epoch = 1000 # number of passes through the training dataset
loss_values, train_loss, val_loss = [], [], []
for n in range(n_epoch):
    epoch_loss = 0
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        predictions = model(x_batch.unsqueeze(-1)).squeeze(-1) # make dimensions match for loss function
        loss = criterion(predictions, y_batch)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    loss_values.append(epoch_loss/len(train_loader))
    with torch.no_grad():
        train_loss.append(model_regression_loss(model, train_loader, criterion))
        val_loss.append(model_regression_loss(model, val_loader, criterion))
    a_vals.append(model.a.data.item())
    b_vals.append(model.b.data.item())
    c_vals.append(model.c.data.item())
    d_vals.append(model.d.data.item())
print('Final regression model:')
for name, param in model.named_parameters():
    print(f"parameter: {name}, Value: {param.data.item()}")
xx = np.arange(-10,10,0.1)
curr_fn = np.zeros_like(xx)
plt.figure(figsize=(12,6))
plt.subplot(131)
x, t = dataset[train_indices]
plt.scatter(x.detach().numpy(), t, color='orange')
w_vals = [d_vals[-1], c_vals[-1], b_vals[-1], a_vals[-1]]
for k in range(xx.shape[0]):
    curr_fn[k] = sum(w_vals[a] * xx[k]**a for a in np.arange(3+1))
plt.plot(xx, curr_fn, color='blue')
plt.grid(True)
plt.title('f(x)')
plt.xlabel('x')
plt.subplot(132)
plt.plot(train_loss)
plt.grid(True)
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.subplot(133)
plt.plot(val_loss)
plt.grid(True)
plt.title('Validation loss')
plt.xlabel('Epoch')
hist_size = len(a_vals)
iter_num = np.array([0, int(0.01*hist_size), int(0.05*hist_size), int(0.10*hist_size), hist_size-1]).astype(int)
plt.figure(figsize=(20, 5))
for j, i in enumerate(iter_num):
    plt.subplot(1, 5, j+1)
    curr_fn = np.zeros_like(xx)
    w_vals = [d_vals[i], c_vals[i], b_vals[i], a_vals[i]]
    for k in range(xx.shape[0]):
        curr_fn[k] = sum(w_vals[a] * xx[k]**a for a in np.arange(3+1))
    plt.plot(xx, curr_fn, color='blue')
    x, t = dataset[train_indices]
    plt.scatter(x.detach().numpy(), t, color='orange')
    plt.grid(True)
    plt.title('Regressed function: Iteration {}'.format(i))
torch.Size([1000])
torch.Size([1000])
Final regression model:
parameter: a, Value: 0.25297772884368896
parameter: b, Value: -0.3366248309612274
parameter: c, Value: -10.17745304107666
parameter: d, Value: 8.852567672729492