diff --git a/.gitignore b/.gitignore index 619d98e..156b785 100644 --- a/.gitignore +++ b/.gitignore @@ -21,4 +21,5 @@ yarn-error.log* /tutorials/*.md /tutorials/data +/tutorials/*.pth diff --git a/docs/01-Introduction.md b/docs/01-Introduction.md new file mode 100644 index 0000000..6e643bb --- /dev/null +++ b/docs/01-Introduction.md @@ -0,0 +1,43 @@ +**Learn the Basics** || +[Quickstart](Quickstart.html) || +[Tensors](Tensors.html) || +[Datasets & DataLoaders](Data.html) || +[Transforms](transforms_tutorial.html) || +[Build Model](buildmodel_tutorial.html) || +[Autograd](autogradqs_tutorial.html) || +[Optimization](optimization_tutorial.html) || +[Save & Load Model](saveloadrun_tutorial.html) + +# Learn the Basics + +Authors: +[Suraj Subramanian](https://github.com/suraj813), +[Seth Juarez](https://github.com/sethjuarez/), +[Cassie Breviu](https://github.com/cassieview/), +[Dmitry Soshnikov](https://soshnikov.com/), +[Ari Bornstein](https://github.com/aribornstein/) + +Most machine learning workflows involve working with data, creating models, optimizing model +parameters, and saving the trained models. This tutorial introduces you to a complete ML workflow +implemented in PyTorch, with links to learn more about each of these concepts. + +We'll use the FashionMNIST dataset to train a neural network that predicts if an input image belongs +to one of the following classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, +Bag, or Ankle boot. + +`This tutorial assumes a basic familiarity with Python and Deep Learning concepts.` + + +## Running the Tutorial Code +You can run this tutorial in a couple of ways: + +- **In the cloud**: This is the easiest way to get started! Each section has a "Run in Microsoft Learn" link at the top, which opens an integrated notebook in Microsoft Learn with the code in a fully-hosted environment. 
+- **Locally**: This option requires you to set up PyTorch and TorchVision on your local machine first ([installation instructions](https://pytorch.org/get-started/locally/)). Download the notebook or copy the code into your favorite IDE. + + +## How to Use this Guide +If you're familiar with other deep learning frameworks, check out the [0. Quickstart](quickstart_tutorial.html) first +to quickly familiarize yourself with PyTorch's API. + +If you're new to deep learning frameworks, head right into the first section of our step-by-step guide: [1. Tensors](tensor_tutorial.html). + + diff --git a/docs/02-Quickstart.md b/docs/02-Quickstart.md new file mode 100644 index 0000000..c529ed9 --- /dev/null +++ b/docs/02-Quickstart.md @@ -0,0 +1,385 @@ +[Learn the Basics](Introduction.html) || +**Quickstart** || +[Tensors](Tensors.html) || +[Datasets & DataLoaders](Data.html) || +[Transforms](transforms_tutorial.html) || +[Build Model](buildmodel_tutorial.html) || +[Autograd](autogradqs_tutorial.html) || +[Optimization](optimization_tutorial.html) || +[Save & Load Model](saveloadrun_tutorial.html) + +# Quickstart +This section runs through the API for common tasks in machine learning. Refer to the links in each section to dive deeper. + +## Working with data +PyTorch has two [primitives to work with data](https://pytorch.org/docs/stable/data.html): +``torch.utils.data.DataLoader`` and ``torch.utils.data.Dataset``. +``Dataset`` stores the samples and their corresponding labels, and ``DataLoader`` wraps an iterable around +the ``Dataset``. + + + +```python +import torch +from torch import nn +from torch.utils.data import DataLoader +from torchvision import datasets +from torchvision.transforms import ToTensor +``` + +PyTorch offers domain-specific libraries such as [TorchText](https://pytorch.org/text/stable/index.html), +[TorchVision](https://pytorch.org/vision/stable/index.html), and [TorchAudio](https://pytorch.org/audio/stable/index.html), +all of which include datasets. 
For this tutorial, we will be using a TorchVision dataset. + +The ``torchvision.datasets`` module contains ``Dataset`` objects for many real-world vision data like +CIFAR, COCO ([full list here](https://pytorch.org/vision/stable/datasets.html)). In this tutorial, we +use the FashionMNIST dataset. Every TorchVision ``Dataset`` includes two arguments: ``transform`` and +``target_transform`` to modify the samples and labels respectively. + + + + +```python +# Download training data from open datasets. +training_data = datasets.FashionMNIST( + root="data", + train=True, + download=True, + transform=ToTensor(), +) + +# Download test data from open datasets. +test_data = datasets.FashionMNIST( + root="data", + train=False, + download=True, + transform=ToTensor(), +) +``` + +We pass the ``Dataset`` as an argument to ``DataLoader``. This wraps an iterable over our dataset, and supports +automatic batching, sampling, shuffling and multiprocess data loading. Here we define a batch size of 64, i.e. each element +in the dataloader iterable will return a batch of 64 features and labels. + + + + +```python +batch_size = 64 + +# Create data loaders. +train_dataloader = DataLoader(training_data, batch_size=batch_size) +test_dataloader = DataLoader(test_data, batch_size=batch_size) + +for X, y in test_dataloader: + print(f"Shape of X [N, C, H, W]: {X.shape}") + print(f"Shape of y: {y.shape} {y.dtype}") + break +``` + + Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28]) + Shape of y: torch.Size([64]) torch.int64 + + +Read more about [loading data in PyTorch](data_tutorial.html). + + + + +-------------- + + + + +## Creating Models +To define a neural network in PyTorch, we create a class that inherits +from [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). We define the layers of the network +in the ``__init__`` function and specify how data will pass through the network in the ``forward`` function. 
To accelerate +operations in the neural network, we move it to the GPU if available. + + + + +```python +# Get cpu or gpu device for training. +device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu" +print(f"Using {device} device") + +# Define model +class NeuralNetwork(nn.Module): + def __init__(self): + super().__init__() + self.flatten = nn.Flatten() + self.linear_relu_stack = nn.Sequential( + nn.Linear(28*28, 512), + nn.ReLU(), + nn.Linear(512, 512), + nn.ReLU(), + nn.Linear(512, 10) + ) + + def forward(self, x): + x = self.flatten(x) + logits = self.linear_relu_stack(x) + return logits + +model = NeuralNetwork().to(device) +print(model) +``` + + Using mps device + NeuralNetwork( + (flatten): Flatten(start_dim=1, end_dim=-1) + (linear_relu_stack): Sequential( + (0): Linear(in_features=784, out_features=512, bias=True) + (1): ReLU() + (2): Linear(in_features=512, out_features=512, bias=True) + (3): ReLU() + (4): Linear(in_features=512, out_features=10, bias=True) + ) + ) + + +Read more about [building neural networks in PyTorch](buildmodel_tutorial.html). + + + + +-------------- + + + + +## Optimizing the Model Parameters +To train a model, we need a [loss function](https://pytorch.org/docs/stable/nn.html#loss-functions) +and an [optimizer](https://pytorch.org/docs/stable/optim.html). + + + + +```python +loss_fn = nn.CrossEntropyLoss() +optimizer = torch.optim.SGD(model.parameters(), lr=1e-3) +``` + +In a single training loop, the model makes predictions on the training dataset (fed to it in batches), and +backpropagates the prediction error to adjust the model's parameters. 
+ + + + +```python +def train(dataloader, model, loss_fn, optimizer): + size = len(dataloader.dataset) + model.train() + for batch, (X, y) in enumerate(dataloader): + X, y = X.to(device), y.to(device) + + # Compute prediction error + pred = model(X) + loss = loss_fn(pred, y) + + # Backpropagation + optimizer.zero_grad() + loss.backward() + optimizer.step() + + if batch % 100 == 0: + loss, current = loss.item(), batch * len(X) + print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]") +``` + +We also check the model's performance against the test dataset to ensure it is learning. + + + + +```python +def test(dataloader, model, loss_fn): + size = len(dataloader.dataset) + num_batches = len(dataloader) + model.eval() + test_loss, correct = 0, 0 + with torch.no_grad(): + for X, y in dataloader: + X, y = X.to(device), y.to(device) + pred = model(X) + test_loss += loss_fn(pred, y).item() + correct += (pred.argmax(1) == y).type(torch.float).sum().item() + test_loss /= num_batches + correct /= size + print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n") +``` + +The training process is conducted over several iterations (*epochs*). During each epoch, the model learns +parameters to make better predictions. We print the model's accuracy and loss at each epoch; we'd like to see the +accuracy increase and the loss decrease with every epoch. 
+ + + + +```python +epochs = 5 +for t in range(epochs): + print(f"Epoch {t+1}\n-------------------------------") + train(train_dataloader, model, loss_fn, optimizer) + test(test_dataloader, model, loss_fn) +print("Done!") +``` + + Epoch 1 + ------------------------------- + loss: 2.300704 [ 0/60000] + loss: 2.294491 [ 6400/60000] + loss: 2.270792 [12800/60000] + loss: 2.270757 [19200/60000] + loss: 2.246651 [25600/60000] + loss: 2.223734 [32000/60000] + loss: 2.230299 [38400/60000] + loss: 2.197789 [44800/60000] + loss: 2.186385 [51200/60000] + loss: 2.171854 [57600/60000] + Test Error: + Accuracy: 40.4%, Avg loss: 2.158354 + + Epoch 2 + ------------------------------- + loss: 2.157282 [ 0/60000] + loss: 2.157837 [ 6400/60000] + loss: 2.098653 [12800/60000] + loss: 2.123712 [19200/60000] + loss: 2.070209 [25600/60000] + loss: 2.017735 [32000/60000] + loss: 2.044564 [38400/60000] + loss: 1.971302 [44800/60000] + loss: 1.963748 [51200/60000] + loss: 1.920766 [57600/60000] + Test Error: + Accuracy: 55.5%, Avg loss: 1.902382 + + Epoch 3 + ------------------------------- + loss: 1.919148 [ 0/60000] + loss: 1.903148 [ 6400/60000] + loss: 1.782882 [12800/60000] + loss: 1.834309 [19200/60000] + loss: 1.722989 [25600/60000] + loss: 1.676954 [32000/60000] + loss: 1.698752 [38400/60000] + loss: 1.602475 [44800/60000] + loss: 1.614792 [51200/60000] + loss: 1.532669 [57600/60000] + Test Error: + Accuracy: 61.7%, Avg loss: 1.533873 + + Epoch 4 + ------------------------------- + loss: 1.585873 [ 0/60000] + loss: 1.560321 [ 6400/60000] + loss: 1.407954 [12800/60000] + loss: 1.488211 [19200/60000] + loss: 1.364034 [25600/60000] + loss: 1.362447 [32000/60000] + loss: 1.370802 [38400/60000] + loss: 1.302972 [44800/60000] + loss: 1.327800 [51200/60000] + loss: 1.235748 [57600/60000] + Test Error: + Accuracy: 63.4%, Avg loss: 1.260575 + + Epoch 5 + ------------------------------- + loss: 1.331637 [ 0/60000] + loss: 1.313866 [ 6400/60000] + loss: 1.153163 [12800/60000] + loss: 1.257744 
[19200/60000] + loss: 1.137783 [25600/60000] + loss: 1.162715 [32000/60000] + loss: 1.172138 [38400/60000] + loss: 1.120971 [44800/60000] + loss: 1.149632 [51200/60000] + loss: 1.069323 [57600/60000] + Test Error: + Accuracy: 64.6%, Avg loss: 1.093657 + + Done! + + +Read more about [Training your model](optimization_tutorial.html). + + + + +-------------- + + + + +## Saving Models +A common way to save a model is to serialize the internal state dictionary (containing the model parameters). + + + + +```python +torch.save(model.state_dict(), "model.pth") +print("Saved PyTorch Model State to model.pth") +``` + + Saved PyTorch Model State to model.pth + + +## Loading Models + +The process for loading a model includes re-creating the model structure and loading +the state dictionary into it. + + + + +```python +model = NeuralNetwork() +model.load_state_dict(torch.load("model.pth")) +``` + + + + + + + + +This model can now be used to make predictions. + + + + +```python +classes = [ + "T-shirt/top", + "Trouser", + "Pullover", + "Dress", + "Coat", + "Sandal", + "Shirt", + "Sneaker", + "Bag", + "Ankle boot", +] + +model.eval() +x, y = test_data[0][0], test_data[0][1] +with torch.no_grad(): + pred = model(x) + predicted, actual = classes[pred[0].argmax(0)], classes[y] + print(f'Predicted: "{predicted}", Actual: "{actual}"') +``` + + Predicted: "Ankle boot", Actual: "Ankle boot" + + +Read more about [Saving & Loading your model](saveloadrun_tutorial.html). 
+ + + diff --git a/docs/03-Tensors.md b/docs/03-Tensors.md new file mode 100644 index 0000000..d87b048 --- /dev/null +++ b/docs/03-Tensors.md @@ -0,0 +1,355 @@ +[Learn the Basics](intro.html) || +[Quickstart](quickstart_tutorial.html) || +**Tensors** || +[Datasets & DataLoaders](data_tutorial.html) || +[Transforms](transforms_tutorial.html) || +[Build Model](buildmodel_tutorial.html) || +[Autograd](autogradqs_tutorial.html) || +[Optimization](optimization_tutorial.html) || +[Save & Load Model](saveloadrun_tutorial.html) + +# Tensors + +Tensors are a specialized data structure that are very similar to arrays and matrices. +In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters. + +Tensors are similar to [NumPy’s](https://numpy.org/) ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and +NumPy arrays can often share the same underlying memory, eliminating the need to copy data (see `bridge-to-np-label`). Tensors +are also optimized for automatic differentiation (we'll see more about that later in the [Autograd](autogradqs_tutorial.html)_ +section). If you’re familiar with ndarrays, you’ll be right at home with the Tensor API. If not, follow along! + + + +```python +import torch +import numpy as np +``` + +## Initializing a Tensor + +Tensors can be initialized in various ways. Take a look at the following examples: + +**Directly from data** + +Tensors can be created directly from data. The data type is automatically inferred. + + + + +```python +data = [[1, 2],[3, 4]] +x_data = torch.tensor(data) +``` + +**From a NumPy array** + +Tensors can be created from NumPy arrays (and vice versa - see `bridge-to-np-label`). + + + + +```python +np_array = np.array(data) +x_np = torch.from_numpy(np_array) +``` + +**From another tensor:** + +The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden. 
+ + + + +```python +x_ones = torch.ones_like(x_data) # retains the properties of x_data +print(f"Ones Tensor: \n {x_ones} \n") + +x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data +print(f"Random Tensor: \n {x_rand} \n") +``` + + Ones Tensor: + tensor([[1, 1], + [1, 1]]) + + Random Tensor: + tensor([[0.0504, 0.9505], + [0.6485, 0.6105]]) + + + +**With random or constant values:** + +``shape`` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor. + + + + +```python +shape = (2,3,) +rand_tensor = torch.rand(shape) +ones_tensor = torch.ones(shape) +zeros_tensor = torch.zeros(shape) + +print(f"Random Tensor: \n {rand_tensor} \n") +print(f"Ones Tensor: \n {ones_tensor} \n") +print(f"Zeros Tensor: \n {zeros_tensor}") +``` + + Random Tensor: + tensor([[0.6582, 0.2838, 0.1244], + [0.1692, 0.0394, 0.2638]]) + + Ones Tensor: + tensor([[1., 1., 1.], + [1., 1., 1.]]) + + Zeros Tensor: + tensor([[0., 0., 0.], + [0., 0., 0.]]) + + +-------------- + + + + +## Attributes of a Tensor + +Tensor attributes describe their shape, datatype, and the device on which they are stored. + + + + +```python +tensor = torch.rand(3,4) + +print(f"Shape of tensor: {tensor.shape}") +print(f"Datatype of tensor: {tensor.dtype}") +print(f"Device tensor is stored on: {tensor.device}") +``` + + Shape of tensor: torch.Size([3, 4]) + Datatype of tensor: torch.float32 + Device tensor is stored on: cpu + + +-------------- + + + + +## Operations on Tensors + +Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, +indexing, slicing), sampling and more are +comprehensively described [here](https://pytorch.org/docs/stable/torch.html)_. + +Each of these operations can be run on the GPU (at typically higher speeds than on a +CPU). If you’re using Colab, allocate a GPU by going to Runtime > Change runtime type > GPU. + +By default, tensors are created on the CPU. 
We need to explicitly move tensors to the GPU using +``.to`` method (after checking for GPU availability). Keep in mind that copying large tensors +across devices can be expensive in terms of time and memory! + + + + +```python +# We move our tensor to the GPU if available +if torch.cuda.is_available(): + tensor = tensor.to("cuda") +``` + +Try out some of the operations from the list. +If you're familiar with the NumPy API, you'll find the Tensor API a breeze to use. + + + + +**Standard numpy-like indexing and slicing:** + + + + +```python +tensor = torch.ones(4, 4) +print(f"First row: {tensor[0]}") +print(f"First column: {tensor[:, 0]}") +print(f"Last column: {tensor[..., -1]}") +tensor[:,1] = 0 +print(tensor) +``` + + First row: tensor([1., 1., 1., 1.]) + First column: tensor([1., 1., 1., 1.]) + Last column: tensor([1., 1., 1., 1.]) + tensor([[1., 0., 1., 1.], + [1., 0., 1., 1.], + [1., 0., 1., 1.], + [1., 0., 1., 1.]]) + + +**Joining tensors** You can use ``torch.cat`` to concatenate a sequence of tensors along a given dimension. +See also [torch.stack](https://pytorch.org/docs/stable/generated/torch.stack.html)_, +another tensor joining op that is subtly different from ``torch.cat``. + + + + +```python +t1 = torch.cat([tensor, tensor, tensor], dim=1) +print(t1) +``` + + tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.], + [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.], + [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.], + [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]]) + + +**Arithmetic operations** + + + + +```python +# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value +# ``tensor.T`` returns the transpose of a tensor +y1 = tensor @ tensor.T +y2 = tensor.matmul(tensor.T) + +y3 = torch.rand_like(y1) +torch.matmul(tensor, tensor.T, out=y3) + + +# This computes the element-wise product. 
z1, z2, z3 will have the same value +z1 = tensor * tensor +z2 = tensor.mul(tensor) + +z3 = torch.rand_like(tensor) +torch.mul(tensor, tensor, out=z3) +``` + + + + + tensor([[1., 0., 1., 1.], + [1., 0., 1., 1.], + [1., 0., 1., 1.], + [1., 0., 1., 1.]]) + + + +**Single-element tensors** If you have a one-element tensor, for example by aggregating all +values of a tensor into one value, you can convert it to a Python +numerical value using ``item()``: + + + + +```python +agg = tensor.sum() +agg_item = agg.item() +print(agg_item, type(agg_item)) +``` + + 12.0 + + +**In-place operations** +Operations that store the result into the operand are called in-place. They are denoted by a ``_`` suffix. +For example: ``x.copy_(y)``, ``x.t_()``, will change ``x``. + + + + +```python +print(f"{tensor} \n") +tensor.add_(5) +print(tensor) +``` + + tensor([[1., 0., 1., 1.], + [1., 0., 1., 1.], + [1., 0., 1., 1.], + [1., 0., 1., 1.]]) + + tensor([[6., 5., 6., 6.], + [6., 5., 6., 6.], + [6., 5., 6., 6.], + [6., 5., 6., 6.]]) + + +

Note

In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss + of history. Hence, their use is discouraged.

+ + + +-------------- + + + + + +## Bridge with NumPy +Tensors on the CPU and NumPy arrays can share their underlying memory +locations, and changing one will change the other. + + + +### Tensor to NumPy array + + + + +```python +t = torch.ones(5) +print(f"t: {t}") +n = t.numpy() +print(f"n: {n}") +``` + + t: tensor([1., 1., 1., 1., 1.]) + n: [1. 1. 1. 1. 1.] + + +A change in the tensor is reflected in the NumPy array. + + + + +```python +t.add_(1) +print(f"t: {t}") +print(f"n: {n}") +``` + + t: tensor([2., 2., 2., 2., 2.]) + n: [2. 2. 2. 2. 2.] + + +### NumPy array to Tensor + + + + +```python +n = np.ones(5) +t = torch.from_numpy(n) +``` + +Changes in the NumPy array are reflected in the tensor. + + + + +```python +np.add(n, 1, out=n) +print(f"t: {t}") +print(f"n: {n}") +``` + + t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64) + n: [2. 2. 2. 2. 2.] + diff --git a/docs/04-Data.md b/docs/04-Data.md new file mode 100644 index 0000000..8cd12e9 --- /dev/null +++ b/docs/04-Data.md @@ -0,0 +1,286 @@ +```python +%matplotlib inline +``` + + +[Learn the Basics](intro.html) || +[Quickstart](quickstart_tutorial.html) || +[Tensors](tensorqs_tutorial.html) || +**Datasets & DataLoaders** || +[Transforms](transforms_tutorial.html) || +[Build Model](buildmodel_tutorial.html) || +[Autograd](autogradqs_tutorial.html) || +[Optimization](optimization_tutorial.html) || +[Save & Load Model](saveloadrun_tutorial.html) + +# Datasets & DataLoaders + + +Code for processing data samples can get messy and hard to maintain; we ideally want our dataset code +to be decoupled from our model training code for better readability and modularity. +PyTorch provides two data primitives: ``torch.utils.data.DataLoader`` and ``torch.utils.data.Dataset`` +that allow you to use pre-loaded datasets as well as your own data. +``Dataset`` stores the samples and their corresponding labels, and ``DataLoader`` wraps an iterable around +the ``Dataset`` to enable easy access to the samples. 
+ +PyTorch domain libraries provide a number of pre-loaded datasets (such as FashionMNIST) that +subclass ``torch.utils.data.Dataset`` and implement functions specific to the particular data. +They can be used to prototype and benchmark your model. You can find them +here: [Image Datasets](https://pytorch.org/vision/stable/datasets.html), +[Text Datasets](https://pytorch.org/text/stable/datasets.html), and +[Audio Datasets](https://pytorch.org/audio/stable/datasets.html) + + + + +## Loading a Dataset + +Here is an example of how to load the [Fashion-MNIST](https://research.zalando.com/project/fashion_mnist/fashion_mnist/) dataset from TorchVision. +Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. +Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes. + +We load the [FashionMNIST Dataset](https://pytorch.org/vision/stable/datasets.html#fashion-mnist) with the following parameters: + - ``root`` is the path where the train/test data is stored, + - ``train`` specifies training or test dataset, + - ``download=True`` downloads the data from the internet if it's not available at ``root``. + - ``transform`` and ``target_transform`` specify the feature and label transformations + + + + +```python +import torch +from torch.utils.data import Dataset +from torchvision import datasets +from torchvision.transforms import ToTensor +import matplotlib.pyplot as plt + + +training_data = datasets.FashionMNIST( + root="data", + train=True, + download=True, + transform=ToTensor() +) + +test_data = datasets.FashionMNIST( + root="data", + train=False, + download=True, + transform=ToTensor() +) +``` + +## Iterating and Visualizing the Dataset + +We can index ``Datasets`` manually like a list: ``training_data[index]``. +We use ``matplotlib`` to visualize some samples in our training data. 
+ + + + +```python +labels_map = { + 0: "T-Shirt", + 1: "Trouser", + 2: "Pullover", + 3: "Dress", + 4: "Coat", + 5: "Sandal", + 6: "Shirt", + 7: "Sneaker", + 8: "Bag", + 9: "Ankle Boot", +} +figure = plt.figure(figsize=(8, 8)) +cols, rows = 3, 3 +for i in range(1, cols * rows + 1): + sample_idx = torch.randint(len(training_data), size=(1,)).item() + img, label = training_data[sample_idx] + figure.add_subplot(rows, cols, i) + plt.title(labels_map[label]) + plt.axis("off") + plt.imshow(img.squeeze(), cmap="gray") +plt.show() +``` + + + +![png](../docs/04-Data_files/../docs/04-Data_6_0.png) + + + +.. + .. figure:: /_static/img/basics/fashion_mnist.png + :alt: fashion_mnist + + + +-------------- + + + + +## Creating a Custom Dataset for your files + +A custom Dataset class must implement three functions: `__init__`, `__len__`, and `__getitem__`. +Take a look at this implementation; the FashionMNIST images are stored +in a directory ``img_dir``, and their labels are stored separately in a CSV file ``annotations_file``. + +In the next sections, we'll break down what's happening in each of these functions. + + + + +```python +import os +import pandas as pd +from torchvision.io import read_image + +class CustomImageDataset(Dataset): + def __init__(self, annotations_file, img_dir, transform=None, target_transform=None): + self.img_labels = pd.read_csv(annotations_file) + self.img_dir = img_dir + self.transform = transform + self.target_transform = target_transform + + def __len__(self): + return len(self.img_labels) + + def __getitem__(self, idx): + img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0]) + image = read_image(img_path) + label = self.img_labels.iloc[idx, 1] + if self.transform: + image = self.transform(image) + if self.target_transform: + label = self.target_transform(label) + return image, label +``` + +### __init__ + +The __init__ function is run once when instantiating the Dataset object. 
We initialize +the directory containing the images, the annotations file, and both transforms (covered +in more detail in the next section). + +The labels.csv file looks like: :: + + tshirt1.jpg, 0 + tshirt2.jpg, 0 + ...... + ankleboot999.jpg, 9 + + + + +```python +def __init__(self, annotations_file, img_dir, transform=None, target_transform=None): + self.img_labels = pd.read_csv(annotations_file) + self.img_dir = img_dir + self.transform = transform + self.target_transform = target_transform +``` + +### __len__ + +The __len__ function returns the number of samples in our dataset. + +Example: + + + + +```python +def __len__(self): + return len(self.img_labels) +``` + +### __getitem__ + +The __getitem__ function loads and returns a sample from the dataset at the given index ``idx``. +Based on the index, it identifies the image's location on disk, converts that to a tensor using ``read_image``, retrieves the +corresponding label from the csv data in ``self.img_labels``, calls the transform functions on them (if applicable), and returns the +tensor image and corresponding label in a tuple. + + + + +```python +def __getitem__(self, idx): + img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0]) + image = read_image(img_path) + label = self.img_labels.iloc[idx, 1] + if self.transform: + image = self.transform(image) + if self.target_transform: + label = self.target_transform(label) + return image, label +``` + +-------------- + + + + +## Preparing your data for training with DataLoaders +The ``Dataset`` retrieves our dataset's features and labels one sample at a time. While training a model, we typically want to +pass samples in "minibatches", reshuffle the data at every epoch to reduce model overfitting, and use Python's ``multiprocessing`` to +speed up data retrieval. + +``DataLoader`` is an iterable that abstracts this complexity for us in an easy API. 
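To make concrete what ``DataLoader`` takes off our hands, here is a minimal plain-Python sketch of minibatching with optional reshuffling. It only illustrates the idea and is not how ``DataLoader`` is implemented; the helper ``iter_minibatches`` and the toy ``samples`` list are hypothetical names introduced for this example.

```python
import random


def iter_minibatches(samples, batch_size, shuffle=True, seed=None):
    """Yield lists of up to `batch_size` samples (hypothetical helper)."""
    indices = list(range(len(samples)))
    if shuffle:
        # A fresh permutation of the indices; call once per epoch to reshuffle.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


# Toy (feature, label) pairs standing in for a real dataset.
samples = [(f"img{i}", i % 10) for i in range(10)]

for epoch in range(2):
    batches = list(iter_minibatches(samples, batch_size=4, seed=epoch))
    print(f"epoch {epoch}: batch sizes {[len(b) for b in batches]}")
```

The last batch is smaller when the dataset size is not a multiple of the batch size. ``DataLoader`` exposes the same ideas through its ``batch_size`` and ``shuffle`` arguments and adds multiprocess loading on top.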
+ + + + +```python +from torch.utils.data import DataLoader + +train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True) +test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True) +``` + +## Iterate through the DataLoader + +We have loaded that dataset into the ``DataLoader`` and can iterate through the dataset as needed. +Each iteration below returns a batch of ``train_features`` and ``train_labels`` (containing ``batch_size=64`` features and labels respectively). +Because we specified ``shuffle=True``, after we iterate over all batches the data is shuffled (for finer-grained control over +the data loading order, take a look at [Samplers](https://pytorch.org/docs/stable/data.html#data-loading-order-and-sampler)). + + + + +```python +# Display image and label. +train_features, train_labels = next(iter(train_dataloader)) +print(f"Feature batch shape: {train_features.size()}") +print(f"Labels batch shape: {train_labels.size()}") +img = train_features[0].squeeze() +label = train_labels[0] +plt.imshow(img, cmap="gray") +plt.show() +print(f"Label: {label}") +``` + + Feature batch shape: torch.Size([64, 1, 28, 28]) + Labels batch shape: torch.Size([64]) + + + + +![png](../docs/04-Data_files/../docs/04-Data_21_1.png) + + + + Label: 1 + + +-------------- + + + + +## Further Reading +- [torch.utils.data API](https://pytorch.org/docs/stable/data.html) + + diff --git a/docs/05-Transforms.md b/docs/05-Transforms.md new file mode 100644 index 0000000..20d50be --- /dev/null +++ b/docs/05-Transforms.md @@ -0,0 +1,77 @@ +[Learn the Basics](intro.html) || +[Quickstart](quickstart_tutorial.html) || +[Tensors](tensorqs_tutorial.html) || +[Datasets & DataLoaders](data_tutorial.html) || +**Transforms** || +[Build Model](buildmodel_tutorial.html) || +[Autograd](autogradqs_tutorial.html) || +[Optimization](optimization_tutorial.html) || +[Save & Load Model](saveloadrun_tutorial.html) + +# Transforms + +Data does not always come in its final processed form that 
is required for +training machine learning algorithms. We use **transforms** to perform some +manipulation of the data and make it suitable for training. + +All TorchVision datasets have two parameters, ``transform`` to modify the features and +``target_transform`` to modify the labels, that accept callables containing the transformation logic. +The [torchvision.transforms](https://pytorch.org/vision/stable/transforms.html) module offers +several commonly-used transforms out of the box. + +The FashionMNIST features are in PIL Image format, and the labels are integers. +For training, we need the features as normalized tensors, and the labels as one-hot encoded tensors. +To make these transformations, we use ``ToTensor`` and ``Lambda``. + + + +```python +%matplotlib inline + +import torch +from torchvision import datasets +from torchvision.transforms import ToTensor, Lambda + +ds = datasets.FashionMNIST( + root="data", + train=True, + download=True, + transform=ToTensor(), + target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1)) +) +``` + +## ToTensor() + +[ToTensor](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.ToTensor) +converts a PIL image or NumPy ``ndarray`` into a ``FloatTensor`` and scales +the image's pixel intensity values to the range [0., 1.]. + + + + +## Lambda Transforms + +Lambda transforms apply any user-defined lambda function. Here, we define a function +to turn the integer label into a one-hot encoded tensor. +It first creates a zero tensor of size 10 (the number of labels in our dataset) and calls +[scatter_](https://pytorch.org/docs/stable/generated/torch.Tensor.scatter_.html), which assigns +``value=1`` at the index given by the label ``y``. 
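As a plain-Python illustration of what this one-hot transform computes (no PyTorch involved; ``one_hot`` is a hypothetical helper introduced here, not part of TorchVision):

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    encoded = [0.0] * num_classes  # plays the role of the zero tensor of size 10
    encoded[label] = 1.0           # plays the role of scatter_ with value=1 at index y
    return encoded


print(one_hot(3))
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The PyTorch version produces the same pattern as a float tensor.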
+ + + + +```python +target_transform = Lambda(lambda y: torch.zeros( + 10, dtype=torch.float).scatter_(dim=0, index=torch.tensor(y), value=1)) +``` + +-------------- + + + + +### Further Reading +- [torchvision.transforms API](https://pytorch.org/vision/stable/transforms.html) + + diff --git a/docs/06-BuildModel.md b/docs/06-BuildModel.md new file mode 100644 index 0000000..92a7fca --- /dev/null +++ b/docs/06-BuildModel.md @@ -0,0 +1,312 @@ +[Learn the Basics](intro.html) || +[Quickstart](quickstart_tutorial.html) || +[Tensors](tensorqs_tutorial.html) || +[Datasets & DataLoaders](data_tutorial.html) || +[Transforms](transforms_tutorial.html) || +**Build Model** || +[Autograd](autogradqs_tutorial.html) || +[Optimization](optimization_tutorial.html) || +[Save & Load Model](saveloadrun_tutorial.html) + +# Build the Neural Network + +Neural networks are composed of layers/modules that perform operations on data. +The [torch.nn](https://pytorch.org/docs/stable/nn.html) namespace provides all the building blocks you need to +build your own neural network. Every module in PyTorch subclasses [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). +A neural network is a module itself that consists of other modules (layers). This nested structure allows for +building and managing complex architectures easily. + +In the following sections, we'll build a neural network to classify images in the FashionMNIST dataset. + + + +```python +%matplotlib inline + +import os +import torch +from torch import nn +from torch.utils.data import DataLoader +from torchvision import datasets, transforms +``` + +## Get Device for Training +We want to be able to train our model on a hardware accelerator like the GPU, +if it is available. Let's check whether +[torch.cuda](https://pytorch.org/docs/stable/notes/cuda.html) is available; otherwise we +continue to use the CPU. 
+ + + + +```python +device = "cuda" if torch.cuda.is_available() else "cpu" +print(f"Using {device} device") +``` + + Using cpu device + + +## Define the Class +We define our neural network by subclassing ``nn.Module``, and +initialize the neural network layers in ``__init__``. Every ``nn.Module`` subclass implements +the operations on input data in the ``forward`` method. + + + + +```python +class NeuralNetwork(nn.Module): + def __init__(self): + super().__init__() + self.flatten = nn.Flatten() + self.linear_relu_stack = nn.Sequential( + nn.Linear(28*28, 512), + nn.ReLU(), + nn.Linear(512, 512), + nn.ReLU(), + nn.Linear(512, 10), + ) + + def forward(self, x): + x = self.flatten(x) + logits = self.linear_relu_stack(x) + return logits +``` + +We create an instance of ``NeuralNetwork``, and move it to the ``device``, and print +its structure. + + + + +```python +model = NeuralNetwork().to(device) +print(model) +``` + + NeuralNetwork( + (flatten): Flatten(start_dim=1, end_dim=-1) + (linear_relu_stack): Sequential( + (0): Linear(in_features=784, out_features=512, bias=True) + (1): ReLU() + (2): Linear(in_features=512, out_features=512, bias=True) + (3): ReLU() + (4): Linear(in_features=512, out_features=10, bias=True) + ) + ) + + +To use the model, we pass it the input data. This executes the model's ``forward``, +along with some [background operations](https://github.com/pytorch/pytorch/blob/270111b7b611d174967ed204776985cefca9c144/torch/nn/modules/module.py#L866). +Do not call ``model.forward()`` directly! + +Calling the model on the input returns a 2-dimensional tensor with dim=0 corresponding to each output of 10 raw predicted values for each class, and dim=1 corresponding to the individual values of each output. +We get the prediction probabilities by passing it through an instance of the ``nn.Softmax`` module. 
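As a rough illustration of what the softmax step computes, the following plain-Python sketch turns a list of raw logits into probabilities and picks the most likely class. The function name `softmax` and the example logits are assumptions made for this sketch; it is not how ``nn.Softmax`` is implemented.

```python
import math


def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum avoids overflow and does not change the result.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # illustrative raw scores for three classes
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(probs)
print(sum(probs))
print(predicted_class)  # 0, the index of the largest logit
```

Because softmax preserves the ordering of its inputs, taking ``argmax`` of the probabilities selects the same class as taking ``argmax`` of the logits.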
+ + + + +```python +X = torch.rand(1, 28, 28, device=device) +logits = model(X) +pred_probab = nn.Softmax(dim=1)(logits) +y_pred = pred_probab.argmax(1) +print(f"Predicted class: {y_pred}") +``` + + Predicted class: tensor([9]) + + +-------------- + + + + +## Model Layers + +Let's break down the layers in the FashionMNIST model. To illustrate it, we +will take a sample minibatch of 3 images of size 28x28 and see what happens to it as +we pass it through the network. + + + + +```python +input_image = torch.rand(3,28,28) +print(input_image.size()) +``` + + torch.Size([3, 28, 28]) + + +### nn.Flatten +We initialize the [nn.Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html) +layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values ( +the minibatch dimension (at dim=0) is maintained). + + + + +```python +flatten = nn.Flatten() +flat_image = flatten(input_image) +print(flat_image.size()) +``` + + torch.Size([3, 784]) + + +### nn.Linear +The [linear layer](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) +is a module that applies a linear transformation on the input using its stored weights and biases. + + + + + +```python +layer1 = nn.Linear(in_features=28*28, out_features=20) +hidden1 = layer1(flat_image) +print(hidden1.size()) +``` + + torch.Size([3, 20]) + + +### nn.ReLU +Non-linear activations are what create the complex mappings between the model's inputs and outputs. +They are applied after linear transformations to introduce *nonlinearity*, helping neural networks +learn a wide variety of phenomena. + +In this model, we use [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) between our +linear layers, but there's other activations to introduce non-linearity in your model. 
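ReLU itself is a very small function: it keeps positive values and replaces negative values with zero. A plain-Python sketch, using illustrative hidden-layer values chosen for this example, looks like this:

```python
def relu(values):
    """Element-wise max(0, v): negatives become 0.0, positives pass through."""
    return [max(0.0, v) for v in values]


before = [-0.5571, 0.4113, -0.0075, -0.0549, 0.0735]  # illustrative activations
after = relu(before)
print(after)  # [0.0, 0.4113, 0.0, 0.0, 0.0735]
```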
+ + + + +```python +print(f"Before ReLU: {hidden1}\n\n") +hidden1 = nn.ReLU()(hidden1) +print(f"After ReLU: {hidden1}") +``` + + Before ReLU: tensor([[-5.5712e-01, 4.1135e-01, -7.4510e-03, -5.4891e-02, 7.3538e-02, + 4.6617e-01, 5.3287e-01, 7.2283e-02, -3.7471e-01, -3.9285e-01, + -6.7889e-01, 2.1088e-01, 1.8742e-01, 4.0150e-01, -5.6422e-02, + -4.8977e-02, -1.6230e-01, 3.0556e-01, -7.1455e-01, -6.6180e-02], + [-4.2601e-01, 6.2487e-01, -5.9415e-02, 2.3934e-02, 3.9810e-01, + 3.2441e-01, 7.0026e-01, -1.2423e-01, -5.2260e-01, -1.7234e-01, + -5.5835e-01, 2.2128e-01, 2.7830e-01, 2.4191e-01, -7.7681e-02, + -2.4954e-01, 1.5836e-01, 1.9990e-01, -1.1715e-01, -3.2138e-01], + [-4.9225e-01, 4.1050e-01, -1.5492e-01, 8.9106e-03, 3.5985e-01, + 3.1355e-01, 6.2615e-01, -1.9053e-04, -5.7080e-01, -1.7064e-01, + -6.5802e-01, 3.3700e-01, 4.5726e-01, 3.1022e-01, -4.0316e-01, + -3.8029e-01, -1.2243e-01, 3.6732e-01, -5.6789e-01, -9.4490e-02]], + grad_fn=) + + + After ReLU: tensor([[0.0000, 0.4113, 0.0000, 0.0000, 0.0735, 0.4662, 0.5329, 0.0723, 0.0000, + 0.0000, 0.0000, 0.2109, 0.1874, 0.4015, 0.0000, 0.0000, 0.0000, 0.3056, + 0.0000, 0.0000], + [0.0000, 0.6249, 0.0000, 0.0239, 0.3981, 0.3244, 0.7003, 0.0000, 0.0000, + 0.0000, 0.0000, 0.2213, 0.2783, 0.2419, 0.0000, 0.0000, 0.1584, 0.1999, + 0.0000, 0.0000], + [0.0000, 0.4105, 0.0000, 0.0089, 0.3599, 0.3136, 0.6262, 0.0000, 0.0000, + 0.0000, 0.0000, 0.3370, 0.4573, 0.3102, 0.0000, 0.0000, 0.0000, 0.3673, + 0.0000, 0.0000]], grad_fn=) + + +### nn.Sequential +[nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) is an ordered +container of modules. The data is passed through all the modules in the same order as defined. You can use +sequential containers to put together a quick network like ``seq_modules``. 
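The idea of an ordered container can be sketched without PyTorch as plain function composition. The helper `sequential` and the three toy layers are hypothetical names for this illustration; the point is only that the output of each step feeds the next, so the order of the modules matters.

```python
def sequential(*layers):
    """Return a function that applies `layers` one after another, in the given order."""
    def run(values):
        for layer in layers:
            values = layer(values)
        return values
    return run


def shift_down(values):
    return [v - 1.0 for v in values]


def relu(values):
    return [max(0.0, v) for v in values]


def double(values):
    return [2.0 * v for v in values]


data = [0.5, 2.0, 3.0]
pipeline = sequential(shift_down, relu, double)
reordered = sequential(double, shift_down, relu)

print(pipeline(data))   # [0.0, 2.0, 4.0]
print(reordered(data))  # [0.0, 3.0, 5.0]
```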
+ + + + +```python +seq_modules = nn.Sequential( + flatten, + layer1, + nn.ReLU(), + nn.Linear(20, 10) +) +input_image = torch.rand(3,28,28) +logits = seq_modules(input_image) +``` + +### nn.Softmax +The last linear layer of the neural network returns `logits` - raw values in [-\infty, \infty] - which are passed to the +[nn.Softmax](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) module. The logits are scaled to values +[0, 1] representing the model's predicted probabilities for each class. ``dim`` parameter indicates the dimension along +which the values must sum to 1. + + + + +```python +softmax = nn.Softmax(dim=1) +pred_probab = softmax(logits) +``` + +## Model Parameters +Many layers inside a neural network are *parameterized*, i.e. have associated weights +and biases that are optimized during training. Subclassing ``nn.Module`` automatically +tracks all fields defined inside your model object, and makes all parameters +accessible using your model's ``parameters()`` or ``named_parameters()`` methods. + +In this example, we iterate over each parameter, and print its size and a preview of its values. 
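Before looking at the values, it can help to estimate how many parameters to expect. The arithmetic below uses only the layer shapes of the ``NeuralNetwork`` class defined earlier (784 to 512, 512 to 512, 512 to 10); each linear layer has one weight per input/output pair plus one bias per output.

```python
# (in_features, out_features) for the three nn.Linear layers of the model
layer_shapes = [(28 * 28, 512), (512, 512), (512, 10)]

total = 0
for in_features, out_features in layer_shapes:
    weights = in_features * out_features  # weight matrix is out_features x in_features
    biases = out_features
    total += weights + biases
    print(f"{in_features:>4} -> {out_features:>3}: {weights} weights + {biases} biases")

print(f"total trainable parameters: {total}")  # 669706
```

The per-layer counts match the sizes printed for each parameter, for example ``torch.Size([512, 784])`` holds 401,408 weights.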
+ + + + + +```python +print(f"Model structure: {model}\n\n") + +for name, param in model.named_parameters(): + print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n") +``` + + Model structure: NeuralNetwork( + (flatten): Flatten(start_dim=1, end_dim=-1) + (linear_relu_stack): Sequential( + (0): Linear(in_features=784, out_features=512, bias=True) + (1): ReLU() + (2): Linear(in_features=512, out_features=512, bias=True) + (3): ReLU() + (4): Linear(in_features=512, out_features=10, bias=True) + ) + ) + + + Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0211, 0.0168, 0.0334, ..., -0.0151, -0.0033, 0.0032], + [-0.0022, 0.0293, -0.0090, ..., -0.0044, -0.0147, -0.0251]], + grad_fn=) + + Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([0.0128, 0.0086], grad_fn=) + + Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[-0.0165, -0.0068, -0.0016, ..., -0.0098, 0.0119, 0.0326], + [ 0.0330, -0.0306, -0.0129, ..., -0.0371, -0.0291, -0.0273]], + grad_fn=) + + Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([ 0.0024, -0.0164], grad_fn=) + + Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[ 0.0046, 0.0249, 0.0123, ..., 0.0352, -0.0170, 0.0232], + [ 0.0038, 0.0283, 0.0235, ..., -0.0416, 0.0304, 0.0217]], + grad_fn=) + + Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([0.0118, 0.0417], grad_fn=) + + + +-------------- + + + + +## Further Reading +- [torch.nn API](https://pytorch.org/docs/stable/nn.html) + + diff --git a/docs/07-Autograd.md b/docs/07-Autograd.md new file mode 100644 index 0000000..f8e1eee --- /dev/null +++ b/docs/07-Autograd.md @@ -0,0 +1,294 @@ +```python +%matplotlib inline +``` + + +[Learn the Basics](intro.html) || +[Quickstart](quickstart_tutorial.html) || +[Tensors](tensorqs_tutorial.html) || +[Datasets & DataLoaders](data_tutorial.html) || 
+[Transforms](transforms_tutorial.html) || +[Build Model](buildmodel_tutorial.html) || +**Autograd** || +[Optimization](optimization_tutorial.html) || +[Save & Load Model](saveloadrun_tutorial.html) + +# Automatic Differentiation with ``torch.autograd`` + +When training neural networks, the most frequently used algorithm is +**back propagation**. In this algorithm, parameters (model weights) are +adjusted according to the **gradient** of the loss function with respect +to the given parameter. + +To compute those gradients, PyTorch has a built-in differentiation engine +called ``torch.autograd``. It supports automatic computation of gradient for any +computational graph. + +Consider the simplest one-layer neural network, with input ``x``, +parameters ``w`` and ``b``, and some loss function. It can be defined in +PyTorch in the following manner: + + + +```python +import torch + +x = torch.ones(5) # input tensor +y = torch.zeros(3) # expected output +w = torch.randn(5, 3, requires_grad=True) +b = torch.randn(3, requires_grad=True) +z = torch.matmul(x, w)+b +loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y) +``` + +## Tensors, Functions and Computational graph + +This code defines the following **computational graph**: + +.. figure:: /_static/img/basics/comp-graph.png + :alt: + +In this network, ``w`` and ``b`` are **parameters**, which we need to +optimize. Thus, we need to be able to compute the gradients of loss +function with respect to those variables. In order to do that, we set +the ``requires_grad`` property of those tensors. + + + +

Note

You can set the value of ``requires_grad`` when creating a + tensor, or later by using the ``x.requires_grad_(True)`` method.

+ + + +A function that we apply to tensors to construct computational graph is +in fact an object of class ``Function``. This object knows how to +compute the function in the *forward* direction, and also how to compute +its derivative during the *backward propagation* step. A reference to +the backward propagation function is stored in ``grad_fn`` property of a +tensor. You can find more information of ``Function`` [in the +documentation](https://pytorch.org/docs/stable/autograd.html#function)_. + + + + + +```python +print(f"Gradient function for z = {z.grad_fn}") +print(f"Gradient function for loss = {loss.grad_fn}") +``` + + Gradient function for z = + Gradient function for loss = + + +## Computing Gradients + +To optimize weights of parameters in the neural network, we need to +compute the derivatives of our loss function with respect to parameters, +namely, we need $\frac{\partial loss}{\partial w}$ and +$\frac{\partial loss}{\partial b}$ under some fixed values of +``x`` and ``y``. To compute those derivatives, we call +``loss.backward()``, and then retrieve the values from ``w.grad`` and +``b.grad``: + + + + + +```python +loss.backward() +print(w.grad) +print(b.grad) +``` + + tensor([[0.3244, 0.2353, 0.0700], + [0.3244, 0.2353, 0.0700], + [0.3244, 0.2353, 0.0700], + [0.3244, 0.2353, 0.0700], + [0.3244, 0.2353, 0.0700]]) + tensor([0.3244, 0.2353, 0.0700]) + + +
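The printed gradients have a visible pattern: every row of ``w.grad`` equals ``b.grad``. A plain-Python sketch can show why, under the assumption that the loss averages over the three outputs (the default ``reduction='mean'``). In that case the derivative with respect to each logit is ``(sigmoid(z_j) - y_j) / 3``, and since every ``x_i`` is 1, the same value appears in each row of the weight gradient. The weights and biases below are arbitrary fixed numbers chosen for this sketch, not the random values from the tutorial run.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def logits(x, w, b):
    """z_j = sum_i x_i * w_ij + b_j, the same computation as torch.matmul(x, w) + b."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def bce_with_logits(z, y):
    """Mean binary cross-entropy on logits, in a numerically stable form."""
    terms = [max(z_j, 0.0) - z_j * y_j + math.log1p(math.exp(-abs(z_j)))
             for z_j, y_j in zip(z, y)]
    return sum(terms) / len(terms)


def analytic_grads(x, w, b, y):
    """Gradients of the mean loss obtained by hand with the chain rule."""
    z = logits(x, w, b)
    dz = [(sigmoid(z_j) - y_j) / len(z) for z_j, y_j in zip(z, y)]
    w_grad = [[x_i * d for d in dz] for x_i in x]
    b_grad = list(dz)
    return w_grad, b_grad


x = [1.0] * 5                                                # like torch.ones(5)
y = [0.0] * 3                                                # like torch.zeros(3)
w = [[0.1 * (i - j) for j in range(3)] for i in range(5)]    # arbitrary weights
b = [0.5, -0.25, 0.0]                                        # arbitrary biases

w_grad, b_grad = analytic_grads(x, w, b, y)

# Cross-check one entry with a central finite difference on b[0].
eps = 1e-6
b_plus = [b[0] + eps] + b[1:]
b_minus = [b[0] - eps] + b[1:]
numeric = (bce_with_logits(logits(x, w, b_plus), y)
           - bce_with_logits(logits(x, w, b_minus), y)) / (2 * eps)

print(b_grad)
print(all(row == b_grad for row in w_grad))  # True: each row of w_grad equals b_grad
print(abs(numeric - b_grad[0]) < 1e-6)       # True: matches the numerical estimate
```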

Note

- We can only obtain the ``grad`` properties for the leaf + nodes of the computational graph that have the ``requires_grad`` property + set to ``True``. For all other nodes in our graph, gradients will not be + available. + - We can only perform gradient calculations using + ``backward`` once on a given graph, for performance reasons. If we need + to do several ``backward`` calls on the same graph, we need to pass + ``retain_graph=True`` to the ``backward`` call.

+ + + + +## Disabling Gradient Tracking + +By default, all tensors with ``requires_grad=True`` are tracking their +computational history and support gradient computation. However, there +are some cases when we do not need to do that, for example, when we have +trained the model and just want to apply it to some input data, i.e. we +only want to do *forward* computations through the network. We can stop +tracking computations by surrounding our computation code with +``torch.no_grad()`` block: + + + + + +```python +z = torch.matmul(x, w)+b +print(z.requires_grad) + +with torch.no_grad(): + z = torch.matmul(x, w)+b +print(z.requires_grad) +``` + + True + False + + +Another way to achieve the same result is to use the ``detach()`` method +on the tensor: + + + + + +```python +z = torch.matmul(x, w)+b +z_det = z.detach() +print(z_det.requires_grad) +``` + + False + + +There are reasons you might want to disable gradient tracking: + - To mark some parameters in your neural network as **frozen parameters**. + - To **speed up computations** when you are only doing forward pass, because computations on tensors that do + not track gradients would be more efficient. + + + +## More on Computational Graphs +Conceptually, autograd keeps a record of data (tensors) and all executed +operations (along with the resulting new tensors) in a directed acyclic +graph (DAG) consisting of +[Function](https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)_ +objects. In this DAG, leaves are the input tensors, roots are the output +tensors. By tracing this graph from roots to leaves, you can +automatically compute the gradients using the chain rule. + +In a forward pass, autograd does two things simultaneously: + +- run the requested operation to compute a resulting tensor +- maintain the operation’s *gradient function* in the DAG. + +The backward pass kicks off when ``.backward()`` is called on the DAG +root. 
``autograd`` then: + +- computes the gradients from each ``.grad_fn``, +- accumulates them in the respective tensor’s ``.grad`` attribute +- using the chain rule, propagates all the way to the leaf tensors. + +
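The forward and backward passes described above can be sketched with a toy scalar autodiff class. This is an illustration only, not how ``torch.autograd`` is implemented: the class name `Value`, its `backward_fn` attribute (standing in for ``grad_fn``), and the supported operations are all assumptions made for this sketch. Unlike PyTorch, it also keeps gradients on intermediate nodes.

```python
class Value:
    """A scalar that remembers how it was computed, forming a small DAG."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.backward_fn = None  # plays the role of grad_fn; None for leaves

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Order the nodes so that every node comes after its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(root)/d(root)
        # Walk from the root back to the leaves, applying the chain rule.
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()


x, w, b = Value(2.0), Value(3.0), Value(1.0)
z = w * x + b   # forward pass records the graph: z = 7.0
loss = z * z    # loss = 49.0
loss.backward()

print(loss.data)  # 49.0
print(w.grad)     # 28.0 = 2 * z * x
print(b.grad)     # 14.0 = 2 * z
print(x.grad)     # 42.0 = 2 * z * w
```

Note that gradients are added to ``grad`` rather than assigned, which is the same accumulation behavior the tutorial describes for ``.grad``.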

Note

**DAGs are dynamic in PyTorch** + An important thing to note is that the graph is recreated from scratch; after each + ``.backward()`` call, autograd starts populating a new graph. This is + exactly what allows you to use control flow statements in your model; + you can change the shape, size and operations at every iteration if + needed.

+ + + +## Optional Reading: Tensor Gradients and Jacobian Products + +In many cases, we have a scalar loss function, and we need to compute +the gradient with respect to some parameters. However, there are cases +when the output function is an arbitrary tensor. In this case, PyTorch +allows you to compute so-called **Jacobian product**, and not the actual +gradient. + +For a vector function $\vec{y}=f(\vec{x})$, where +$\vec{x}=\langle x_1,\dots,x_n\rangle$ and +$\vec{y}=\langle y_1,\dots,y_m\rangle$, a gradient of +$\vec{y}$ with respect to $\vec{x}$ is given by **Jacobian +matrix**: + +\begin{align}J=\left(\begin{array}{ccc} + \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\ + \vdots & \ddots & \vdots\\ + \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} + \end{array}\right)\end{align} + +Instead of computing the Jacobian matrix itself, PyTorch allows you to +compute **Jacobian Product** $v^T\cdot J$ for a given input vector +$v=(v_1 \dots v_m)$. This is achieved by calling ``backward`` with +$v$ as an argument. 
The size of $v$ should be the same as +the size of the original tensor, with respect to which we want to +compute the product: + + + + + +```python +inp = torch.eye(4, 5, requires_grad=True) +out = (inp+1).pow(2).t() +out.backward(torch.ones_like(out), retain_graph=True) +print(f"First call\n{inp.grad}") +out.backward(torch.ones_like(out), retain_graph=True) +print(f"\nSecond call\n{inp.grad}") +inp.grad.zero_() +out.backward(torch.ones_like(out), retain_graph=True) +print(f"\nCall after zeroing gradients\n{inp.grad}") +``` + + First call + tensor([[4., 2., 2., 2., 2.], + [2., 4., 2., 2., 2.], + [2., 2., 4., 2., 2.], + [2., 2., 2., 4., 2.]]) + + Second call + tensor([[8., 4., 4., 4., 4.], + [4., 8., 4., 4., 4.], + [4., 4., 8., 4., 4.], + [4., 4., 4., 8., 4.]]) + + Call after zeroing gradients + tensor([[4., 2., 2., 2., 2.], + [2., 4., 2., 2., 2.], + [2., 2., 4., 2., 2.], + [2., 2., 2., 4., 2.]]) + + +Notice that when we call ``backward`` for the second time with the same +argument, the value of the gradient is different. This happens because +when doing ``backward`` propagation, PyTorch **accumulates the +gradients**, i.e. the value of computed gradients is added to the +``grad`` property of all leaf nodes of computational graph. If you want +to compute the proper gradients, you need to zero out the ``grad`` +property before. In real-life training an *optimizer* helps us to do +this. + + + +
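The numbers in this output can be reproduced by hand. With $v$ set to all ones, the Jacobian product for the element-wise function $(x+1)^2$ reduces to $2(x+1)$ for each input element, so the identity-like input gives 4 on the diagonal and 2 elsewhere. The plain-Python sketch below recomputes the three printed results; the helper names are hypothetical and the accumulation is written out explicitly to mimic what happens to ``inp.grad``.

```python
ROWS, COLS = 4, 5
# Same values as torch.eye(4, 5): ones on the diagonal, zeros elsewhere.
inp = [[1.0 if i == j else 0.0 for j in range(COLS)] for i in range(ROWS)]


def vjp_all_ones(values):
    """v^T J for out = (values + 1) ** 2 with v all ones: 2 * (values + 1) per element."""
    return [[2.0 * (v + 1.0) for v in row] for row in values]


def accumulate(grad, update):
    """Add `update` into `grad` in place, as backward() does with .grad."""
    for grad_row, update_row in zip(grad, update):
        for k, value in enumerate(update_row):
            grad_row[k] += value


grad = [[0.0] * COLS for _ in range(ROWS)]

accumulate(grad, vjp_all_ones(inp))
first_call = [row[:] for row in grad]

accumulate(grad, vjp_all_ones(inp))      # no zeroing: values double
second_call = [row[:] for row in grad]

grad = [[0.0] * COLS for _ in range(ROWS)]  # the equivalent of inp.grad.zero_()
accumulate(grad, vjp_all_ones(inp))
after_zeroing = [row[:] for row in grad]

print(first_call[0])     # [4.0, 2.0, 2.0, 2.0, 2.0]
print(second_call[0])    # [8.0, 4.0, 4.0, 4.0, 4.0]
print(after_zeroing[0])  # [4.0, 2.0, 2.0, 2.0, 2.0]
```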

Note

Previously we were calling the ``backward()`` function without + parameters. This is essentially equivalent to calling + ``backward(torch.tensor(1.0))``, which is a useful way to compute the + gradients in case of a scalar-valued function, such as the loss during + neural network training.

+ + + + +-------------- + + + + +### Further Reading +- [Autograd Mechanics](https://pytorch.org/docs/stable/notes/autograd.html) + + diff --git a/docs/08-Optimization.md b/docs/08-Optimization.md new file mode 100644 index 0000000..aeb9aa0 --- /dev/null +++ b/docs/08-Optimization.md @@ -0,0 +1,369 @@ +```python +%matplotlib inline +``` + + +[Learn the Basics](intro.html) || +[Quickstart](quickstart_tutorial.html) || +[Tensors](tensorqs_tutorial.html) || +[Datasets & DataLoaders](data_tutorial.html) || +[Transforms](transforms_tutorial.html) || +[Build Model](buildmodel_tutorial.html) || +[Autograd](autogradqs_tutorial.html) || +**Optimization** || +[Save & Load Model](saveloadrun_tutorial.html) + +# Optimizing Model Parameters + +Now that we have a model and data it's time to train, validate and test our model by optimizing its parameters on +our data. Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates +the error in its guess (*loss*), collects the derivatives of the error with respect to its parameters (as we saw in +the [previous section](autograd_tutorial.html)), and **optimizes** these parameters using gradient descent. For a more +detailed walkthrough of this process, check out this video on [backpropagation from 3Blue1Brown](https://www.youtube.com/watch?v=tIeHLnjs5U8)_. + +## Prerequisite Code +We load the code from the previous sections on [Datasets & DataLoaders](data_tutorial.html) +and [Build Model](buildmodel_tutorial.html). 
+ + + +```python +import torch +from torch import nn +from torch.utils.data import DataLoader +from torchvision import datasets +from torchvision.transforms import ToTensor + +training_data = datasets.FashionMNIST( + root="data", + train=True, + download=True, + transform=ToTensor() +) + +test_data = datasets.FashionMNIST( + root="data", + train=False, + download=True, + transform=ToTensor() +) + +train_dataloader = DataLoader(training_data, batch_size=64) +test_dataloader = DataLoader(test_data, batch_size=64) + +class NeuralNetwork(nn.Module): + def __init__(self): + super(NeuralNetwork, self).__init__() + self.flatten = nn.Flatten() + self.linear_relu_stack = nn.Sequential( + nn.Linear(28*28, 512), + nn.ReLU(), + nn.Linear(512, 512), + nn.ReLU(), + nn.Linear(512, 10), + ) + + def forward(self, x): + x = self.flatten(x) + logits = self.linear_relu_stack(x) + return logits + +model = NeuralNetwork() +``` + +## Hyperparameters + +Hyperparameters are adjustable parameters that let you control the model optimization process. +Different hyperparameter values can impact model training and convergence rates +([read more](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html)_ about hyperparameter tuning) + +We define the following hyperparameters for training: + - **Number of Epochs** - the number times to iterate over the dataset + - **Batch Size** - the number of data samples propagated through the network before the parameters are updated + - **Learning Rate** - how much to update models parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training. + + + + + +```python +learning_rate = 1e-3 +batch_size = 64 +epochs = 5 +``` + +## Optimization Loop + +Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each +iteration of the optimization loop is called an **epoch**. 
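To get a feel for how the hyperparameters above interact, the following arithmetic sketch works out how many parameter updates one epoch involves. It assumes FashionMNIST's 60,000 training and 10,000 test images and a data loader that keeps the final, smaller batch (the ``DataLoader`` default).

```python
import math

train_size = 60_000   # FashionMNIST training images
test_size = 10_000    # FashionMNIST test images
batch_size = 64
epochs = 5

train_batches = math.ceil(train_size / batch_size)
last_batch_size = train_size - (train_batches - 1) * batch_size
test_batches = math.ceil(test_size / batch_size)
total_updates = epochs * train_batches

print(train_batches)    # 938 parameter updates per epoch
print(last_batch_size)  # 32 samples in the final, partial batch
print(test_batches)     # 157 evaluation batches
print(total_updates)    # 4690 updates over 5 epochs
```

Halving the batch size would roughly double the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.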
+ +Each epoch consists of two main parts: + - **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters. + - **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving. + +Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to +see the Full Implementation section of the optimization loop. + +### Loss Function + +When presented with some training data, our untrained network is unlikely to give the correct +answer. A **loss function** measures the degree of dissimilarity between the obtained result and the target value, +and it is the loss function that we want to minimize during training. To calculate the loss, we make a +prediction using the inputs of our given data sample and compare it against the true data label value. + +Common loss functions include [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) (Mean Square Error) for regression tasks, and +[nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) (Negative Log Likelihood) for classification. +[nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) combines ``nn.LogSoftmax`` and ``nn.NLLLoss``. + +We pass our model's output logits to ``nn.CrossEntropyLoss``, which will normalize the logits and compute the prediction error. + + + + +```python +# Initialize the loss function +loss_fn = nn.CrossEntropyLoss() +``` + +### Optimizer + +Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent). +All optimization logic is encapsulated in the ``optimizer`` object. 
Here, we use the SGD optimizer; additionally, there are many [different optimizers](https://pytorch.org/docs/stable/optim.html) +available in PyTorch such as ADAM and RMSProp, that work better for different kinds of models and data. + +We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter. + + + + +```python +optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) +``` + +Inside the training loop, optimization happens in three steps: + * Call ``optimizer.zero_grad()`` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration. + * Backpropagate the prediction loss with a call to ``loss.backward()``. PyTorch deposits the gradients of the loss w.r.t. each parameter. + * Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass. + + + + +## Full Implementation +We define ``train_loop`` that loops over our optimization code, and ``test_loop`` that +evaluates the model's performance against our test data. 
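Before the full PyTorch version, the three optimizer steps listed above can be sketched on a one-parameter problem in plain Python. Everything here is a stand-in chosen for illustration: the `Parameter` class, the helper functions, and the toy loss $(w - 3)^2$ are assumptions, not PyTorch APIs.

```python
class Parameter:
    """A single trainable number with an accumulated gradient."""

    def __init__(self, data):
        self.data = data
        self.grad = 0.0


def zero_grad(params):
    # Step 1: reset gradients so the previous iteration is not counted again.
    for p in params:
        p.grad = 0.0


def backward(w):
    # Step 2: add d/dw (w - 3)^2 = 2 * (w - 3) to the stored gradient.
    w.grad += 2.0 * (w.data - 3.0)


def step(params, lr):
    # Step 3: plain gradient descent update, as SGD without momentum does.
    for p in params:
        p.data -= lr * p.grad


w = Parameter(0.0)
learning_rate = 0.1

for _ in range(50):
    zero_grad([w])
    backward(w)
    step([w], learning_rate)

print(round(w.data, 4))  # close to 3.0, the minimizer of the toy loss
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per update; a much larger rate would overshoot, which is the "unpredictable behavior" that the learning-rate description warns about.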
+ + + + +```python +def train_loop(dataloader, model, loss_fn, optimizer): + size = len(dataloader.dataset) + for batch, (X, y) in enumerate(dataloader): + # Compute prediction and loss + pred = model(X) + loss = loss_fn(pred, y) + + # Backpropagation + optimizer.zero_grad() + loss.backward() + optimizer.step() + + if batch % 100 == 0: + loss, current = loss.item(), (batch + 1) * len(X) + print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]") + + +def test_loop(dataloader, model, loss_fn): + size = len(dataloader.dataset) + num_batches = len(dataloader) + test_loss, correct = 0, 0 + + with torch.no_grad(): + for X, y in dataloader: + pred = model(X) + test_loss += loss_fn(pred, y).item() + correct += (pred.argmax(1) == y).type(torch.float).sum().item() + + test_loss /= num_batches + correct /= size + print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n") +``` + +We initialize the loss function and optimizer, and pass it to ``train_loop`` and ``test_loop``. +Feel free to increase the number of epochs to track the model's improving performance. 
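A quick sanity check helps when reading the loss values in the training log. For 10 classes, a model that spreads probability evenly has a cross-entropy of $\ln 10 \approx 2.3026$, which is close to the first logged value of 2.310308 for the untrained network. The plain-Python sketch below computes cross-entropy as log-softmax followed by negative log-likelihood; the function name `cross_entropy` is an assumption for this example and handles a single sample only.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(v - largest) for v in logits))
    log_probs = [v - log_sum_exp for v in logits]  # the LogSoftmax step
    return -log_probs[target]                      # the NLLLoss step


uniform_loss = cross_entropy([0.0] * 10, target=3)
confident_loss = cross_entropy([10.0] + [0.0] * 9, target=0)
wrong_loss = cross_entropy([10.0] + [0.0] * 9, target=1)

print(uniform_loss, math.log(10))  # both are ln(10), about 2.3026
print(confident_loss)              # near 0: confident and correct
print(wrong_loss)                  # about 10: confident and wrong
```

A loss that stays near $\ln 10$ after many batches suggests the model is not learning, while values well below it indicate that predictions are becoming more confident on the correct classes.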
+ + + + +```python +loss_fn = nn.CrossEntropyLoss() +optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) + +epochs = 10 +for t in range(epochs): + print(f"Epoch {t+1}\n-------------------------------") + train_loop(train_dataloader, model, loss_fn, optimizer) + test_loop(test_dataloader, model, loss_fn) +print("Done!") +``` + + Epoch 1 + ------------------------------- + loss: 2.310308 [ 64/60000] + loss: 2.291682 [ 6464/60000] + loss: 2.282847 [12864/60000] + loss: 2.278148 [19264/60000] + loss: 2.259573 [25664/60000] + loss: 2.246842 [32064/60000] + loss: 2.237948 [38464/60000] + loss: 2.221490 [44864/60000] + loss: 2.215676 [51264/60000] + loss: 2.186174 [57664/60000] + Test Error: + Accuracy: 50.1%, Avg loss: 2.185173 + + Epoch 2 + ------------------------------- + loss: 2.192464 [ 64/60000] + loss: 2.176265 [ 6464/60000] + loss: 2.138019 [12864/60000] + loss: 2.155484 [19264/60000] + loss: 2.096774 [25664/60000] + loss: 2.064352 [32064/60000] + loss: 2.073422 [38464/60000] + loss: 2.019561 [44864/60000] + loss: 2.018754 [51264/60000] + loss: 1.944076 [57664/60000] + Test Error: + Accuracy: 56.9%, Avg loss: 1.951974 + + Epoch 3 + ------------------------------- + loss: 1.979550 [ 64/60000] + loss: 1.944613 [ 6464/60000] + loss: 1.850896 [12864/60000] + loss: 1.885921 [19264/60000] + loss: 1.766024 [25664/60000] + loss: 1.721881 [32064/60000] + loss: 1.732149 [38464/60000] + loss: 1.646069 [44864/60000] + loss: 1.663508 [51264/60000] + loss: 1.542335 [57664/60000] + Test Error: + Accuracy: 60.8%, Avg loss: 1.575167 + + Epoch 4 + ------------------------------- + loss: 1.641383 [ 64/60000] + loss: 1.597785 [ 6464/60000] + loss: 1.460881 [12864/60000] + loss: 1.522893 [19264/60000] + loss: 1.394849 [25664/60000] + loss: 1.381750 [32064/60000] + loss: 1.389999 [38464/60000] + loss: 1.324359 [44864/60000] + loss: 1.359623 [51264/60000] + loss: 1.242349 [57664/60000] + Test Error: + Accuracy: 63.2%, Avg loss: 1.281596 + + Epoch 5 + 
------------------------------- + loss: 1.364956 [ 64/60000] + loss: 1.337699 [ 6464/60000] + loss: 1.179997 [12864/60000] + loss: 1.276043 [19264/60000] + loss: 1.145318 [25664/60000] + loss: 1.163051 [32064/60000] + loss: 1.179221 [38464/60000] + loss: 1.127842 [44864/60000] + loss: 1.170320 [51264/60000] + loss: 1.072596 [57664/60000] + Test Error: + Accuracy: 64.8%, Avg loss: 1.102368 + + Epoch 6 + ------------------------------- + loss: 1.181124 [ 64/60000] + loss: 1.175671 [ 6464/60000] + loss: 0.999543 [12864/60000] + loss: 1.125861 [19264/60000] + loss: 0.994338 [25664/60000] + loss: 1.020635 [32064/60000] + loss: 1.052101 [38464/60000] + loss: 1.005876 [44864/60000] + loss: 1.050259 [51264/60000] + loss: 0.969423 [57664/60000] + Test Error: + Accuracy: 65.8%, Avg loss: 0.989962 + + Epoch 7 + ------------------------------- + loss: 1.055653 [ 64/60000] + loss: 1.073796 [ 6464/60000] + loss: 0.878792 [12864/60000] + loss: 1.027988 [19264/60000] + loss: 0.902191 [25664/60000] + loss: 0.923560 [32064/60000] + loss: 0.970771 [38464/60000] + loss: 0.927402 [44864/60000] + loss: 0.969056 [51264/60000] + loss: 0.901827 [57664/60000] + Test Error: + Accuracy: 66.8%, Avg loss: 0.914991 + + Epoch 8 + ------------------------------- + loss: 0.964512 [ 64/60000] + loss: 1.004631 [ 6464/60000] + loss: 0.793878 [12864/60000] + loss: 0.959500 [19264/60000] + loss: 0.842306 [25664/60000] + loss: 0.854395 [32064/60000] + loss: 0.914801 [38464/60000] + loss: 0.875149 [44864/60000] + loss: 0.910963 [51264/60000] + loss: 0.853945 [57664/60000] + Test Error: + Accuracy: 67.8%, Avg loss: 0.861828 + + Epoch 9 + ------------------------------- + loss: 0.895530 [ 64/60000] + loss: 0.953656 [ 6464/60000] + loss: 0.731293 [12864/60000] + loss: 0.908750 [19264/60000] + loss: 0.800252 [25664/60000] + loss: 0.803487 [32064/60000] + loss: 0.873069 [38464/60000] + loss: 0.838708 [44864/60000] + loss: 0.867891 [51264/60000] + loss: 0.817475 [57664/60000] + Test Error: + Accuracy: 68.9%, 
Avg loss: 0.821918 + + Epoch 10 + ------------------------------- + loss: 0.841097 [ 64/60000] + loss: 0.913210 [ 6464/60000] + loss: 0.683007 [12864/60000] + loss: 0.869649 [19264/60000] + loss: 0.768555 [25664/60000] + loss: 0.764901 [32064/60000] + loss: 0.839639 [38464/60000] + loss: 0.811697 [44864/60000] + loss: 0.834432 [51264/60000] + loss: 0.788075 [57664/60000] + Test Error: + Accuracy: 70.1%, Avg loss: 0.790321 + + Done! + + +## Further Reading +- [Loss Functions](https://pytorch.org/docs/stable/nn.html#loss-functions) +- [torch.optim](https://pytorch.org/docs/stable/optim.html) +- [Warmstart Training a Model](https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html) + + + diff --git a/docs/09-SaveLoad.md b/docs/09-SaveLoad.md new file mode 100644 index 0000000..d07b56e --- /dev/null +++ b/docs/09-SaveLoad.md @@ -0,0 +1,141 @@ +```python +%matplotlib inline +``` + + +[Learn the Basics](intro.html) || +[Quickstart](quickstart_tutorial.html) || +[Tensors](tensorqs_tutorial.html) || +[Datasets & DataLoaders](data_tutorial.html) || +[Transforms](transforms_tutorial.html) || +[Build Model](buildmodel_tutorial.html) || +[Autograd](autogradqs_tutorial.html) || +[Optimization](optimization_tutorial.html) || +**Save & Load Model** + +# Save and Load the Model + +In this section we will look at how to persist model state with saving, loading and running model predictions. + + + +```python +import torch +import torchvision.models as models +``` + +## Saving and Loading Model Weights +PyTorch models store the learned parameters in an internal +state dictionary, called ``state_dict``. 
These can be persisted via the ``torch.save`` +method: + + + + +```python +model = models.vgg16(pretrained=True) +torch.save(model.state_dict(), 'model_weights.pth') +``` + + /Users/brianjo/anaconda3/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead. + warnings.warn( + /Users/brianjo/anaconda3/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights. + warnings.warn(msg) + + +To load model weights, you need to create an instance of the same model first, and then load the parameters +using ``load_state_dict()`` method. + + + + +```python +model = models.vgg16() # we do not specify pretrained=True, i.e. 
do not load default weights +model.load_state_dict(torch.load('model_weights.pth')) +model.eval() +``` + + + + + VGG( + (features): Sequential( + (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (1): ReLU(inplace=True) + (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (3): ReLU(inplace=True) + (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) + (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (6): ReLU(inplace=True) + (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (8): ReLU(inplace=True) + (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) + (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (11): ReLU(inplace=True) + (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (13): ReLU(inplace=True) + (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (15): ReLU(inplace=True) + (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) + (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (18): ReLU(inplace=True) + (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (20): ReLU(inplace=True) + (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (22): ReLU(inplace=True) + (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) + (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (25): ReLU(inplace=True) + (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (27): ReLU(inplace=True) + (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) + (29): ReLU(inplace=True) + (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) + ) + (avgpool): AdaptiveAvgPool2d(output_size=(7, 7)) + (classifier): Sequential( + (0): Linear(in_features=25088, 
out_features=4096, bias=True) + (1): ReLU(inplace=True) + (2): Dropout(p=0.5, inplace=False) + (3): Linear(in_features=4096, out_features=4096, bias=True) + (4): ReLU(inplace=True) + (5): Dropout(p=0.5, inplace=False) + (6): Linear(in_features=4096, out_features=1000, bias=True) + ) + ) + + + +

Note

Be sure to call the ``model.eval()`` method before running inference, so that dropout and batch normalization layers are set to evaluation mode. Failing to do this will yield inconsistent inference results.

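The save, load, and ``eval()`` steps above can be sketched end to end on a much smaller network. This is a minimal sketch, not the tutorial's own code: `make_model` and the layer sizes are hypothetical stand-ins for `models.vgg16()`, and an in-memory `io.BytesIO` buffer is assumed in place of the `model_weights.pth` file so the example leaves nothing on disk.

```python
import io

import torch
from torch import nn

torch.manual_seed(0)


def make_model():
    # Hypothetical stand-in for models.vgg16(); includes a Dropout layer
    # so that train mode and eval mode behave differently.
    return nn.Sequential(
        nn.Linear(8, 16),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(16, 4),
    )


model = make_model()

# Save only the learned parameters (the state dict), here to a buffer.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Loading requires a fresh instance of the same architecture first.
restored = make_model()
restored.load_state_dict(torch.load(buffer))

x = torch.randn(2, 8)

# In eval mode dropout is disabled, so repeated passes agree.
restored.eval()
with torch.no_grad():
    eval_a = restored(x)
    eval_b = restored(x)
same_in_eval = torch.equal(eval_a, eval_b)

# The restored model reproduces the original model's eval-mode output.
model.eval()
with torch.no_grad():
    matches_original = torch.equal(model(x), restored(x))

# In train mode dropout randomly zeroes activations, so two passes
# over the same input will usually differ.
restored.train()
with torch.no_grad():
    differs_in_train = not torch.equal(restored(x), restored(x))
restored.eval()
```

Because ``load_state_dict()`` copies parameters into an existing module, the freshly built `restored` model must have the same layer names and shapes as the one that was saved.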
+ + + +## Saving and Loading Models with Shapes +When loading model weights, we needed to instantiate the model class first, because the class +defines the structure of a network. We might want to save the structure of this class together with +the model, in which case we can pass ``model`` (and not ``model.state_dict()``) to the saving function: + + + + +```python +torch.save(model, 'model.pth') +``` + +We can then load the model like this: + + + + +```python +model = torch.load('model.pth') +``` + +

Note

This approach uses the Python [pickle](https://docs.python.org/3/library/pickle.html) module to serialize the model, so it relies on the actual class definition being available when the model is loaded.

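The dependence on the class definition can be seen with the standard library alone. The sketch below is illustrative only: `TinyModel` is a hypothetical plain Python class standing in for a model class, and `pickle.dumps` / `pickle.loads` are used directly rather than ``torch.save`` / ``torch.load``.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for a model class (not part of the tutorial)."""

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]


payload = pickle.dumps(TinyModel())

# While the class is defined, loading works as expected.
restored = pickle.loads(payload)

# The payload stores a reference to the class by name, not its source code.
class_name_in_payload = b"TinyModel" in payload

# Simulate loading from a script that does not define the class.
del TinyModel
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    # pickle cannot resolve the class name, so loading fails.
    load_error = exc
```

The same constraint applies to a file written with ``torch.save(model, 'model.pth')``: the loading script needs to import or define the model class. Note also that PyTorch 2.6 and later default ``torch.load`` to ``weights_only=True``, so loading a whole pickled model there may require passing ``weights_only=False`` explicitly, which should only be done for files from a trusted source.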
+ + + +## Related Tutorials +[Saving and Loading a General Checkpoint in PyTorch](https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html) + + diff --git a/docs/docs/04-Data_21_1.png b/docs/docs/04-Data_21_1.png new file mode 100644 index 0000000..01d7edb Binary files /dev/null and b/docs/docs/04-Data_21_1.png differ diff --git a/docs/docs/04-Data_6_0.png b/docs/docs/04-Data_6_0.png new file mode 100644 index 0000000..5f07fe2 Binary files /dev/null and b/docs/docs/04-Data_6_0.png differ diff --git a/tutorials/01-Introduction.ipynb b/tutorials/01-Introduction.ipynb index 751f83d..8fcbf9f 100644 --- a/tutorials/01-Introduction.ipynb +++ b/tutorials/01-Introduction.ipynb @@ -1,6 +1,7 @@ { "cells": [ { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -46,13 +47,7 @@ "If you're familiar with other deep learning frameworks, check out the [0. Quickstart](quickstart_tutorial.html) first\n", "to quickly familiarize yourself with PyTorch's API.\n", "\n", - "If you're new to deep learning frameworks, head right into the first section of our step-by-step guide: [1. Tensors](tensor_tutorial.html).\n", - "\n", - "\n", - ".. include:: /beginner_source/basics/qs_toc.txt\n", - "\n", - ".. toctree::\n", - " :hidden:\n" + "If you're new to deep learning frameworks, head right into the first section of our step-by-step guide: [1. Tensors](tensor_tutorial.html).\n" ] } ],