Milan Ghimire

Learning PyTorch: Tensors, Autograd & the Training Loop

My notes on PyTorch - tensors as NumPy-with-gradients, how autograd builds the backward pass for you, and the five steps every training loop repeats.

Why PyTorch

PyTorch is the framework I use to actually build and train neural networks. The mental model that helped most: a PyTorch tensor is like a NumPy array, but with two superpowers - it can live on a GPU, and it can track the operations done to it so gradients can be computed automatically.

Tensors

import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.zeros((2, 3))
z = torch.randn((3, 3))      # samples from a standard normal N(0, 1)

x.shape                       # torch.Size([3])
x = x.to("cuda")              # .to() returns a copy on the GPU; needs a CUDA device
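One detail that tripped me up: `.to(...)` returns a tensor on the target device rather than moving `x` in place, and asking for `"cuda"` on a machine without a GPU raises an error. A minimal device-agnostic sketch (the variable names `device`, `x_dev` and `w` are just my choices for illustration):

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1.0, 2.0, 3.0])
x_dev = x.to(device)       # a tensor on `device`; x itself stays on the CPU

# Tensors can also be created directly on the target device.
w = torch.randn((3, 3), device=device)

y = w @ x_dev              # both operands live on the same device
print(x.device, x_dev.device, y.shape)
```

Writing code against `device` instead of a hard-coded `"cuda"` lets the same script run on a laptop and on a GPU box.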

Most NumPy habits carry straight over - indexing, reshaping with .view(...), broadcasting. That overlap made PyTorch much less intimidating.
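A quick sketch of those NumPy habits in PyTorch form: indexing, `.view`, broadcasting, and a round trip through NumPy. The values are arbitrary; note that `.view` needs contiguous memory and `torch.from_numpy` shares memory with the source array.

```python
import numpy as np
import torch

t = torch.arange(12.0)           # 0., 1., ..., 11. as a float tensor
m = t.view(3, 4)                 # reshape without copying (needs contiguous memory)
print(m[1, :])                   # NumPy-style indexing -> tensor([4., 5., 6., 7.])
last_col = m[:, -1]              # last column: 3., 7., 11.

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) result.
col = torch.tensor([[10.0], [20.0], [30.0]])
row = torch.tensor([1.0, 2.0, 3.0, 4.0])
grid = col + row
print(grid.shape)                # torch.Size([3, 4])

# Round trip with NumPy: from_numpy shares memory with the array.
a = np.ones(3, dtype=np.float32)
ta = torch.from_numpy(a)
a[0] = 5.0                       # the change is visible through ta as well
print(ta)                        # tensor([5., 1., 1.])
back = ta.numpy()                # back to NumPy (works for CPU tensors)
```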

Autograd: gradients for free

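This is the magic. If a tensor has requires_grad=True, PyTorch records every operation performed on it into a computation graph, and calling .backward() on a scalar result computes the gradient of that result with respect to every tensor that has requires_grad=True, storing it in that tensor's .grad attribute.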
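With more than one input, each leaf tensor gets its own gradient. A small sketch with two made-up inputs, f(a, b) = ab + a², where the hand-derived partials are ∂f/∂a = b + 2a and ∂f/∂b = a. (Calling `.backward()` with no arguments like this only works when the output is a scalar.)

```python
import torch

a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)

f = a * b + a ** 2        # f(a, b) = ab + a^2; each op is recorded in the graph
print(f.grad_fn)          # the last recorded op, e.g. <AddBackward0 object at ...>

f.backward()              # fills .grad on every leaf with requires_grad=True
print(a.grad)             # df/da = b + 2a = 4 + 6 -> tensor(10.)
print(b.grad)             # df/db = a              -> tensor(3.)
```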

w = torch.tensor(2.0, requires_grad=True)
loss = (w - 5) ** 2      # some function of w
loss.backward()          # compute d(loss)/dw
print(w.grad)            # tensor(-6.)  -> slope 2*(w - 5) = -6 at w = 2

I never have to derive the backward pass by hand - autograd does the calculus.
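To see what the gradient is for, here is a minimal hand-rolled gradient descent on the same toy loss. It is a sketch: the step size `lr = 0.1` and the 50 iterations are my own choices. The update `w -= lr * w.grad` is the same rule plain SGD (no momentum) applies, and the manual reset is needed because `.grad` accumulates across `.backward()` calls.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
lr = 0.1                            # step size (an arbitrary choice for this sketch)

for _ in range(50):
    loss = (w - 5) ** 2             # same toy loss as above, minimum at w = 5
    loss.backward()                 # autograd writes d(loss)/dw into w.grad
    with torch.no_grad():           # the update itself should not be recorded
        w -= lr * w.grad            # step against the slope
    w.grad.zero_()                  # .grad accumulates, so reset it every step

print(w.item())                     # very close to 5.0
```

Each step shrinks the distance to 5 by a factor of 0.8 here, so after 50 steps `w` sits within about 1e-4 of the minimum.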

nn.Module: packaging a model

Models are Python classes that subclass nn.Module. You declare the layers in __init__ and describe the data flow in forward.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)
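A usage sketch for the `Net` class above (repeated so the snippet runs on its own). The fake batch of 32 random "images" is made up; the point is that calling `model(...)` runs `forward`, and layers assigned as attributes are registered as parameters automatically.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)   # 28*28 flattened pixels in, 128 hidden units
        self.fc2 = nn.Linear(128, 10)    # 10 class scores (logits) out

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = Net()
batch = torch.randn(32, 784)             # a fake batch of 32 flattened images
logits = model(batch)                    # calling the module runs forward()
print(logits.shape)                      # torch.Size([32, 10])

# Every nn.Linear assigned in __init__ contributes its weight and bias.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                          # 101770 = (784*128 + 128) + (128*10 + 10)
```

That automatic registration is what lets `model.parameters()` be handed straight to an optimizer.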

The training loop: five steps, always the same

Once this clicked, every PyTorch project looked familiar. Each batch repeats:

optimizer.zero_grad()           # 1. clear old gradients
output = model(data)            # 2. forward pass
loss = loss_fn(output, target)  # 3. measure the error
loss.backward()                 # 4. backpropagate (autograd)
optimizer.step()                # 5. update the weights

zero_grad → forward → loss → backward → step. That rhythm is the heart of training any model, from a tiny MNIST classifier to something huge.
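Here are the five steps in a complete, runnable loop. This is a sketch under made-up assumptions: synthetic data following y = 3x - 1 plus noise, a single `nn.Linear(1, 1)` as the model, `nn.MSELoss`, and plain SGD with `lr=0.1`. A real classifier would swap in its own model, loss, and `DataLoader`, but the inner loop stays identical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (invented for this sketch): y = 3x - 1 plus small noise.
X = torch.randn(256, 1)
y = 3 * X - 1 + 0.1 * torch.randn(256, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):
    for data, target in loader:                # one mini-batch at a time
        optimizer.zero_grad()                  # 1. clear old gradients
        output = model(data)                   # 2. forward pass
        loss = loss_fn(output, target)         # 3. measure the error
        loss.backward()                        # 4. backpropagate (autograd)
        optimizer.step()                       # 5. update the weights

print(model.weight.item(), model.bias.item())  # should land near 3 and -1
```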

Where this connects

This is exactly what I used in my handwriting recognition project: the same tensors, nn.Module, and five-step loop, just with a CNN.
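For a sense of what "just with a CNN" means, here is a hypothetical small convolutional model for 28x28 grayscale digits. It is not the project's actual architecture; the channel counts (16, 32), the 3x3 kernels, and the 28x28 input size are all assumptions. It plugs into the same five-step loop unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # 1x28x28 -> 16x28x28
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # 16x14x14 -> 32x14x14
        self.fc = nn.Linear(32 * 7 * 7, 10)                       # flattened features -> 10 digits

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> 16x14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> 32x7x7
        x = torch.flatten(x, 1)                       # keep the batch dim, flatten the rest
        return self.fc(x)

model = SmallCNN()
images = torch.randn(8, 1, 28, 28)   # fake batch of 8 grayscale 28x28 images
print(model(images).shape)           # torch.Size([8, 10])
```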

A living note - I expand it as I learn more of PyTorch.