Skip to content

Day 1: Introduction to Transfer Learning

Welcome to Week 13. Today, we fundamentally alter how you approach Machine Learning.

Until now, when you built a Neural Network, you applied "Traditional Training." You initialized the network with completely random weights. The network was mathematically "blind." It took hundreds of epochs for the network to slowly learn how to see straight lines, then curves, then finally... a dog.

Transfer Learning skips the blindness.

The Concept of Transfer Learning

Transfer Learning is a technique where a model trained on a massive, generalized dataset (like the ImageNet dataset containing 14 million images) is repurposed for a totally unrelated, smaller task.

Because the model has already spent months on a supercomputer learning how to process information (extracting geometric shapes, shadows, or grammatical syntax), you don't need to teach it rudimentary features. You only need to teach it the final specific taxonomy of your custom task.

Freezing the Network

If we download a pre-trained model, we do not want our new, tiny dataset to accidentally trigger Backpropagation and permanently overwrite the brilliant mathematics Google already calculated!

To prevent this, we Freeze the layers. We mathematically lock the weights so they cannot be updated during training.

Hands-On: Freezing Models in PyTorch & TensorFlow

Look at day1_pt.py and day1_tf.py. We download ResNet50.

# day1_pt.py
import torch
import torchvision.models as models

# 1. Download the Pre-Trained Weights!
model = models.resnet50(pretrained=True)

# 2. FREEZE THE MATH! Prevent Gradients from updating the weights!
for param in model.parameters():
    param.requires_grad = False

# 3. CHOP OFF THE HEAD! (Modify the final layer for our Custom Task)
# ResNet50 natively outputs 1,000 classes. We only want 10 classes!
num_features = model.fc.in_features
model.fc = torch.nn.Linear(num_features, 10)

Look at how elegant this is. We downloaded a 50-layer behemoth ResNet50. We mathematically disabled its ability to learn using param.requires_grad = False (or layer.trainable = False in TensorFlow).

Finally, we hot-swapped its final Output Layer. Because this new Output Layer was just initialized, its weights are randomly generated, meaning it is allowed to learn!

Wrapping Up Day 1

Transfer learning dramatically reduces training time. You aren't training a 50-layer network. You are only training a 1-layer network (the final layer you just added!), utilizing the frozen 49 layers underneath it strictly as a flawless "Feature Extractor".

Tomorrow on Day 2: Transfer Learning in Computer Vision, we will take this frozen ResNet50 architecture, attach a robust custom Classification Head, and train it on a real dataset using both TensorFlow and PyTorch.