Day 1: Introduction to Convolutional Neural Networks
Welcome to Week 10. Last week, we used Dense Networks to classify images.
If we feed a Dense network a picture of a dog perfectly centered in the frame, it learns to look for "Dog Pixels" in the middle of the photograph.
If we show that exact same AI a picture of the same dog, but the dog is standing in the top-left corner of the frame... the Dense network will fail completely. It doesn't understand that the object moved. It just sees that the middle pixels are empty.
This failure is called a lack of Translation Invariance: the network's knowledge is tied to absolute pixel positions.
The Convolutional Neural Network (CNN)
The CNN achieves Translation Invariance. Instead of permanently assigning a unique Weight to every single pixel, a CNN uses a tiny sliding window (a "Kernel") that scans the entire image left-to-right, top-to-bottom.
Because the window scans everywhere, it doesn't matter if the dog is in the center, the top-left, or the bottom-right. The window will eventually slide over it and detect the mathematical "texture" of the dog!
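To see this in action, here is a minimal NumPy sketch (the convolve2d helper and the "blob" images are our own illustration, not part of day1_ex.py): the very same 9-weight kernel produces the same peak response no matter where the pattern sits in the frame.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over a 2-D image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Element-wise multiply the window by the kernel and sum
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 3x3 kernel that "fires" on bright 3x3 blobs
kernel = np.ones((3, 3))

# The same blob, placed at two different positions
img_a = np.zeros((8, 8)); img_a[1:4, 1:4] = 1.0   # top-left
img_b = np.zeros((8, 8)); img_b[4:7, 4:7] = 1.0   # bottom-right

# Peak response is identical (9.0) in both cases -- only its location moves
print(convolve2d(img_a, kernel).max())  # 9.0
print(convolve2d(img_b, kernel).max())  # 9.0
```

A Dense network would need to re-learn the blob at every possible position; the kernel detects it anywhere for free.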
The CNN Pipeline
Almost all modern Computer Vision models follow a strict 3-part pipeline:
1. Convolutional Layers: slide over the image and extract raw, localized features (edges, curves, color patches).
2. Pooling Layers: downsize the feature maps to save memory and keep only the strongest activations.
3. Fully Connected (Dense) Layers: after the convolutions have heavily processed the image, the result is flattened and passed to a traditional Dense layer, which acts as the final "Voting Classifier".
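Before touching any framework, we can trace how a 32x32x3 image shrinks through this pipeline with plain arithmetic (conv_shape and pool_shape are hypothetical helpers assuming "valid" padding and stride 1):

```python
# Trace tensor shapes through the 3-part pipeline (valid conv, stride 1)
def conv_shape(h, w, c_out, k):
    """Convolutional layer: a kxk window shrinks each side by k-1."""
    return (h - k + 1, w - k + 1, c_out)

def pool_shape(h, w, c, p):
    """Pooling layer: a pxp pool divides each spatial side by p."""
    return (h // p, w // p, c)

h, w, c = 32, 32, 3                 # input image
h, w, c = conv_shape(h, w, 32, 3)   # Conv2D, 32 filters of 3x3 -> (30, 30, 32)
h, w, c = pool_shape(h, w, c, 2)    # MaxPooling 2x2            -> (15, 15, 32)
flat = h * w * c                    # Flatten for the Dense classifier -> 7200
print((h, w, c), flat)
```

So the Dense "Voting Classifier" at the end sees 7,200 pre-digested feature values instead of raw pixels.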
Hands-On: Let's Initialize!
Look at day1_ex.py. We prepare skeleton blueprints for both frameworks!
# day1_ex.py
import tensorflow as tf
import torch.nn as nn
# 1. TensorFlow CNN Blueprint
tf_model = tf.keras.Sequential([
    # The Sliding Window!
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    # The Downsizing Layer!
    tf.keras.layers.MaxPooling2D((2, 2)),
    # ... [Flatten and Dense Classifiers]
])
# 2. PyTorch CNN Blueprint
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # The Sliding Window! (unlike Keras, nn.Conv2d has no activation
        # argument -- ReLU is applied separately in forward())
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3)
        # The Downsizing Layer!
        self.pool = nn.MaxPool2d(2, 2)
Wrapping Up Day 1
CNNs require drastically fewer Parameters than Dense networks because a sliding Kernel (just 9 weights per input channel for a 3x3 window) is re-used thousands of times across the entire image! It is beautifully efficient.
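To put numbers on that efficiency, here is a back-of-the-envelope comparison in pure Python (the 32-unit Dense layer is a hypothetical counterpart to the Conv2D layer in day1_ex.py):

```python
# Compare parameter counts on a 32x32x3 input image
in_pixels = 32 * 32 * 3                  # 3,072 input values

# Dense layer, 32 units: one weight per pixel per unit, plus 32 biases
dense_params = in_pixels * 32 + 32       # 98,336

# Conv2D, 32 filters of 3x3 over 3 input channels, plus 32 biases
conv_params = 3 * 3 * 3 * 32 + 32        # 896

print(dense_params, conv_params)         # over 100x fewer parameters
```

The gap only widens with larger images: the Dense count grows with the pixel count, while the Conv count stays fixed.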
But how does a set of 9 random numbers actually extract an "Edge" from a photograph?
Tomorrow on Day 2: Convolutional Layers and Filters, we drop down into the raw NumPy mathematics to manually trace an Edge-Detection matrix!