Convolutional Neural Networks (CNNs)
Why Not Dense Layers for Images?
Consider a 224x224 color image. Flattened, that's 224 x 224 x 3 = 150,528 input values. A single dense layer with 256 neurons would need:
> 150,528 x 256 = 38.5 million parameters — in just ONE layer!
This has three critical problems:

1. Parameter explosion — too many weights to train efficiently
2. No spatial awareness — a dense layer treats pixel (0,0) and pixel (223,223) identically
3. No translation invariance — a cat in the top-left corner looks completely different from a cat in the bottom-right
CNNs solve all three problems by using local connectivity (small filters), weight sharing (same filter slides across the image), and pooling (spatial invariance).
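The parameter arithmetic is easy to check by hand. A back-of-the-envelope sketch (plain Python, no framework needed) comparing the dense layer above with a single 3x3 convolutional layer of 32 filters:

```python
# Dense layer on a flattened 224x224x3 image, 256 neurons
inputs = 224 * 224 * 3               # 150,528 input values
dense_params = inputs * 256 + 256    # one weight per input per neuron, plus biases

# Conv2D layer: 32 filters of size 3x3 over 3 input channels
conv_params = 3 * 3 * 3 * 32 + 32    # each filter is 3x3x3, plus one bias per filter

print(f"Dense: {dense_params:,} parameters")   # Dense: 38,535,424 parameters
print(f"Conv:  {conv_params:,} parameters")    # Conv:  896 parameters
```

Weight sharing is what makes the difference: the conv layer's cost depends only on the filter size and channel counts, not on the image resolution.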
CNN Architecture: The Building Blocks
Conv2D — The Convolutional Layer
A convolutional layer slides small filters (e.g., 3x3) across the image. Each filter detects a specific pattern (edge, texture, shape) at every spatial location.
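To make "a filter slides across the image" concrete, here is a minimal pure-Python sketch of the operation (strictly, the cross-correlation that deep-learning frameworks compute) applying a 3x3 vertical-edge filter to a tiny grayscale patch. The image and kernel values are invented for illustration:

```python
def conv2d(image, kernel):
    """Valid convolution, stride 1: slide the kernel over every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the window under it, then sum
            row.append(sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw)
            ))
        out.append(row)
    return out

# A 4x4 patch: dark everywhere except a bright last column
image = [
    [0, 0, 0, 9],
    [0, 0, 0, 9],
    [0, 0, 0, 9],
    [0, 0, 0, 9],
]
# Vertical-edge filter: responds where brightness changes left to right
edge = [[-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]]

print(conv2d(image, edge))  # [[0, 27], [0, 27]] (the filter fires only at the edge)
```

The same nine kernel weights are reused at every position, which is the weight sharing that keeps conv layers so small.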
| Parameter | Meaning |
|---|---|
| filters | Number of different patterns to detect (e.g., 32) |
| kernel_size | Size of each filter (e.g., 3x3) |
| strides | How far the filter moves each step (default: 1) |
| padding | "same" keeps spatial dims; "valid" shrinks them |
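The strides and padding settings determine the output size. A small helper (an illustration of the standard shape arithmetic, not part of the Keras API) makes the table concrete:

```python
import math

def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a conv/pool layer, per the usual TF convention."""
    if padding == "same":
        return math.ceil(size / stride)      # zero-padded: dims shrink only by the stride
    return (size - kernel) // stride + 1     # "valid": no padding, edges are lost

print(conv_output_size(32, 3, padding="same"))            # 32 (dims preserved)
print(conv_output_size(32, 3, padding="valid"))           # 30 (shrinks by kernel - 1)
print(conv_output_size(32, 3, stride=2, padding="same"))  # 16 (halved)
```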
MaxPooling2D — Spatial Down-sampling
Takes the maximum value in each window (e.g., 2x2), typically halving each spatial dimension. This provides:

- Reduced computation for every layer that follows
- A degree of translation invariance (small shifts in the input barely change the pooled output)
- A larger effective receptive field for deeper filters
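The operation itself is simple enough to sketch in a few lines of plain Python (2x2 windows, stride 2, feature-map values invented for illustration):

```python
def max_pool_2x2(grid):
    """Take the max of each non-overlapping 2x2 window (stride 2)."""
    return [
        [max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
         for j in range(0, len(grid[0]), 2)]
        for i in range(0, len(grid), 2)
    ]

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
print(max_pool_2x2(feature_map))  # [[6, 8], [9, 6]] (4x4 reduced to 2x2)
```

Note that shifting the input by one pixel often leaves the pooled maxima unchanged, which is where the translation invariance comes from.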
Flatten / GlobalAveragePooling2D — Bridge to Dense Layers
After the conv layers extract spatial features, we need to convert the 3D feature maps into a 1D vector for classification. GlobalAveragePooling2D is usually preferred over Flatten: it averages each feature map down to a single value, which dramatically reduces the parameter count of the first dense layer.
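A quick count shows the difference. For the model built below, the feature maps entering the head are 4x4 with 128 channels (32x32 halved by three pooling layers), followed by a Dense(128) layer:

```python
h, w, channels = 4, 4, 128   # feature-map shape after three 2x2 poolings of a 32x32 input
dense_units = 128

# Flatten: every one of the 4*4*128 values gets its own weight per dense unit
flatten_params = (h * w * channels) * dense_units + dense_units

# GlobalAveragePooling2D: each channel is averaged to a single value first
gap_params = channels * dense_units + dense_units

print(f"Flatten -> Dense(128): {flatten_params:,} parameters")  # 262,272
print(f"GAP     -> Dense(128): {gap_params:,} parameters")      # 16,512
```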
Building a CNN from Scratch
Here's the classic CNN architecture pattern: stacks of (Conv -> Pool) followed by Dense layers for classification.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# --- Load CIFAR-10 ---
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (50000, 32, 32, 3), Test: (10000, 32, 32, 3)

# --- Build CNN ---
model = keras.Sequential([
    # Block 1: low-level features (edges, colors)
    layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                  input_shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),

    # Block 2: mid-level features (textures, parts)
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),

    # Block 3: high-level features (object parts)
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),

    # Classification head
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

model.summary()
# Far fewer parameters than a dense network on flattened images!

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    ],
)

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")
```

Transfer Learning with Pre-Trained Models
Training a CNN from scratch requires large datasets and many GPU hours. Transfer learning lets you reuse a model trained on millions of images (ImageNet) and adapt it to your specific task — often with only a few hundred images.
The Strategy
1. Load a pre-trained base model (e.g., EfficientNetV2B0) and freeze its weights
2. Add a custom classification head on top
3. Train only the head for a few epochs (feature extraction)
4. Optionally unfreeze some base layers and fine-tune with a very low learning rate
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# --- Step 1: Load pre-trained base model ---
base_model = keras.applications.EfficientNetV2B0(
    weights="imagenet",
    include_top=False,   # Remove original classification head
    input_shape=(224, 224, 3),
)

# Freeze ALL base model weights
base_model.trainable = False

# --- Step 2: Add custom classification head ---
inputs = keras.Input(shape=(224, 224, 3))
# EfficientNet has its own preprocessing built in
x = base_model(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(256, activation="relu")(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(5, activation="softmax")(x)  # 5 classes

model = keras.Model(inputs, outputs)

# --- Step 3: Train the head (feature extraction) ---
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(train_data, epochs=10, validation_data=val_data)
print("Phase 1: Training head only")
print(f"Trainable parameters: {sum(p.numpy().size for p in model.trainable_variables):,}")

# --- Step 4: Fine-tune the base model ---
# Unfreeze the base model
base_model.trainable = True

# Re-compile with a MUCH lower learning rate
# This is critical — high LR would destroy the pre-trained features
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # 100x smaller!
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(train_data, epochs=10, validation_data=val_data)
print("\nPhase 2: Fine-tuning entire model")
print(f"Trainable parameters: {sum(p.numpy().size for p in model.trainable_variables):,}")
```

Transfer Learning Strategy
Choosing a Pre-Trained Model
| Model | Size | Top-1 Accuracy | Speed | Best For |
|---|---|---|---|---|
| MobileNetV2 | 14 MB | 71.8% | Very Fast | Mobile/edge deployment |
| EfficientNetV2B0 | 29 MB | 78.7% | Fast | Good balance of accuracy and speed |
| EfficientNetV2L | 478 MB | 85.7% | Slow | Maximum accuracy when resources allow |
| ResNet50 | 98 MB | 76.0% | Medium | Well-studied, reliable baseline |