Weight Initialization, Regularization & Dropout · Page 2 of 2

Dropout & L1/L2 Regularization

Dropout (Simple but Effective)

Problem: Model memorizes training data (overfitting).

Solution: Randomly drop neurons during training!

Forward pass:
y = Dense(x)  (normal)

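With dropout (p=0.5, where p is the probability of dropping a unit):
mask = bernoulli(1 - p)  (each entry is 0 with probability p, else 1)
y = Dense(x) * mask      (zeroes about 50% of outputs)

Then scale: y = y / (1 - p)  (keeps the expected activation the same as without dropout)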

Test time: Use all neurons! No dropout.
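
The forward pass above can be sketched in plain Python. This is a minimal illustration of inverted dropout, not a framework implementation: the function name `dropout`, the `training` flag, and the `rng` parameter are assumptions made for this example, and real code would usually operate on arrays or tensors rather than lists.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations.

    p is the probability of dropping each unit, as in the text above.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # Test time: every unit is kept and no rescaling is needed,
        # because the scaling already happened during training.
        return list(activations)
    keep = 1.0 - p
    out = []
    for a in activations:
        if rng.random() < keep:
            out.append(a / keep)  # kept unit, scaled by 1 / (1 - p)
        else:
            out.append(0.0)       # dropped unit
    return out

random.seed(0)
x = [1.0] * 10000
y_train = dropout(x, p=0.5, training=True)
y_test = dropout(x, p=0.5, training=False)

# The mean activation stays close to 1.0 in training mode thanks to the scaling.
mean_train = sum(y_train) / len(y_train)
```

With p=0.5, each training-mode output is either 0.0 or 2.0, so the average over many units stays near the original value of 1.0, which is why no correction is needed at test time.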

Why it works:

  • Forces network to learn redundant features
  • Can't rely on single neuron
  • Ensemble effect (a different random subset of neurons is active on each forward pass)

Typical dropout rates:

  • p=0.2-0.3 (light, 20-30% drop)
  • p=0.5 (standard)
  • p > 0.7 (heavy; occasionally tried on very large networks, but it often causes underfitting)

L1/L2 Regularization

Idea: Penalize large weights → force small weights (L2) or sparse weights (L1).

L2 Regularization (Ridge)

Total Loss = Data Loss + λ × Σ(w²)

λ controls strength (hyperparameter)

Gradient: dL/dw = (normal gradient) + 2λw

Large w → bigger penalty → decay toward 0

Effect: All weights shrink in proportion to their size; large weights shrink fastest, and weights rarely reach exactly 0.
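
The L2 formulas above translate directly into code. The sketch below is illustrative only: the helper names (`l2_penalty`, `l2_penalty_grad`, `sgd_step_l2`) and the learning rate `lr` are assumptions for this example, and the data gradient is set to zero so the decay effect is visible in isolation.

```python
def l2_penalty(weights, lam):
    """Penalty term: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)

def l2_penalty_grad(weights, lam):
    """Gradient of the penalty with respect to each weight: 2 * lam * w."""
    return [2.0 * lam * w for w in weights]

def sgd_step_l2(weights, data_grads, lam, lr):
    """One gradient step on (data loss + L2 penalty)."""
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, data_grads)]

weights = [4.0, -1.0, 0.1]
zero_grads = [0.0, 0.0, 0.0]  # data gradient set to zero to isolate the decay
stepped = sgd_step_l2(weights, zero_grads, lam=0.1, lr=0.5)

# Each weight is multiplied by the same factor (1 - 2 * lr * lam) = 0.9,
# so the largest weight moves the most in absolute terms.
```

Because every weight is scaled by the same factor, small weights get smaller but do not land on exactly 0.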

L1 Regularization (Lasso)

Total Loss = Data Loss + λ × Σ(|w|)

Gradient: dL/dw = (normal gradient) + λ × sign(w)

Drives less-important weights to exactly 0!

Effect: Feature selection (some weights exactly 0).
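
A small sketch of the L1 formulas follows. The penalty and gradient helpers mirror the text. The `soft_threshold_step` function is not from the text: it is a commonly used variant of the update, added here as an assumption, because a plain step along λ × sign(w) tends to hop back and forth across zero rather than settle on it. All names and the learning rate `lr` are illustrative.

```python
def sign(w):
    """Return 1, -1, or 0 depending on the sign of w."""
    return (w > 0) - (w < 0)

def l1_penalty(weights, lam):
    """Penalty term: lam * sum(|w|)."""
    return lam * sum(abs(w) for w in weights)

def l1_penalty_grad(weights, lam):
    """(Sub)gradient of the penalty: lam * sign(w), the same size for every nonzero w."""
    return [lam * sign(w) for w in weights]

def soft_threshold_step(weights, data_grads, lam, lr):
    """Step on the data loss, then shrink each weight toward 0 by lr * lam,
    clamping at exactly 0 instead of crossing it."""
    shrink = lr * lam
    out = []
    for w, g in zip(weights, data_grads):
        z = w - lr * g
        if z > shrink:
            out.append(z - shrink)
        elif z < -shrink:
            out.append(z + shrink)
        else:
            out.append(0.0)  # weight is small enough to be zeroed out
    return out

weights = [4.0, -1.0, 0.03]
zero_grads = [0.0, 0.0, 0.0]  # data gradient set to zero to isolate the penalty
stepped = soft_threshold_step(weights, zero_grads, lam=0.1, lr=0.5)
```

Unlike the L2 case, the pull toward zero has the same size for every nonzero weight, so a weight that the data loss does not defend (here 0.03) ends at exactly 0.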

Regularization Strength

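λ = 0:     No regularization (highest risk of overfitting)
λ = 0.001: Light regularization (often a reasonable starting point)
λ = 0.1:   Strong regularization (may underfit)
λ = 1.0:   Very strong (model often too simple)

Tuning: These values are rough guides; the right scale depends on the model and data. Use a validation set to find the best λ.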
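
Validation-based tuning can be sketched as a simple loop over candidate values. The example below assumes a toy one-feature ridge regression with a closed-form fit and synthetic data; the function names, the data generator, and the noise level are all made up for illustration, and the candidate list is the one from the table above.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_lambda(candidates, train, val):
    """Fit on the training split for each lambda, score on the validation split."""
    results = {}
    for lam in candidates:
        w = fit_ridge_1d(train[0], train[1], lam)
        results[lam] = mse(w, val[0], val[1])
    best = min(results, key=results.get)
    return best, results

def make_data(n, rng):
    """Synthetic data: y = 2x + Gaussian noise (an assumption for this demo)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

rng = random.Random(1)
train = make_data(8, rng)
val = make_data(50, rng)
best_lam, val_losses = select_lambda([0.0, 0.001, 0.1, 1.0], train, val)
```

The key design point is that λ is chosen on data the model was not fitted on; picking it by training loss would always favor λ = 0.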

Combining Techniques

A common ordering (conventions vary, especially for where Dropout sits relative to BatchNorm):

Layer 1: Dense → BatchNorm → Activation → Dropout
Layer 2: Dense → BatchNorm → Activation → Dropout
...
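
One such layer can be sketched in plain Python to make the ordering concrete. This is a simplified illustration under several assumptions: ReLU is used as the activation, batch norm omits its learnable scale and shift and always uses batch statistics (frameworks switch to running averages at test time), and all function names are hypothetical. A real model would use a deep learning framework instead.

```python
import math
import random

def dense(batch, W, b):
    """Affine layer. batch: rows of inputs; W: in_features x out_features."""
    return [[sum(x_i * W[i][j] for i, x_i in enumerate(row)) + b[j]
             for j in range(len(b))] for row in batch]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) to zero mean and unit variance over the batch."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [[(v - m) / math.sqrt(var + eps)
             for v, m, var in zip(row, means, variances)] for row in batch]

def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]

def dropout(batch, p, training, rng):
    """Inverted dropout; a no-op when training is False."""
    if not training:
        return batch
    keep = 1.0 - p
    return [[v / keep if rng.random() < keep else 0.0 for v in row] for row in batch]

def block(batch, W, b, p, training, rng):
    """Dense -> BatchNorm -> Activation -> Dropout, in the order shown above."""
    return dropout(relu(batch_norm(dense(batch, W, b))), p, training, rng)

rng = random.Random(0)
batch = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [-2.0, 0.0, 1.0], [3.0, 1.0, -1.0]]
W = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b = [0.0, 0.1]
out_train = block(batch, W, b, p=0.5, training=True, rng=rng)
out_eval = block(batch, W, b, p=0.5, training=False, rng=rng)
```

Stacking a second layer means calling `block` again on the output with its own weights.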

When to Use

Technique        Use For             Strength
Dropout          Large networks      Simple, effective
L2 Reg           All models          Standard
L1 Reg           Feature selection   Interpretability
Batch Norm       Deep networks       Stabilizes training
Early stopping   General             Prevents overfitting

Early Stopping

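Simplest regularization:

Monitor validation loss after each epoch
Stop when it has not improved for several epochs (the "patience")
Use the model from the epoch with the best validation loss

Why? Past that point, training loss keeps falling while validation loss rises, which is the signature of overfitting.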
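
The stopping rule can be sketched as a small function over a list of per-epoch validation losses. The function name `early_stopping` and the `patience` parameter (how many non-improving epochs to tolerate) are assumptions for this example; in real training the loop would also save a checkpoint whenever the loss improves.

```python
def early_stopping(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once `patience` epochs have passed without a new best loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: in real training, save the model weights here.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Ran out of epochs before the patience limit was hit.
    return best_epoch, len(val_losses) - 1

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9]
best_epoch, stop_epoch = early_stopping(losses, patience=3)
# best_epoch is 2 (loss 0.7); training stops at epoch 5 after three epochs without improvement.
```

The model to keep is the one from `best_epoch`, not the one from the final epoch.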