The Overfitting Cure

Regularization

The Problem

If you have 10 features, a model might give massive weights to useless features just because doing so lets it fit the noise in the training data perfectly. Such a model then fails on test data.

The Solution: Penalize Complexity

We add the size of the weights to the loss function, scaled by a penalty strength λ. The model now has to balance "fit the data well" with "keep weights small".
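To make the trade-off concrete, here is a minimal sketch. The helper name `penalized_loss`, the two candidate models, and their MSE values are made up for illustration, and the sum of squared weights is used as just one possible measure of "size".

```python
def penalized_loss(mse, weights, lam):
    """Data-fit term plus lam times the size of the weights.

    "Size" here is the sum of squared weights (an assumption for this sketch).
    """
    size = sum(w * w for w in weights)
    return mse + lam * size


# Two hypothetical models: one fits the training data slightly better
# but needs huge weights; the other fits slightly worse with small weights.
big_weights = [5.0, -4.0, 3.0]
small_weights = [1.0, 0.0, 0.5]

loss_big = penalized_loss(0.10, big_weights, lam=0.01)
loss_small = penalized_loss(0.15, small_weights, lam=0.01)

print("big weights:  ", loss_big)
print("small weights:", loss_small)
```

Without the penalty the first model wins on MSE alone; with the penalty its large weights cost more than the small gain in fit.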

L2 Regularization (Ridge)

Adds the squared weights to the loss: Loss = MSE + λ * Σ(weight²)

  • Result: Weights shrink smoothly towards zero, but rarely hit exactly zero.
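The smooth shrinking can be seen in the simplest possible case. The sketch below assumes a single feature, no intercept, and an MSE averaged over n samples; under those assumptions, setting the derivative of MSE + λ * weight² to zero gives weight = Σ(x*y) / (Σ(x²) + n*λ). The function name `ridge_weight_1d` and the toy data are illustrative only.

```python
def ridge_weight_1d(xs, ys, lam):
    """Weight that minimizes mean((y - w*x)^2) + lam * w^2 for one feature."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


# Toy data where the unpenalized best weight is exactly 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:>6}: weight={ridge_weight_1d(xs, ys, lam):.4f}")
```

As λ grows, the denominator grows and the weight gets smaller, but the numerator never changes, so the weight only approaches zero without reaching it.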

L1 Regularization (Lasso)

Adds the absolute weights to the loss: Loss = MSE + λ * Σ(|weight|)

  • Result: The weights of unhelpful features can hit exactly zero! Lasso acts as automatic feature selection.
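Why exact zeros appear is easiest to see with one feature. The sketch below assumes a single feature, no intercept, and an MSE averaged over n samples. Under those assumptions the best weight has a closed form often called soft-thresholding: the correlation term is pulled towards zero by λ/2 and clipped at zero. The function name `lasso_weight_1d` and the toy data are illustrative only.

```python
import math


def lasso_weight_1d(xs, ys, lam):
    """Weight that minimizes mean((y - w*x)^2) + lam * |w| for one feature."""
    n = len(xs)
    b = sum(x * y for x, y in zip(xs, ys)) / n  # mean of x*y
    a = sum(x * x for x in xs) / n              # mean of x^2
    # Soft-threshold: shrink |b| by lam/2, but never below zero.
    shrunk = max(abs(b) - lam / 2.0, 0.0)
    return math.copysign(shrunk, b) / a


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for lam in [0.0, 10.0, 30.0, 40.0]:
    print(f"lambda={lam:>5}: weight={lasso_weight_1d(xs, ys, lam)}")
```

Once λ/2 is at least as large as the feature's correlation term, the `max(..., 0.0)` clips the weight to exactly zero, which is the feature-selection effect.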

Implementing in Gradient Descent

Instead of the plain update weight -= lr * gradient (where gradient is the gradient of the MSE), add the gradient of the penalty term:

  • Ridge: weight -= lr * (gradient + (2 * lambda * weight))
  • Lasso: weight -= lr * (gradient + (lambda * sign(weight)))
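The update rules above can be sketched as a small training loop in plain Python. This is one possible implementation, not the lesson's reference code: the function name `train`, the `penalty` argument, the toy dataset, and the hyperparameter values are all assumptions made for the example.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def train(X, y, lr=0.02, lam=0.0, penalty=None, epochs=2000):
    """Linear regression (no intercept) trained with batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        preds = [sum(wj * xj for wj, xj in zip(w, row)) for row in X]
        errors = [p - t for p, t in zip(preds, y)]
        # Gradient of the MSE with respect to each weight.
        grads = [2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
                 for j in range(d)]
        for j in range(d):
            if penalty == "ridge":
                reg = 2 * lam * w[j]      # gradient of lam * w^2
            elif penalty == "lasso":
                reg = lam * sign(w[j])    # (sub)gradient of lam * |w|
            else:
                reg = 0.0
            w[j] -= lr * (grads[j] + reg)
    return w


# Toy data: the target depends only on the first feature (y = 2 * x1);
# the second feature is irrelevant.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.8], [4.0, -0.6]]
y = [2.0, 4.0, 6.0, 8.0]

plain = train(X, y)
ridge = train(X, y, lam=1.0, penalty="ridge")
lasso = train(X, y, lam=1.0, penalty="lasso")

print("plain:", plain)
print("ridge:", ridge)
print("lasso:", lasso)
```

One caveat with this simple loop: the Lasso step uses sign(weight), so a weight that should be zero tends to hover just around zero instead of landing on it exactly. Solvers that produce exact zeros typically use a soft-thresholding step rather than this plain update.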