Regularization (L1 & L2)

The Overfitting Cure

Regularization

The Problem

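Suppose you have 10 features, and only a few of them actually matter. A model can still assign large weights to the useless ones, because doing so lets it fit the noise in the training data. Those weights do not carry over to new data, so the model does well on the training set and poorly on the test set.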

The Solution: Penalize Complexity

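We add a penalty on the size of the weights to the loss function, scaled by a strength λ (lambda). The model now has to balance "fit the data well" against "keep weights small". A larger λ pushes harder towards small weights; λ = 0 recovers the unregularized loss.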

L2 Regularization (Ridge)

Adds the squared weights to the loss: Loss = MSE + λ * Σ(weight²)

Result: Weights shrink smoothly towards zero, but rarely hit exactly zero.
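To make the Ridge formula concrete, here is a minimal sketch in plain Python. The function names (`mse`, `ridge_loss`, `ridge_weight_1d`) and the toy data are assumptions made for illustration, not part of the lesson's code. For a single feature with no intercept, setting the derivative of the Ridge loss to zero gives weight = Σ(x·y) / (Σ(x²) + n·λ), so the weight gets smaller as λ grows but stays nonzero as long as Σ(x·y) is nonzero.

```python
def mse(weights, X, y):
    """Mean squared error of a linear model with no intercept."""
    total = 0.0
    for row, target in zip(X, y):
        pred = sum(w * x for w, x in zip(weights, row))
        total += (pred - target) ** 2
    return total / len(y)


def ridge_loss(weights, X, y, lam):
    """Loss = MSE + lam * sum(weight^2), as in the formula above."""
    return mse(weights, X, y) + lam * sum(w * w for w in weights)


def ridge_weight_1d(xs, ys, lam):
    """Minimizer of ridge_loss for one feature: sum(xy) / (sum(x^2) + n * lam)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


# Toy data, made up for illustration: y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

lams = (0.0, 1.0, 10.0, 100.0)
ridge_weights = [ridge_weight_1d(xs, ys, lam) for lam in lams]
for lam, w in zip(lams, ridge_weights):
    print(f"lambda={lam:>6}: weight={w:.4f}")
```

Running this should show the weight falling steadily as λ increases while never reaching zero, which is the "shrink smoothly" behavior described above.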

L1 Regularization (Lasso)

Adds the absolute weights to the loss: Loss = MSE + λ * Σ(|weight|)

Result: The weights of unhelpful features can be driven to exactly zero, which removes those features from the model. Lasso therefore acts as automatic feature selection.
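The exact-zero behavior can be checked by hand in the one-feature case. The sketch below is an illustration under assumptions: the names `lasso_loss_1d` and `lasso_weight_1d` and the toy data are hypothetical, and there is no intercept. Writing a = Σ(x²)/n and b = Σ(x·y)/n, the Lasso loss is minimized at weight = sign(b) · max(|b| − λ/2, 0) / a, so once λ/2 exceeds |b| the best weight is exactly zero.

```python
def lasso_loss_1d(w, xs, ys, lam):
    """Loss = MSE + lam * |weight| for a single feature, no intercept."""
    n = len(xs)
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return mse + lam * abs(w)


def lasso_weight_1d(xs, ys, lam):
    """Minimizer of lasso_loss_1d: sign(b) * max(|b| - lam/2, 0) / a."""
    n = len(xs)
    a = sum(x * x for x in xs) / n
    b = sum(x * y for x, y in zip(xs, ys)) / n
    shrunk = max(abs(b) - lam / 2.0, 0.0)
    return (shrunk if b >= 0 else -shrunk) / a


# Toy data, made up for illustration: the feature is only weakly related to y.
weak_xs = [1.0, 2.0, 3.0, 4.0]
weak_ys = [0.3, -0.2, 0.4, 0.1]

for lam in (0.0, 0.5, 1.0):
    w = lasso_weight_1d(weak_xs, weak_ys, lam)
    print(f"lambda={lam}: weight={w:.4f}")
```

Compare this with Ridge, where the same weak feature would keep a small but nonzero weight for any finite λ.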

Implementing in Gradient Descent

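Add the derivative of the penalty term to the MSE gradient before each step. Here gradient is the derivative of the MSE with respect to the weight, lambda is the λ from the loss formulas, and sign(weight) is +1, -1, or 0.

Instead of: weight -= lr * gradient

Ridge: weight -= lr * (gradient + (2 * lambda * weight))
Lasso: weight -= lr * (gradient + (lambda * sign(weight)))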
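The update rules above can be sketched as a small training loop. This is one possible implementation in plain Python, not the lesson's own `main.py`: the helper names (`sign`, `mse_gradient`, `fit`), the hyperparameter values, and the toy dataset are all assumptions chosen for illustration. The variable is called `lam` because `lambda` is a reserved word in Python.

```python
def sign(v):
    """Return +1, -1, or 0 depending on the sign of v."""
    return (v > 0) - (v < 0)


def mse_gradient(weights, X, y):
    """Gradient of the MSE with respect to each weight (no intercept)."""
    n = len(y)
    grad = [0.0] * len(weights)
    for row, target in zip(X, y):
        err = sum(w * x for w, x in zip(weights, row)) - target
        for j, x in enumerate(row):
            grad[j] += 2.0 * err * x / n
    return grad


def fit(X, y, penalty="none", lam=0.0, lr=0.01, steps=5000):
    """Gradient descent with an optional 'ridge' or 'lasso' penalty."""
    weights = [0.0] * len(X[0])
    for _ in range(steps):
        grad = mse_gradient(weights, X, y)
        for j, w in enumerate(weights):
            if penalty == "ridge":
                extra = 2.0 * lam * w        # derivative of lam * w^2
            elif penalty == "lasso":
                extra = lam * sign(w)        # subgradient of lam * |w|
            else:
                extra = 0.0
            weights[j] = w - lr * (grad[j] + extra)
    return weights


# Toy data, made up for illustration: the first feature drives y
# (roughly y = 2 * x1), the second feature is noise.
X = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.8], [4.0, -0.6], [5.0, 0.1]]
y = [2.0, 4.1, 5.9, 8.2, 9.9]

plain_w = fit(X, y)
ridge_w = fit(X, y, penalty="ridge", lam=1.0)
lasso_w = fit(X, y, penalty="lasso", lam=1.0)
print("plain:", plain_w)
print("ridge:", ridge_w)
print("lasso:", lasso_w)
```

One caveat about this sketch: with the plain `sign(weight)` update, the Lasso weight for the noise feature tends to hover very close to zero, flipping sign from step to step, rather than landing on zero exactly. Solvers that produce exact zeros typically use a different update, such as coordinate descent or a proximal (soft-thresholding) step.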
