Correlation ≠ Causation

Causal Inference & A/B Testing

The Fundamental Problem

Correlation: A and B move together
Causation: A causes B

Classic Examples of Spurious Correlation:

Ice cream sales ↔ Drowning deaths (both increase in summer)
Shoe size ↔ Reading ability (both increase with age)
Nicolas Cage movies ↔ Swimming pool drownings per year (a real correlation in the data, but pure coincidence)

Observational Data vs Experiments

Observational Data (Correlational)

"Users who clicked the blue button → Higher conversion"
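Question: Does the blue button cause higher conversion, or do already-engaged users simply click blue buttons more often?
Answer: We can't tell from this data alone, because confounding variables (such as user engagement) may drive both.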

Randomized Controlled Trial (RCT / A/B Test)

1. Randomly assign users to 2 groups
2. Show BLUE button to Group A
3. Show RED button to Group B
4. Measure conversions
5. If Group A converts at a statistically significantly higher rate than Group B → BLUE causes higher conversion
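Step 5 depends on checking that the gap between groups is larger than chance alone would produce. One common way to do that is a two-proportion z-test; the sketch below uses only the standard library, and the conversion counts are hypothetical numbers chosen for illustration, not results from a real test.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 260 of 10,000 converted with BLUE, 200 of 10,000 with RED
z, p_value = two_proportion_z_test(conv_a=260, n_a=10_000, conv_b=200, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("Significant at alpha = 0.05" if p_value < 0.05 else "Not significant")
```

The normal approximation used here is reasonable when each group has many conversions; for very small counts an exact test would be a safer choice.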

Why randomization works:

Balances confounding variables across groups on average (age, income, etc., whether measured or not)
The only systematic difference between the groups is button color
Any difference beyond chance variation can be attributed to button color
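A small simulation can make this concrete. In the sketch below, button color has no true effect at all; a hidden "engaged" trait drives conversion. All rates and probabilities are made-up assumptions for illustration. In the observational setup, engaged users tend to end up on blue, so blue looks better; under random assignment the gap disappears.

```python
import random

random.seed(0)

def simulate(n, randomized):
    """Return the conversion rate per button color for n simulated users."""
    shown = {"blue": 0, "red": 0}
    converted = {"blue": 0, "red": 0}
    for _ in range(n):
        engaged = random.random() < 0.5  # hidden confounder
        if randomized:
            # A/B test: a coin flip decides the color, independent of engagement
            color = random.choice(["blue", "red"])
        else:
            # Observational: engaged users are more likely to end up on blue
            p_blue = 0.8 if engaged else 0.2
            color = "blue" if random.random() < p_blue else "red"
        # Button color has NO true effect; only engagement drives conversion
        p_convert = 0.10 if engaged else 0.02
        shown[color] += 1
        if random.random() < p_convert:
            converted[color] += 1
    return {c: converted[c] / shown[c] for c in shown}

obs = simulate(100_000, randomized=False)
rct = simulate(100_000, randomized=True)
print(f"Observational: blue {obs['blue']:.3f} vs red {obs['red']:.3f}")
print(f"Randomized:    blue {rct['blue']:.3f} vs red {rct['red']:.3f}")
```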

Confounding Variables

A confounder is a variable that affects both your treatment and your outcome.

Example: Shoe Size → Reading Ability

Reality:
  Age → Shoe Size (bigger shoes for older kids)
  Age → Reading Ability (older kids read better)
  
Apparent: Shoe Size → Reading Ability (correlation)
Actual: Age confounds the relationship

Fix: Randomization or Control

Randomize shoe sizes (kids wear random-sized shoes) → Breaks the link between age and shoe size, so age can no longer confound
Or: Measure age and statistically control for it (e.g., compare only kids of the same age)
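The "statistically control" option can be sketched with simulated data. Here age drives both shoe size and reading score, and shoe size has no direct effect on reading. The coefficients and noise levels are arbitrary assumptions. Controlling for age is done by removing the linear age trend from both variables and correlating what is left (a partial correlation).

```python
import random
from statistics import mean

random.seed(1)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def residuals(y, x):
    """What remains of y after subtracting a least-squares line fit on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Simulated kids aged 5 to 12 (all numbers are illustrative assumptions)
ages = [random.uniform(5, 12) for _ in range(5_000)]
shoe = [0.8 * a + random.gauss(0, 1) for a in ages]      # Age -> Shoe Size
reading = [10 * a + random.gauss(0, 10) for a in ages]   # Age -> Reading Ability

raw_r = pearson(shoe, reading)
adjusted_r = pearson(residuals(shoe, ages), residuals(reading, ages))
print(f"Shoe size vs reading, raw:             r = {raw_r:.2f}")
print(f"Shoe size vs reading, age controlled:  r = {adjusted_r:.2f}")
```

Unlike randomization, this only removes confounders you have actually measured and modeled correctly.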

A/B Test Design

Power Analysis (How many users needed?)

With small sample sizes, you might miss real effects (Type II error).

Conversion rates:
  Control: 2%
  Treatment: 2.5%
  Difference: 0.5 percentage points
  
Sample size needed: ~13,800 users per group (two-sided α = 0.05, 80% power)
(Smaller effects or higher power = larger samples)
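The required sample size depends on the significance level and power you choose. As a rough sketch, the standard normal-approximation formula for comparing two proportions can be computed as follows; the defaults of two-sided α = 0.05 and 80% power are conventional assumptions rather than values stated in the example above.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users per group needed to detect p_control vs p_treatment."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

n = sample_size_per_group(0.02, 0.025)
print(f"Users needed per group: {n:,}")
```

With these inputs the formula gives roughly 13,800 users per group. Halving the detectable difference roughly quadruples the requirement, because the effect size is squared in the denominator.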

Multiple Comparisons Problem

If you run 20 A/B tests at a 5% false positive rate, you should expect about 1 to come out "significant" by chance alone, even if none of the changes has a real effect!

Fix: Use the Bonferroni correction (shown here) or control the False Discovery Rate (FDR)

α_corrected = α / number_of_tests
# If 20 tests: use α = 0.05/20 = 0.0025 instead of 0.05
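Both corrections can be sketched in a few lines. The FDR version below uses the Benjamini-Hochberg procedure, which is one common way to control FDR (the lesson does not say which method it has in mind). The 20 p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha / number_of_tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest rank whose p-value is at or below (rank / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for i in order[:max_k]:
        rejected[i] = True
    return rejected

# Hypothetical p-values from 20 A/B tests
p_values = [0.001, 0.002, 0.006, 0.009, 0.04] + [0.2 + 0.04 * i for i in range(15)]

print("Uncorrected (p < 0.05):", sum(p < 0.05 for p in p_values))
print("Bonferroni:            ", sum(bonferroni(p_values)))
print("Benjamini-Hochberg:    ", sum(benjamini_hochberg(p_values)))
```

Bonferroni is the stricter of the two: it guards against even a single false positive, while Benjamini-Hochberg tolerates a small expected share of false positives among the results it flags, so it typically keeps more findings.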

Duration & Seasonality

A test must run long enough to cover regular usage cycles.

Website usage differs weekday vs weekend
Minimum duration: 1 full week (to capture both weekdays and weekends)
Better: 2-4 weeks
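One way to combine the sample size requirement with the full-week rule is to compute how many days the traffic needs and then round up to whole weeks. This is a minimal sketch; the function name and the traffic figures are hypothetical.

```python
from math import ceil

def duration_in_days(users_per_group, daily_users, groups=2, min_weeks=1):
    """Days to run a test, rounded up to whole weeks so that every
    weekday and weekend day is covered equally often."""
    days_for_sample = ceil(users_per_group * groups / daily_users)
    weeks = max(min_weeks, ceil(days_for_sample / 7))
    return weeks * 7

# Hypothetical: 10,000 users needed per group, 1,500 eligible visitors per day
days = duration_in_days(users_per_group=10_000, daily_users=1_500)
print(f"Run the test for {days} days")
```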

Statistical Significance ≠ Practical Significance

A test might show that a button color change improves conversion by 0.01%.

Statistically significant? With enough users, yes.
Practically significant? Probably not: the gain is unlikely to be worth the engineering time.

Practical significance = business impact
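One way to keep the two questions separate is to compute a confidence interval for the lift and compare it against a minimum lift that would justify the work. The sketch below reads the 0.01% improvement as 0.01 percentage points; the user counts and the business threshold are hypothetical assumptions.

```python
from math import sqrt
from statistics import NormalDist

def lift_confidence_interval(conv_c, n_c, conv_t, n_t, level=0.95):
    """Confidence interval for (treatment rate - control rate)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_t - p_c
    return diff - z * se, diff + z * se

# Hypothetical: 50 million users per group, 2.00% vs 2.01% conversion
low, high = lift_confidence_interval(1_000_000, 50_000_000,
                                     1_005_000, 50_000_000)

# Hypothetical business rule: a lift below 0.1 percentage points is not worth shipping
min_worthwhile_lift = 0.001

statistically_significant = low > 0
practically_significant = low >= min_worthwhile_lift

print(f"95% CI for lift: [{low:.6f}, {high:.6f}]")
print("Statistically significant:", statistically_significant)
print("Practically significant:  ", practically_significant)
```

Setting the minimum worthwhile lift before the test starts keeps the decision tied to business impact rather than to whatever p-value comes out.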
