
Correlation ≠ Causation

Causal Inference & A/B Testing

The Fundamental Problem

Correlation: A and B move together
Causation: A causes B

Classic Examples of Spurious Correlation:

  • Ice cream sales ↔ Drowning deaths (both increase in summer)
  • Shoe size ↔ Reading ability (both increase with age)
  • Nicolas Cage movies ↔ Swimming pool drownings per year (correlated in real data, but purely by coincidence)

Observational Data vs Experiments

Observational Data (Correlational)

"Users who clicked the blue button → Higher conversion"
Question: Does blue cause higher conversion, or do engaged users click blue buttons?
Answer: We can't tell from observational data alone, because a confounding variable such as engagement may drive both!

Randomized Controlled Trial (RCT / A/B Test)

1. Randomly assign users to 2 groups
2. Show BLUE button to Group A
3. Show RED button to Group B
4. Measure conversions
5. If Group A converts better than Group B by a statistically significant margin → BLUE causes higher conversion

Why randomization works:

  • Balances confounding variables (age, income, etc. are distributed equally across groups on average)
  • The only systematic difference is button color
  • Any difference beyond random chance is caused by button color
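
To make the contrast concrete, here is a minimal simulation sketch of the button example. Everything in it is an assumption for illustration: the conversion rates, the 30% share of engaged users, the self-selection probabilities, and the names `converts` and `estimate_lift` are hypothetical and not taken from real data. It compares the lift you would estimate when engaged users self-select the blue button against the lift estimated under random assignment.

```python
import random

TRUE_EFFECT = 0.01  # assumed true lift from the blue button (hypothetical)


def converts(engaged: bool, blue: bool, rng: random.Random) -> bool:
    """Conversion depends on engagement (the confounder) and on button color."""
    p = 0.02 + (0.05 if engaged else 0.0) + (TRUE_EFFECT if blue else 0.0)
    return rng.random() < p


def estimate_lift(n_users: int, randomized: bool, seed: int = 0) -> float:
    """Return the blue conversion rate minus the red conversion rate."""
    rng = random.Random(seed)
    shown = {True: 0, False: 0}      # users per arm, keyed by "saw blue"
    converted = {True: 0, False: 0}  # conversions per arm
    for _ in range(n_users):
        engaged = rng.random() < 0.3
        if randomized:
            # Coin flip: assignment ignores engagement entirely.
            blue = rng.random() < 0.5
        else:
            # Self-selection: engaged users mostly end up on blue.
            blue = rng.random() < (0.8 if engaged else 0.2)
        shown[blue] += 1
        converted[blue] += converts(engaged, blue, rng)
    return converted[True] / shown[True] - converted[False] / shown[False]


observational_lift = estimate_lift(200_000, randomized=False)
randomized_lift = estimate_lift(200_000, randomized=True)
print(f"observational estimate: {observational_lift:.4f}")
print(f"randomized estimate:    {randomized_lift:.4f} (true effect: {TRUE_EFFECT})")
```

Under these assumptions the observational estimate should come out several times larger than the true effect, because it mixes the button's effect with the effect of engagement, while the randomized estimate should land close to the true effect.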

Confounding Variables

A variable that affects both your treatment and outcome.

Example: Shoe Size → Reading Ability

Reality:
  Age → Shoe Size (bigger shoes for older kids)
  Age → Reading Ability (older kids read better)
  
Apparent: Shoe Size → Reading Ability (correlation)
Actual: Age confounds the relationship

Fix: Randomization or Control

  • Randomize shoe sizes (kids wear random-sized shoes) → Breaks the link between age and shoe size
  • Or: Measure age and statistically control for it
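
The "measure age and control for it" option can be sketched with simulated data. The numbers below (age range, slopes, noise levels) are made up for illustration, and `pearson` and `residuals` are hypothetical helper names. The sketch removes the part of shoe size and reading score that a straight-line fit on age explains, then correlates what is left over.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(42)
# Simulated children: age drives both shoe size and reading score.
ages = [rng.uniform(5, 12) for _ in range(5000)]
shoe_sizes = [20 + 1.5 * a + rng.gauss(0, 1.5) for a in ages]
reading_scores = [10 * a + rng.gauss(0, 10) for a in ages]

raw_corr = pearson(shoe_sizes, reading_scores)
adjusted_corr = pearson(residuals(shoe_sizes, ages),
                        residuals(reading_scores, ages))
print(f"raw correlation:          {raw_corr:.2f}")
print(f"correlation given age:    {adjusted_corr:.2f}")
```

In this simulation the raw correlation is strong, while the age-adjusted correlation should be close to zero, which is what you expect when age is the only link between the two variables.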

A/B Test Design

Power Analysis (How many users needed?)

With small sample sizes, you might miss real effects (Type II error).

Conversion rates:
  Control: 2%
  Treatment: 2.5%
  Difference: 0.5 percentage points
  
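Sample size needed: ~13,800 users per group (α = 0.05 two-sided, 80% power)
(Smaller detectable differences need larger samples: halving the difference roughly quadruples the sample size)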
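
As a rough check on the sample size, the standard normal-approximation formula for comparing two proportions can be sketched as below. The significance level (α = 0.05, two-sided) and power (80%) are conventional defaults assumed here, not values stated in the example, and `sample_size_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect p_control vs p_treatment."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


n_per_group = sample_size_per_group(0.02, 0.025)
print(f"users needed per group: {n_per_group}")
```

Under these assumptions, detecting 2% vs 2.5% takes roughly 13,800 users per group. Different α or power settings change the answer, so treat the output as an estimate to plan around rather than an exact requirement.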

Multiple Comparisons Problem

If you run 20 A/B tests at a 5% false positive rate and none of the changes has a real effect → expect about 1 to be "significant" by chance (and a ~64% chance of at least one)!

Fix: Use Bonferroni correction (limits the chance of any false positive) or False Discovery Rate (FDR) control, e.g. Benjamini-Hochberg (less conservative)

α_corrected = α / number_of_tests
# If 20 tests: use α = 0.05/20 = 0.0025 instead of 0.05
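
Both corrections are short enough to sketch in plain Python. The p-values below are invented for illustration, and the function names are hypothetical. The FDR method shown is the Benjamini-Hochberg procedure, one common way to control the false discovery rate; it assumes the tests are independent (or positively dependent).

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Chance of at least one false positive across 20 independent null tests.
fwer_20_tests = 1 - (1 - 0.05) ** 20

# Hypothetical p-values from 10 A/B tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(f"P(at least one false positive in 20 tests): {fwer_20_tests:.2f}")
print("uncorrected wins:       ", sum(p <= 0.05 for p in p_values))
print("Bonferroni wins:        ", sum(bonferroni(p_values)))
print("Benjamini-Hochberg wins:", sum(benjamini_hochberg(p_values)))
```

With these made-up p-values, five tests pass the uncorrected 0.05 cutoff, one survives Bonferroni, and two survive Benjamini-Hochberg, which illustrates why FDR control is usually described as the less conservative option.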

Duration & Seasonality

Test must run long enough to capture seasonality.

  • Website usage differs weekday vs weekend
  • Min duration: 1 full week (to capture both weekdays and the weekend)
  • Better: 2-4 weeks
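
One way to turn these rules into a planning number is to combine the required sample size with daily traffic and round up to whole weeks. This is a sketch under stated assumptions: `duration_days` is a hypothetical helper, and the inputs (14,000 users per group, 2,500 eligible users per day) are placeholder values, not figures from a real site.

```python
import math


def duration_days(users_per_group, daily_users, groups=2, min_days=7):
    """Days to run a test, rounded up to whole weeks so that every weekday
    and weekend day is covered equally."""
    days_for_sample = math.ceil(users_per_group * groups / daily_users)
    days = max(days_for_sample, min_days)
    return math.ceil(days / 7) * 7


planned_days = duration_days(users_per_group=14_000, daily_users=2_500)
print(f"run the test for {planned_days} days")
```

Rounding up to full weeks is a design choice: stopping mid-week would weight some days of the week more heavily than others.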

Statistical Significance ≠ Practical Significance

A test might show that a button color change improves conversion by 0.01%.

  • Statistically significant? With enough users, yes.
  • Practically significant? Probably not: the gain is unlikely to be worth the engineering time.

Practical significance = business impact
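
The gap between the two kinds of significance can be shown with a two-proportion z-test on a very large sample. All inputs here are assumptions for illustration: the 0.01% is read as 0.01 percentage points (2.00% vs 2.01%), the group sizes are invented, and `MIN_WORTHWHILE_LIFT` is a hypothetical business threshold that a real team would set for itself.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, two-sided p-value) for conversion rate B minus rate A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value


# Hypothetical threshold: lifts below 0.1 percentage points are not worth shipping.
MIN_WORTHWHILE_LIFT = 0.001

# Hypothetical data: 50 million users per group, 2.00% vs 2.01% conversion.
lift, p_value = two_proportion_z_test(1_000_000, 50_000_000,
                                      1_005_000, 50_000_000)
statistically_significant = p_value < 0.05
practically_significant = lift >= MIN_WORTHWHILE_LIFT
print(f"lift: {lift:.4%}, p-value: {p_value:.5f}")
print(f"statistically significant: {statistically_significant}")
print(f"practically significant:   {practically_significant}")
```

With samples this large the p-value falls well below 0.05, yet the lift stays far under the assumed business threshold, so the test "wins" statistically without justifying the change.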
