Model Calibration & Probability Estimates
Why Calibration Matters
Model Calibration
The Problem: Overconfident Predictions
A logistic regression model predicts:
- "This email is 95% likely spam"
- But are 95% of the emails it scores this way actually spam? Not necessarily: maybe only 80% are.
This model is miscalibrated (overconfident).
Well-Calibrated vs Miscalibrated
Well-Calibrated Model
- Predicts 0.7 probability → 70% of those samples are actually positive
- Predicts 0.5 probability → 50% of those samples are actually positive
- Reliability diagram: Points lie on diagonal
Miscalibrated Model (Overconfident)
- Predicts 0.9 probability → Only 70% are actually positive
- Predictions too extreme (too close to 0 or 1)
Reliability Diagram
Plot predicted probability (x-axis) against actual positive frequency (y-axis):
- Bin predictions into 10 buckets (0-10%, 10-20%, ... 90-100%)
- For each bucket, calculate actual positive rate
- Plot predicted vs actual
- If on diagonal (y=x) → Well-calibrated
- If above diagonal → Underconfident
- If below diagonal → Overconfident
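The binning steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `reliability_bins` and the toy predictions are assumptions made up for the example.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Return (mean predicted prob, actual positive rate, count) per non-empty bin."""
    sums = [0.0] * n_bins      # sum of predicted probabilities per bin
    positives = [0] * n_bins   # number of actual positives per bin
    counts = [0] * n_bins      # number of samples per bin
    for p, y in zip(probs, labels):
        # p == 1.0 would index past the end, so clamp it into the last bin
        b = min(int(p * n_bins), n_bins - 1)
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Toy data (made up): ten predictions in the 90-100% bucket, only 7 are positive
probs = [0.91, 0.93, 0.90, 0.95, 0.92, 0.94, 0.90, 0.96, 0.93, 0.91]
labels = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

for mean_pred, actual_rate, n in reliability_bins(probs, labels):
    if actual_rate < mean_pred:
        side = "below diagonal (overconfident)"
    elif actual_rate > mean_pred:
        side = "above diagonal (underconfident)"
    else:
        side = "on diagonal"
    print(f"predicted={mean_pred:.3f} actual={actual_rate:.3f} n={n} -> {side}")
```

Each returned tuple is one point of the reliability diagram: the first value goes on the x-axis, the second on the y-axis.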
Calibration Methods
1. Platt Scaling (Simple)
Fit a logistic regression (a sigmoid) that maps the model's raw outputs to probabilities, using data the model was not trained on:
from sklearn.calibration import CalibratedClassifierCV
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(X_train, y_train)
proba_calibrated = calibrated.predict_proba(X_test)
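To show what the sigmoid method does under the hood, here is a rough sketch that fits a one-dimensional logistic regression on held-out scores by hand. The synthetic dataset, the choice of `LinearSVC` as the base model, and the helper name `predict_proba_platt` are assumptions for illustration. Note that scikit-learn's `LogisticRegression` applies L2 regularization by default, so this only approximates classic Platt scaling.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data, split into a training half and a calibration half
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Base model: produces uncalibrated margins, not probabilities
svm = LinearSVC(dual=False).fit(X_train, y_train)

# Platt-style step: 1-D logistic regression on the held-out scores
scores_cal = svm.decision_function(X_cal).reshape(-1, 1)
platt = LogisticRegression().fit(scores_cal, y_cal)


def predict_proba_platt(X_new):
    """Map raw SVM margins to probabilities of the positive class."""
    scores = svm.decision_function(X_new).reshape(-1, 1)
    return platt.predict_proba(scores)[:, 1]


print(predict_proba_platt(X_cal[:5]))
```

The calibrator has only two parameters (a slope and an intercept), which is why this method works with small calibration sets but can only correct sigmoid-shaped distortions.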
2. Isotonic Regression (Flexible)
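Fit a non-decreasing step function that maps predicted probabilities to calibrated probabilities. More flexible than Platt scaling, but more likely to overfit when the calibration set is small.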
calibrated = CalibratedClassifierCV(model, method='isotonic', cv=5)
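The monotone mapping behind isotonic calibration can be sketched with the pool-adjacent-violators idea: sort held-out samples by predicted score, then merge neighboring groups until the observed positive rate never decreases. The function name `pav` and the toy scores and labels are assumptions for illustration; in practice scikit-learn's `IsotonicRegression` would be the usual choice.

```python
def pav(values):
    """Least-squares non-decreasing fit to a sequence (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


# Toy held-out set (made up), already sorted by predicted score
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

calibrated = pav(labels)
for s, c in zip(scores, calibrated):
    print(f"raw score {s:.1f} -> calibrated {c:.2f}")
```

The result is a staircase: every raw score inside one merged group gets that group's observed positive rate.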
3. Temperature Scaling (Neural Networks)
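Divide the logits by a single learned scalar, the temperature, before the softmax. The temperature is fit on a validation set, and because it rescales all logits equally it does not change which class is predicted.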
proba_scaled = softmax(logits / temperature)
# temperature < 1: More confident
# temperature > 1: Less confident
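As a rough sketch of how the temperature could be learned, the snippet below picks the value that minimizes negative log-likelihood on validation logits. The grid search, the helper names (`nll`, `fit_temperature`), and the validation logits are assumptions for illustration; a real setup would more likely use a gradient-based optimizer on the logits of an actual network.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [
        -math.log(softmax(row, temperature)[y])
        for row, y in zip(logit_rows, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates=None):
    """Grid search over temperatures 0.1, 0.2, ..., 10.0 (a simplification)."""
    if candidates is None:
        candidates = [0.1 * k for k in range(1, 101)]
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Hypothetical validation logits: all very confident, but the last one is wrong
val_logits = [[4.0, 0.0], [5.0, 0.0], [0.0, 4.0], [4.5, 0.0]]
val_labels = [0, 0, 1, 1]

temperature = fit_temperature(val_logits, val_labels)
print(f"fitted temperature: {temperature:.1f}")
print("before:", softmax(val_logits[0]))
print("after: ", softmax(val_logits[0], temperature))
```

Because the toy model is confidently wrong on one of four samples, the fitted temperature comes out above 1 and the scaled probabilities are less extreme.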
Why Some Models are Miscalibrated
| Model | Calibration |
|---|---|
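| Logistic Regression | Usually good (it optimizes log loss directly) |
| Neural Networks | Often poor (overconfident) |
| Tree Models (RF, XGBoost) | Often poor (random forests rarely output values near 0 or 1; boosted trees depend on loss and regularization) |
| SVM | Poor (outputs margins, not probabilities, so it needs calibration) |
| Naive Bayes | Often poor (the independence assumption pushes probabilities toward 0 or 1) |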
Why NNs are Overconfident:
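Deep networks are trained to minimize cross-entropy on the training set, not to be calibrated. Once a training example is classified correctly, the loss can still be lowered by pushing its predicted probability closer to 0 or 1, so a high-capacity network keeps growing more confident (0.01, 0.99) even when its accuracy on unseen data no longer improves.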
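A small numeric illustration of that incentive, assuming a binary classifier whose logit margin on a correctly classified example keeps growing during training (the margins below are made up):

```python
import math


def cross_entropy(margin):
    """Binary cross-entropy for a correctly classified example with this logit margin."""
    return math.log(1.0 + math.exp(-margin))


def confidence(margin):
    """Predicted probability of the correct class (sigmoid of the margin)."""
    return 1.0 / (1.0 + math.exp(-margin))


margins = [1.0, 2.0, 4.0, 8.0]
for m in margins:
    print(f"margin={m:4.1f}  confidence={confidence(m):.4f}  loss={cross_entropy(m):.4f}")
```

The prediction is correct at every margin, so accuracy never changes, yet the loss keeps shrinking as confidence approaches 1. Nothing in the training objective stops the model from drifting toward extreme probabilities.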