Beyond Bagging & Boosting
Advanced Ensemble Methods
Voting Classifier
Combine multiple different algorithms:
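from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Three different algorithms, each paired with a short name
models = [
    ('lr', LogisticRegression()),
    ('rf', RandomForestClassifier()),
    ('xgb', XGBClassifier()),
]
# voting='hard' takes the majority class; voting='soft' averages probabilities
ensemble = VotingClassifier(estimators=models, voting='hard')
# X_train, y_train: your training features and labels
ensemble.fit(X_train, y_train)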
Hard Voting: Majority vote (predict 1 if at least 2 of 3 models predict 1)
Soft Voting: Average the predicted probabilities (if models output 0.8, 0.6, 0.9 → average ≈ 0.77), then predict the class with the highest average
Pros: Simple; different models capture different patterns
Cons: Requires training several models
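The two voting rules can be sketched in a few lines of NumPy. This is only an illustration of the arithmetic, not how VotingClassifier is implemented; the helper names `hard_vote` and `soft_vote` and the 0.5 threshold are assumptions made for this example.

```python
import numpy as np

def hard_vote(probs, threshold=0.5):
    """Majority vote over per-model 0/1 predictions (assumed 0.5 cutoff)."""
    votes = (np.asarray(probs) >= threshold).astype(int)
    return int(votes.sum() > len(votes) / 2)

def soft_vote(probs, threshold=0.5):
    """Average the per-model probabilities, then threshold the mean."""
    return int(np.mean(probs) >= threshold)

# The numbers from the text: all three models lean towards class 1.
agree = [0.8, 0.6, 0.9]
print(hard_vote(agree), soft_vote(agree), round(float(np.mean(agree)), 2))

# A made-up case where the rules disagree: one confident model,
# two models just under the threshold.
split = [0.9, 0.45, 0.4]
print(hard_vote(split), soft_vote(split))
```

In the second case two of three models vote 0, so hard voting predicts 0, while the single confident model pulls the average above 0.5 and soft voting predicts 1. Soft voting uses how confident each model is, which is why it needs every model to output probabilities.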
Stacking (Meta-Learning)
- Level 0: Train multiple diverse models on training data
- Level 1: Use out-of-fold predictions from Level 0 as input features for a meta-learner
- Final: Meta-learner makes the final prediction
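from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Level 0: diverse base learners
base_learners = [
    ('lr', LogisticRegression()),
    ('rf', RandomForestClassifier()),
    ('svm', SVC(probability=True)),  # probability=True enables predict_proba
]
# Level 1: meta-learner trained on the base learners' cross-validated predictions
meta_learner = LogisticRegression()
stacking = StackingClassifier(
    estimators=base_learners,
    final_estimator=meta_learner,
    cv=5,  # 5-fold CV produces the predictions fed to the meta-learner
)
stacking.fit(X_train, y_train)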
Why Stacking Often Wins:
- Level 0 models learn raw patterns
- Meta-learner learns which models to trust
- Example: RF works well on some parts of the data, LR on others → the meta-learner learns how to weight them
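To make the two levels concrete, here is a minimal hand-rolled sketch of the idea that StackingClassifier automates. It assumes a synthetic dataset from `make_classification` and just two base learners; the variable names (`meta_train`, `meta_test`) are illustrative, not part of any library API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

base_learners = {
    "lr": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Level 0: out-of-fold probabilities, so every training row is scored
# by a model that did not see that row during fitting.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_learners.values()
])

# Refit each base learner on the full training set to score new samples.
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_learners.values()
])

# Level 1: the meta-learner sees only the base learners' predictions.
meta_learner = LogisticRegression()
meta_learner.fit(meta_train, y_train)
stacked_accuracy = meta_learner.score(meta_test, y_test)

# One coefficient per base learner: a rough view of "which models to trust".
print(dict(zip(base_learners, meta_learner.coef_[0].round(2))))
print(f"stacked accuracy: {stacked_accuracy:.3f}")
```

The out-of-fold step matters: if the meta-learner were trained on predictions the base learners made for rows they had already seen, it would learn to trust whichever model overfits most.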
Blending
Like stacking, but simpler:
- Split training data: 60% train, 40% validation
- Train diverse models on 60%
- Get predictions on 40% validation set
- Use the validation predictions as meta-features
- Train the meta-learner on those meta-features, with the validation labels as targets
Pros: Faster (no cross-validation needed to build the meta-features)
Cons: Base learners train on less data, and the meta-learner sees only the 40% holdout
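The blending steps above can be sketched as follows. This is one possible arrangement, assuming synthetic data, two base learners, and a hypothetical helper `meta_features`; scikit-learn has no dedicated blending class, so the holdout split is done by hand with `train_test_split`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Split the training data: 60% for base learners, 40% held out for the meta-learner.
X_base, X_val, y_base, y_val = train_test_split(
    X_train, y_train, test_size=0.4, random_state=0
)

# Train diverse models on the 60% portion only.
base_learners = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
for model in base_learners:
    model.fit(X_base, y_base)

def meta_features(models, X):
    """Stack each model's P(class = 1) into one column per model."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Train the meta-learner on predictions for the 40% holdout.
meta_learner = LogisticRegression()
meta_learner.fit(meta_features(base_learners, X_val), y_val)

blend_accuracy = meta_learner.score(meta_features(base_learners, X_test), y_test)
print(f"blend accuracy: {blend_accuracy:.3f}")
```

Each base learner is fitted exactly once, which is where the speed advantage over 5-fold stacking comes from. The cost is visible in the shapes: the base learners see 240 of the 400 training rows and the meta-learner only 160.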
When to Use Each:
| Method | Speed | Performance | Use Case |
|---|---|---|---|
| Voting | Fast | Good | Quick ensemble, different models available |
| Stacking | Slow | Very Good | Production, high accuracy needed |
| Blending | Medium | Good | Competition, limited time |