Feature Engineering: Create Better Features
Feature Selection: Keep Only the Good Features
Too many features cause:
- Overfitting (model memorizes noise)
- Slow training
- Curse of dimensionality (data becomes sparse as the number of dimensions grows)
Methods
1. Univariate Selection (Filter)
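Score each feature against the target independently and keep the top k. The example uses the chi-squared test, which requires non-negative feature values (counts, frequencies, one-hot flags):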
from sklearn.feature_selection import SelectKBest, chi2
selector = SelectKBest(chi2, k=10) # Keep top 10 features
X_selected = selector.fit_transform(X, y)
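To see which columns a univariate filter actually keeps, here is a minimal sketch. The choice of scikit-learn's bundled breast cancer dataset is an assumption made for illustration (all of its features are non-negative, so chi2 applies); it is not part of the original lesson.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

# Illustrative dataset (an assumption): 569 samples, 30 non-negative features
data = load_breast_cancer()
X, y = data.data, data.target

selector = SelectKBest(chi2, k=10)          # keep the 10 highest-scoring features
X_selected = selector.fit_transform(X, y)

# get_support() returns a boolean mask marking the kept columns
mask = selector.get_support()
kept_names = [name for name, keep in zip(data.feature_names, mask) if keep]

print(X.shape, "->", X_selected.shape)
print(kept_names)
```

If some features can be negative, other scoring functions from the same module, such as f_classif or mutual_info_classif, can be passed to SelectKBest instead of chi2.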
2. Tree-Based Feature Importance
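Tree ensembles report how much each feature contributed to their splits:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, y)
importances = model.feature_importances_  # one score per feature, summing to 1
# Drop features with importance < 0.01 (assumes X is a NumPy array)
X_selected = X[:, importances >= 0.01]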
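The importance cutoff can also be applied with scikit-learn's SelectFromModel, which wraps the thresholding step. The sketch below is one possible way to do it, not necessarily the approach the lesson intends; the synthetic dataset and the random_state values are assumptions added for reproducibility, and the 0.01 threshold is taken from the comment above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (an assumption): 20 features, only 5 of them informative
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
importances = model.feature_importances_

# Print the five most important features, highest first
for idx in np.argsort(importances)[::-1][:5]:
    print(f"feature {idx}: {importances[idx]:.3f}")

# Manual thresholding with a boolean mask
X_manual = X[:, importances >= 0.01]

# Same cutoff via SelectFromModel; prefit=True reuses the fitted forest
selector = SelectFromModel(model, threshold=0.01, prefit=True)
X_selected = selector.transform(X)

print(X.shape, "->", X_selected.shape)
```

A fixed cutoff such as 0.01 depends on how many features there are, because the importances sum to 1. With only a few features, almost everything passes it.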
3. Recursive Feature Elimination (RFE)
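Repeatedly fit a model and remove the weakest feature until the requested number remains:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
rfe = RFE(LogisticRegression(), n_features_to_select=10)
X_selected = rfe.fit_transform(X, y)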
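After fitting, RFE exposes which features survived and the order in which the others were eliminated. The sketch below is self-contained; the synthetic data, the standardization step, and max_iter=1000 are assumptions added so that LogisticRegression converges cleanly.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): 20 features, 5 informative
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# Standardize so coefficient magnitudes are comparable across features
X_scaled = StandardScaler().fit_transform(X)

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_selected = rfe.fit_transform(X_scaled, y)

# support_ is a boolean mask of kept features;
# ranking_ is 1 for kept features, larger for features dropped earlier
print("kept columns:", [i for i, keep in enumerate(rfe.support_) if keep])
print("ranking:", rfe.ranking_.tolist())
```

RFE ranks features by the estimator's coef_ or feature_importances_, so the result depends on which estimator is passed in.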
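Best Practice:
- Start with all features
- Train and record baseline performance on a validation set (or with cross-validation)
- Drop the weakest feature
- Retrain and compare against the baseline
- Repeat until validation performance drops, then keep the last feature set that held up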
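The steps above can be sketched as a small backward-elimination loop. This is one possible reading of the procedure, not a definitive implementation: the synthetic dataset, the cv_score helper, the use of random forest importances to pick the "weakest" feature, and the tolerance value are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption): 12 features, 4 informative
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)

def cv_score(cols):
    """Hypothetical helper: mean 5-fold CV accuracy using only the given columns."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=5).mean()

kept = list(range(X.shape[1]))   # start with all features
best = cv_score(kept)            # baseline performance
tolerance = 0.005                # hypothetical: ignore drops smaller than this

while len(kept) > 1:
    # Find the weakest remaining feature by importance
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[:, kept], y)
    weakest = int(np.argmin(model.feature_importances_))

    # Retrain without it and compare
    candidate = kept[:weakest] + kept[weakest + 1:]
    score = cv_score(candidate)
    if score < best - tolerance:
        break                    # performance dropped: keep the previous set
    kept, best = candidate, max(best, score)

print("kept columns:", kept)
print("cross-validated accuracy:", round(best, 3))
```

scikit-learn also provides RFECV in sklearn.feature_selection, which combines recursive elimination with cross-validation to choose the number of features automatically.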