Feature Engineering: Create Better Features
The Art & Science of Features
Feature Engineering
The Golden Rule of ML
"Garbage in, garbage out." A mediocre algorithm on great features usually beats a brilliant algorithm on bad features.
Types of Feature Engineering
1. Categorical Encoding
One-Hot Encoding
Convert categories to binary columns:
Color: Red → Red=1, Green=0, Blue=0
Color: Green → Red=0, Green=1, Blue=0
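The mapping above can be sketched in plain Python. This is a minimal illustration, not a library API: the `one_hot` helper and the sample `colors` list are made up for this example.

```python
def one_hot(values, categories=None):
    """Return one dict of 0/1 indicator columns per input value."""
    # Fix the column set up front so every row has the same columns.
    if categories is None:
        categories = sorted(set(values))
    return [{cat: int(value == cat) for cat in categories} for value in values]


colors = ["Red", "Green", "Blue", "Green"]  # illustrative data
encoded = one_hot(colors, categories=["Red", "Green", "Blue"])

for color, row in zip(colors, encoded):
    print(color, row)
```

Each row has exactly one column set to 1. In practice a library routine such as pandas `get_dummies` does the same job on a whole column.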
Label Encoding
Assign integers:
Red=0, Green=1, Blue=2
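⚠️ Use only when the categories have a natural order (e.g., Education: Primary=0, Secondary=1, Tertiary=2). Colors have no such order, so a model may treat Red < Green < Blue as meaningful.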
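For the ordinal case, one possible sketch is an explicit mapping built from the ordered list of levels. The names `EDUCATION_ORDER` and `encode_ordinal` are hypothetical, chosen for this example.

```python
# Levels listed from lowest to highest; the position becomes the integer code.
EDUCATION_ORDER = ["Primary", "Secondary", "Tertiary"]
education_to_int = {level: rank for rank, level in enumerate(EDUCATION_ORDER)}


def encode_ordinal(values, mapping):
    """Map each value to its integer rank; unseen values raise KeyError."""
    return [mapping[value] for value in values]


levels = ["Secondary", "Primary", "Tertiary"]  # illustrative data
encoded_levels = encode_ordinal(levels, education_to_int)
print(encoded_levels)  # [1, 0, 2]
```

Writing the order out by hand keeps the integers aligned with the real ranking instead of alphabetical order.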
Target Encoding
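Replace each category with the mean of the target for that category:
If "Red" items have an average purchase of $100, replace Red→100
Compute the means on the training data only; otherwise the target leaks into the feature.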
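A minimal sketch of target encoding, assuming the means are learned on training rows and then applied to new rows. The helper names, the sample purchases, and the choice to fall back to the global mean for unseen categories are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_target_means(categories, targets):
    """Learn the mean target value for each category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def apply_target_encoding(categories, means, default):
    """Replace each category with its learned mean, or `default` if unseen."""
    return [means.get(category, default) for category in categories]


# Illustrative training data: color and purchase amount in dollars.
train_colors = ["Red", "Red", "Green", "Blue", "Green"]
train_purchase = [90.0, 110.0, 40.0, 70.0, 60.0]

means = fit_target_means(train_colors, train_purchase)
global_mean = sum(train_purchase) / len(train_purchase)

# "Yellow" never appeared in training, so it receives the global mean.
encoded_new = apply_target_encoding(["Red", "Blue", "Yellow"], means, global_mean)
print(encoded_new)  # [100.0, 70.0, 74.0]
```

Rare categories get noisy means, so real pipelines often add smoothing or compute the means out-of-fold; this sketch omits both.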
2. Numerical Transformations
Binning (Discretization)
Convert continuous to categorical:
Age 25 → "Young" (0-30)
Age 45 → "Middle" (31-60)
Age 75 → "Senior" (61+)
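The age bins can be sketched as a small function. This assumes an age of exactly 60 belongs to "Middle" and that negative ages are invalid; `age_group` is a hypothetical helper name.

```python
def age_group(age):
    """Map a numeric age to a coarse label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 30:
        return "Young"
    if age <= 60:
        return "Middle"
    return "Senior"


for age in (25, 45, 75):
    print(age, age_group(age))
```

For a whole column, a library routine such as pandas `cut` applies the same idea with explicit bin edges.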
Polynomial Features
Create powers and interaction terms (degree 2 shown):
x1, x2 → x1, x2, x1², x2², x1*x2
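The expansion above, written out for two inputs. `degree2_features` is an illustrative helper that returns the columns in the same order as the line above, without a bias term.

```python
def degree2_features(x1, x2):
    """Return [x1, x2, x1^2, x2^2, x1*x2] for one row."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


row = degree2_features(2.0, 3.0)
print(row)  # [2.0, 3.0, 4.0, 9.0, 6.0]
```

The number of generated columns grows quickly with the number of inputs and the degree, so this is usually applied to a handful of features rather than all of them.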
Log Transform
For right-skewed, positive data:
income_log = log(income) # Reduces skewness; requires income > 0
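A runnable sketch of the log transform. Note one substitution: it uses `math.log1p`, which computes log(1 + x), instead of a plain log, so that an income of zero does not raise an error. The income values are made up.

```python
import math

# Illustrative incomes, including a zero and one very large value.
incomes = [0.0, 20_000.0, 50_000.0, 1_000_000.0]

# log1p(x) = log(1 + x): defined at x = 0 and close to log(x) for large x.
income_log = [math.log1p(x) for x in incomes]

for raw, logged in zip(incomes, income_log):
    print(f"{raw:>12,.0f} -> {logged:.3f}")

# math.expm1 inverts log1p when the original scale is needed again.
restored = [math.expm1(v) for v in income_log]
```

The largest income is 50 times the second smallest on the raw scale, but the two are much closer after the transform, which is what pulls in the long right tail.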
3. Domain-Specific Features
Extract knowledge from raw data:
- From timestamp: Extract day-of-week, hour, is_weekend
- From text: Word count, sentiment score
- From geographic: Distance to city center
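The three bullets can be sketched with the standard library. All function names here are hypothetical, the distance uses the haversine formula with an assumed mean Earth radius of 6371 km, and sentiment scoring is left out because it needs a trained model or lexicon.

```python
import math
from datetime import datetime


def timestamp_features(ts):
    """Extract day-of-week, hour, and a weekend flag from a datetime."""
    day_of_week = ts.weekday()  # Monday=0 ... Sunday=6
    return {
        "day_of_week": day_of_week,
        "hour": ts.hour,
        "is_weekend": day_of_week >= 5,
    }


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


features = timestamp_features(datetime(2024, 3, 9, 14, 30))  # a Saturday
print(features)
print(word_count("great product fast shipping"))  # 4
```

For the geographic bullet, `haversine_km` would be called with each record's coordinates and the coordinates of the chosen city center.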