Feature Engineering: Create Better Features

The Art & Science of Features

Feature Engineering

The Golden Rule of ML

"Garbage in, garbage out." A mediocre algorithm on great features beats a brilliant algorithm on bad features.

Types of Feature Engineering

1. Categorical Encoding

One-Hot Encoding

Convert categories to binary columns:

Color: Red  →  Red=1, Green=0, Blue=0
Color: Green →  Red=0, Green=1, Blue=0
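
The mapping above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper name `one_hot` and the fixed category list are assumptions made for the example.

```python
def one_hot(values, categories):
    """Return one dict per value, with a 0/1 entry for each category."""
    return [{cat: int(value == cat) for cat in categories} for value in values]

# Category order is assumed to be known up front (hypothetical example data).
categories = ["Red", "Green", "Blue"]
colors = ["Red", "Green"]

encoded = one_hot(colors, categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job on whole columns.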

Label Encoding

Assign integers:

Red=0, Green=1, Blue=2

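⚠️ Only use when the categories have a natural order (e.g., Education: Primary=0, Secondary=1, Tertiary=2). For unordered categories such as colors, the integers imply a ranking that does not exist.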
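
A small sketch of the ordinal case, using the education levels from the warning above. The names `EDUCATION_ORDER` and `encode_ordinal` are hypothetical, chosen only for this example.

```python
# Explicit mapping so the integer order matches the real-world order.
EDUCATION_ORDER = {"Primary": 0, "Secondary": 1, "Tertiary": 2}

def encode_ordinal(values, order):
    """Map each category to its rank; raises KeyError on unknown categories."""
    return [order[value] for value in values]

levels = ["Secondary", "Primary", "Tertiary"]
codes = encode_ordinal(levels, EDUCATION_ORDER)
print(codes)
```

Writing the mapping by hand, rather than letting a tool assign integers alphabetically, keeps the encoded order meaningful.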

Target Encoding

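Replace each category with the mean of the target for that category:

If "Red" items have an average purchase of $100, replace Red→100. Compute the means on the training data only, otherwise the feature leaks the target.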
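
A minimal sketch of target encoding on made-up data. The function name `fit_target_means`, the sample values, and the choice to fall back to the global mean for unseen categories are all assumptions for illustration.

```python
from collections import defaultdict

def fit_target_means(categories, targets):
    """Compute the mean target value per category from training data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}

# Hypothetical training data: item color and purchase amount in dollars.
train_colors = ["Red", "Red", "Blue", "Blue"]
train_purchase = [90.0, 110.0, 40.0, 60.0]

means = fit_target_means(train_colors, train_purchase)
global_mean = sum(train_purchase) / len(train_purchase)

# Categories never seen in training fall back to the global mean (an assumption).
new_colors = ["Red", "Blue", "Green"]
encoded = [means.get(color, global_mean) for color in new_colors]
print(encoded)
```

The means are fitted on the training rows and then applied to new rows, which is what keeps the target from leaking into evaluation data.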

2. Numerical Transformations

Binning (Discretization)

Convert continuous to categorical:

Age 25 → "Young" (0-30)
Age 45 → "Middle" (31-60)
Age 75 → "Senior" (61+)
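
The age bins can be sketched with the standard library's `bisect` module. This example assumes inclusive upper bounds, so age 30 counts as "Young" and age 60 as "Middle". The names `UPPER_BOUNDS` and `age_bin` are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of every bin except the last, open-ended one.
UPPER_BOUNDS = [30, 60]
LABELS = ["Young", "Middle", "Senior"]

def age_bin(age):
    """Return the label of the bin that contains `age`."""
    return LABELS[bisect_left(UPPER_BOUNDS, age)]

ages = [25, 45, 75]
binned = [age_bin(age) for age in ages]
print(binned)
```

With pandas, `pandas.cut` performs the same discretization on a whole column.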

Polynomial Features

Create powers and interaction terms:

x1, x2 → x1, x2, x1², x2², x1*x2
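
For two inputs, the degree-2 expansion shown above is small enough to write by hand. The function name `poly2` is an assumption made for this sketch.

```python
def poly2(x1, x2):
    """Expand two features into [x1, x2, x1^2, x2^2, x1*x2]."""
    return [x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

features = poly2(2.0, 3.0)
print(features)
```

For more inputs or higher degrees, scikit-learn's `PolynomialFeatures` generates the terms automatically, though the number of columns grows quickly.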

Log Transform

For right-skewed data:

income_log = log(income)  # Reduces skewness; requires income > 0 (use log(1 + income) if zeros occur)
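
A short sketch of the log transform using the standard `math` module. The income values are invented for illustration, and `safe_log` is a hypothetical helper for data that may contain zeros.

```python
import math

# Hypothetical incomes: one very large value dominates the raw scale.
incomes = [20_000, 45_000, 1_000_000]

# Natural log compresses the gap between typical and extreme values.
income_log = [math.log(x) for x in incomes]

def safe_log(x):
    """log(1 + x): defined at x = 0, close to log(x) for large x."""
    return math.log1p(x)

print([round(v, 2) for v in income_log])
print(safe_log(0))
```

On the raw scale the largest income is 50 times the smallest. After the transform, the difference between them is under 4 units.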

3. Domain-Specific Features

Extract knowledge from raw data:

  • From timestamp: Extract day-of-week, hour, is_weekend
  • From text: Word count, sentiment score
  • From geographic coordinates: Distance to city center
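
The ideas in the list above can be sketched with the standard library alone. All function names here are hypothetical. The distance uses the haversine great-circle formula with an assumed mean Earth radius of 6371 km. Sentiment scoring is left out because it needs a trained model or lexicon.

```python
import math
from datetime import datetime

def timestamp_features(ts):
    """Extract day-of-week (Monday=0), hour, and a weekend flag."""
    return {
        "day_of_week": ts.weekday(),
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,
    }

def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Example values are made up for illustration.
ts_feats = timestamp_features(datetime(2024, 1, 6, 14, 30))  # a Saturday
n_words = word_count("great product, fast shipping")
dist = haversine_km(0.0, 0.0, 0.0, 1.0)  # one degree of longitude at the equator

print(ts_feats, n_words, round(dist, 1))
```

To get a "distance to city center" feature, pass each row's coordinates together with the fixed coordinates of the center.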