Naive Bayes: Probabilistic Classifier

Bayes' Theorem & Independence Assumption

Naive Bayes Classification

Bayes' Theorem

P(Class | Features) = P(Features | Class) * P(Class) / P(Features)

In plain English: probability of class given features = likelihood × prior / evidence

Components:

  • Prior P(Class): How common is this class? (Before seeing features)
  • Likelihood P(Features | Class): If this class is true, how likely are these features?
  • Evidence P(Features): How likely is this feature combination overall?

Example: Email Spam Detection

An email contains the word "FREE" → is it spam?

  • Prior: 20% of emails are spam
  • Likelihood: 50% of spam emails contain "FREE"
  • Likelihood (normal): 5% of normal emails contain "FREE"
  • Result: P(spam | "FREE") = (0.50 × 0.20) / (0.50 × 0.20 + 0.05 × 0.80) = 0.10 / 0.14 ≈ 71%, up from the 20% prior
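
The arithmetic behind this example can be checked with a few lines of Python. This is a minimal sketch using only the three numbers given above; the variable names are illustrative, and it assumes every email is either spam or normal, so that the evidence P("FREE") follows from the law of total probability.

```python
# Numbers taken from the spam example above.
p_spam = 0.20             # prior: P(spam)
p_free_given_spam = 0.50  # likelihood: P("FREE" | spam)
p_free_given_ham = 0.05   # likelihood: P("FREE" | normal)

# Assumption: an email is either spam or normal.
p_ham = 1.0 - p_spam

# Evidence: P("FREE") summed over both classes.
p_free = p_free_given_spam * p_spam + p_free_given_ham * p_ham

# Bayes' theorem: posterior = likelihood * prior / evidence.
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(f"P(spam | FREE) = {p_spam_given_free:.3f}")  # prints 0.714
```

Seeing "FREE" moves the estimate from the 20% prior to roughly 71%, because the word is ten times more likely in spam than in normal email.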

The "Naive" Assumption

Naive Bayes assumes all features are independent given the class.

P(word1, word2, word3 | spam) = P(word1|spam) × P(word2|spam) × P(word3|spam)

This is often wrong (words are correlated), but it works well in practice!

Why?

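  • The simplification makes computation fast: each feature's probability is estimated on its own
  • Despite the false assumption, predictions are often accurate: the correct class only needs the highest score, not an exact probability
  • Great for high-dimensional data (text)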
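
The factorization above can be sketched in code as follows. The per-word probabilities here are invented purely for illustration (only the 20% prior and the "FREE" likelihoods come from the earlier example), and `log_score` and `posterior` are hypothetical helper names. The sketch sums logarithms instead of multiplying probabilities, a common way to avoid numerical underflow when there are many features.

```python
import math

# Hypothetical per-word likelihoods P(word | class); made up for illustration.
word_probs = {
    "spam":   {"free": 0.50, "winner": 0.20, "meeting": 0.01},
    "normal": {"free": 0.05, "winner": 0.01, "meeting": 0.10},
}
priors = {"spam": 0.20, "normal": 0.80}

def log_score(words, label):
    """log P(label) + sum of log P(word | label): the naive factorization."""
    return math.log(priors[label]) + sum(
        math.log(word_probs[label][w]) for w in words
    )

def posterior(words):
    """Normalize the per-class scores into P(class | words)."""
    scores = {label: log_score(words, label) for label in priors}
    top = max(scores.values())  # subtract the max before exp for stability
    unnorm = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(unnorm.values())
    return {label: v / total for label, v in unnorm.items()}

result = posterior(["free", "winner"])
print(result)
```

Each word contributes one independent term to the score, so adding a feature costs one extra lookup and one addition per class.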

Advantages & Disadvantages

Pros:

  • ✓ Fast to train (just counting)
  • ✓ Fast to predict
  • ✓ Works well with high-dimensional data (text, images)
  • ✓ Requires little training data
  • ✓ Interpretable (see which features matter)

Cons:

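  • ✗ Independence assumption often violated
  • ✗ Poor probability estimates (miscalibrated)
  • ✗ Can struggle with rare features: a feature never seen with a class gets probability zero, which wipes out the whole product unless smoothing is applied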
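
Two points from the lists above, that training is just counting and that rare features cause trouble, can be sketched together. The tiny training set below is made up for illustration, and `log_likelihood` and `predict` are hypothetical helper names. The sketch assumes add-one (Laplace) smoothing, one common remedy for the problem that a word never seen with a class would otherwise get probability zero.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set, for illustration only.
train = [
    ("free prize winner", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda attached", "normal"),
    ("lunch meeting tomorrow", "normal"),
    ("project update attached", "normal"),
]

# Training is just counting: emails per class and words per class.
class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def log_likelihood(word, label, alpha=1.0):
    """Smoothed log P(word | label); alpha=1.0 is add-one smoothing."""
    total = sum(word_counts[label].values())
    count = word_counts[label][word]  # 0 if the word never appeared in this class
    return math.log((count + alpha) / (total + alpha * len(vocab)))

def predict(text):
    """Return the class with the highest log prior + summed log likelihoods."""
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / len(train))
        # Words outside the training vocabulary are skipped.
        score += sum(log_likelihood(w, label) for w in text.split() if w in vocab)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("free prize"))        # prints spam
print(predict("meeting tomorrow"))  # prints normal
```

Without the `alpha` term, "meeting" would have probability zero under the spam class and `math.log` would fail; smoothing keeps every word's likelihood small but nonzero.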