Naive Bayes: Probabilistic Classifier
Bayes' Theorem & Independence Assumption
Naive Bayes Classification
Bayes' Theorem
P(Class | Features) = P(Features | Class) * P(Class) / P(Features)
English: Probability of class given features = likelihood × prior / evidence
Components:
- Prior P(Class): How common is this class? (Before seeing features)
- Likelihood P(Features | Class): If this class is true, how likely are these features?
- Evidence P(Features): How likely is this feature combination overall?
Example: Email Spam Detection
An email contains the word "FREE". Is it spam?
- Prior: 20% of emails are spam
- Likelihood: 50% of spam emails contain "FREE"
- Likelihood (normal): 5% of normal emails contain "FREE"
- Result: P(spam | "FREE") = (0.50 × 0.20) / (0.50 × 0.20 + 0.05 × 0.80) = 0.10 / 0.14 ≈ 71%, up from the 20% prior
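The arithmetic for this example can be checked with a few lines of Python. This is a minimal sketch for the two-class case only; the function name `posterior` and its parameter names are chosen here for illustration and are not part of the lesson.

```python
def posterior(prior, likelihood, likelihood_other):
    """P(class | feature) for a two-class problem, via Bayes' theorem."""
    # Evidence: total probability of seeing the feature under either class.
    evidence = likelihood * prior + likelihood_other * (1 - prior)
    return likelihood * prior / evidence


# Numbers from the spam example: 20% prior, 50% of spam and 5% of
# normal emails contain "FREE".
p_spam = posterior(prior=0.20, likelihood=0.50, likelihood_other=0.05)
print(f"P(spam | 'FREE') = {p_spam:.3f}")  # prints P(spam | 'FREE') = 0.714
```

The evidence term is just the sum of the two numerators (spam and normal), which is why the two posteriors always add up to 1.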
The "Naive" Assumption
Naive Bayes assumes all features are independent given the class.
P(word1, word2, word3 | spam) = P(word1|spam) × P(word2|spam) × P(word3|spam)
This is often wrong (words are correlated), but it works well in practice!
Why?
- The simplification makes computation fast
- Despite the false assumption, predictions are often accurate: classification only needs the correct class to score highest, not exact probabilities
- Great for high-dimensional data (text)
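To make the factorization concrete, here is a rough sketch of scoring an email under the naive assumption. The per-word probabilities below are invented for illustration (only the "free" values and the 20% prior echo the earlier example), and the helper names `log_joint` and `spam_probability` are hypothetical. Summing logs instead of multiplying raw probabilities is a common way to avoid numerical underflow when there are many words.

```python
import math

# Hypothetical per-word likelihoods, invented for illustration only.
p_word_given_spam = {"free": 0.50, "winner": 0.20, "meeting": 0.01}
p_word_given_ham = {"free": 0.05, "winner": 0.01, "meeting": 0.30}
prior_spam = 0.20


def log_joint(words, prior, word_probs):
    """log P(class) + sum of log P(word | class): the naive factorization."""
    return math.log(prior) + sum(math.log(word_probs[w]) for w in words)


def spam_probability(words):
    """Posterior P(spam | words) under the independence assumption."""
    log_spam = log_joint(words, prior_spam, p_word_given_spam)
    log_ham = log_joint(words, 1 - prior_spam, p_word_given_ham)
    # Normalize the two scores; subtracting the max keeps exp() stable.
    m = max(log_spam, log_ham)
    e_spam, e_ham = math.exp(log_spam - m), math.exp(log_ham - m)
    return e_spam / (e_spam + e_ham)


print(spam_probability(["free", "winner"]))  # close to 1: both words favor spam
print(spam_probability(["meeting"]))         # close to 0: the word favors normal mail
```

Each word contributes one independent term to the score, so the cost of a prediction grows only linearly with the number of words.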
Advantages & Disadvantages
Pros:
- Fast to train (just counting)
- Fast to predict
- Works well with high-dimensional data (text, images)
- Requires little training data
- Interpretable (see which features matter)
Cons:
- Independence assumption often violated
- Poor probability estimates (miscalibrated)
- Can struggle with rare features
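The "just counting" training step and the rare-feature problem can both be seen in a small sketch. The corpus below is made up, and the function name `word_likelihoods` and the `alpha` parameter are assumptions for illustration. Add-alpha (Laplace) smoothing is not covered in the text above; it is shown here only as one common remedy for words that never appear with a class.

```python
from collections import Counter

# Tiny made-up corpus, for illustration only.
spam_docs = [["free", "prize"], ["free", "offer"]]
ham_docs = [["meeting", "today"], ["lunch", "today"]]

vocab = {w for doc in spam_docs + ham_docs for w in doc}


def word_likelihoods(docs, vocab, alpha=1.0):
    """Estimate P(word | class) by counting, with add-alpha smoothing."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    # Counter returns 0 for unseen words, so alpha keeps them above zero.
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}


unsmoothed = word_likelihoods(spam_docs, vocab, alpha=0.0)
smoothed = word_likelihoods(spam_docs, vocab, alpha=1.0)

print(unsmoothed["meeting"])  # 0.0: one unseen word zeroes out the whole product
print(smoothed["meeting"])    # 0.1: small but nonzero
```

Without smoothing, a single word that was never seen in spam forces P(words | spam) to zero regardless of the other words, which is one way rare features hurt the classifier.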