Neurons & Perceptrons: Building Blocks
Activation Functions (Non-Linearity)
Activation functions introduce non-linearity, allowing networks to learn complex patterns. Without them, any stack of linear layers collapses into a single linear transformation.
ReLU (Rectified Linear Unit): Most Popular
f(z) = max(0, z)
Advantages:
- Computationally efficient
- Works well in practice (the gradient is 1 for z > 0, so it does not saturate for positive inputs)
- Sparse activation (many zeros)
Disadvantage:
- Dead neurons (if w and b make z < 0 for every input, the output and the gradient are both 0, so the neuron stops learning)
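The dead-neuron problem can be seen in a few lines. This is a minimal sketch using only plain Python; the weight, bias, and inputs are made-up values chosen so that z is always negative, and `relu_grad` is a helper name introduced here for illustration.

```python
def relu(z: float) -> float:
    """ReLU activation: f(z) = max(0, z)."""
    return max(0.0, z)


def relu_grad(z: float) -> float:
    """Derivative of ReLU: 1 for z > 0, otherwise 0.

    The value at exactly z = 0 is a convention; 0 is used here.
    """
    return 1.0 if z > 0 else 0.0


# Hypothetical neuron: w and b are chosen so z = w*x + b < 0 for all inputs below.
w, b = -1.0, -0.5
inputs = [0.0, 1.0, 2.0, 3.0]

pre_activations = [w * x + b for x in inputs]
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]

print(outputs)  # every output is 0.0 (sparse activation taken to the extreme)
print(grads)    # every gradient is 0.0, so no update ever reaches w or b
```

Because the gradient is zero for every input, gradient descent has nothing to push w and b back into the active region.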
Sigmoid: Classic but Outdated
f(z) = 1 / (1 + e^(-z))
Output range: (0, 1)
Why it was used:
- Smooth, differentiable
- Output interpretable as probability
Why we moved away:
- Vanishing gradients (when the output saturates near 0 or 1, the gradient approaches 0 and learning stalls)
- More expensive to compute than ReLU (requires an exponential)
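To make the vanishing-gradient point concrete, here is a small sketch of the sigmoid and its derivative, f'(z) = f(z)(1 - f(z)). The function names are illustrative, and the branch on the sign of z is one common way to avoid overflow in `math.exp`, not the only one.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid: f(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def sigmoid_grad(z: float) -> float:
    """Derivative of sigmoid: f(z) * (1 - f(z)). Its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


for z in (0.0, 2.0, 5.0, 10.0):
    # The gradient shrinks quickly as the output saturates toward 1.
    print(z, sigmoid(z), sigmoid_grad(z))
```

Even at its best (z = 0) the gradient is only 0.25, so multiplying many such factors through a deep network shrinks the signal further.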
Tanh: Improved Sigmoid
f(z) = (e^z - e^(-z)) / (e^z + e^(-z))
Output range: (-1, 1)
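Its zero-centered output often makes optimization easier than with Sigmoid, but it still saturates at both ends and costs more to compute than ReLU.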
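The formula above can be checked against Python's built-in `math.tanh`. This is only a sketch: `tanh_from_definition` and `tanh_grad` are names introduced here, and the direct formula overflows for very large |z|, which is why the standard-library function is preferable in practice.

```python
import math


def tanh_from_definition(z: float) -> float:
    """Tanh written exactly as the formula: (e^z - e^(-z)) / (e^z + e^(-z)).

    Illustrative only: math.exp overflows for very large |z|; use math.tanh instead.
    """
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


def tanh_grad(z: float) -> float:
    """Derivative of tanh: 1 - tanh(z)^2. It equals 1 at z = 0 and decays toward 0."""
    return 1.0 - math.tanh(z) ** 2


for z in (-2.0, 0.0, 2.0):
    # Outputs are symmetric around 0, unlike sigmoid's (0, 1) range.
    print(z, tanh_from_definition(z), math.tanh(z), tanh_grad(z))
```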
Softmax: Multi-class Classification
f(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ)
Converts raw scores to probabilities (sum to 1).
Example:
- Raw output: [2.0, 1.0, 0.1]
- After softmax: approximately [0.66, 0.24, 0.10] (probabilities that sum to 1)
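The example above can be reproduced with a short softmax sketch. Subtracting the maximum score before exponentiating is a standard stability trick that does not change the result; it is an addition here, not part of the formula as stated.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Softmax: e^(z_i) / sum_j e^(z_j), computed on shifted scores for stability."""
    m = max(scores)                            # shifting by the max avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 2) for p in probs])  # largest score gets the largest probability
print(sum(probs))                    # the probabilities sum to 1 (up to rounding)
```

For the raw scores [2.0, 1.0, 0.1] this gives roughly 0.66, 0.24, and 0.10, consistent with the example above.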
When to Use
| Task | Last Layer | Hidden Layers |
|---|---|---|
| Binary Classification | Sigmoid | ReLU |
| Multi-class | Softmax | ReLU |
| Regression | Linear | ReLU |
| Sequences | Sigmoid/Tanh | ReLU |
Modern best practice: Use ReLU in hidden layers and choose the output activation to match the task.
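As a rough illustration of this rule, the sketch below runs one ReLU hidden layer and then picks the output activation from the task. Everything here is hypothetical: the helper names (`dense`, `output_activation`, `forward`), the task labels, and the weights are made up for the example, and only the first three rows of the table are covered.

```python
import math


def dense(inputs, weights, biases):
    """Affine layer: one output per row of weights, z = w . x + b."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def output_activation(task, scores):
    """Pick the last-layer activation by task (illustrative task labels)."""
    if task == "binary":
        # Sigmoid; not guarded against overflow for very negative scores.
        return [1.0 / (1.0 + math.exp(-s)) for s in scores]
    if task == "multiclass":
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]  # softmax
    if task == "regression":
        return list(scores)               # linear: scores pass through unchanged
    raise ValueError(f"unknown task: {task}")


def forward(x, hidden, output, task):
    """One ReLU hidden layer followed by a task-specific output layer."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [max(0.0, z) for z in dense(x, w_h, b_h)]  # ReLU in the hidden layer
    return output_activation(task, dense(h, w_o, b_o))


# Made-up weights; the hidden layer maps x = [2.0, 1.0] to [1.0, 1.5].
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
x = [2.0, 1.0]

binary_out = forward(x, hidden, ([[1.0, -1.0]], [0.5]), "binary")
multi_out = forward(x, hidden, ([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]), "multiclass")
regress_out = forward(x, hidden, ([[2.0, 2.0]], [1.0]), "regression")

print(binary_out, multi_out, regress_out)
```

The hidden layer is identical in all three calls; only the final activation changes with the task.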