Neurons & Perceptrons: Building Blocks

Activation Functions (Non-Linearity)

Activation functions introduce non-linearity, allowing networks to learn complex patterns. Without them, any stack of layers collapses into a single linear function of the input.
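
To illustrate why non-linearity matters, here is a minimal sketch with scalar "layers". The weights and biases are made-up values chosen for the example. Two linear layers with nothing between them behave exactly like one linear layer, while inserting max(0, z) between them produces a function that a single linear layer cannot reproduce.

```python
def linear(w, b, x):
    """One scalar 'layer': z = w * x + b."""
    return w * x + b


def two_linear_layers(x):
    # Layer 1: w1 = 3.0, b1 = 0.5; layer 2: w2 = -2.0, b2 = 1.0 (illustrative values).
    return linear(-2.0, 1.0, linear(3.0, 0.5, x))


def collapsed(x):
    # Equivalent single layer: w = w2 * w1 = -6.0, b = w2 * b1 + b2 = 0.0.
    return linear(-6.0, 0.0, x)


def with_relu(x):
    # Same two layers, but with max(0, z) applied after the first one.
    hidden = max(0.0, linear(3.0, 0.5, x))
    return linear(-2.0, 1.0, hidden)


for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, two_linear_layers(x), collapsed(x), with_relu(x))
```

The first two columns of output match for every input, while the ReLU version is flat for negative inputs and sloped for positive ones.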

ReLU (Rectified Linear Unit) - Most Popular

f(z) = max(0, z)

Advantages:

  • Computationally efficient
  • Works well in practice (gradient is 1 for z > 0, so it does not shrink as layers are stacked)
  • Sparse activation (many zeros)

Disadvantage:

  • Dead neurons (if the weights w and bias b make z < 0 for every input, the output and its gradient are both 0, so the neuron stops learning)
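
The following is a small sketch of ReLU, its gradient, and the dead-neuron case. The neuron's weight, bias, and inputs are hypothetical values picked so that z is negative for every input.

```python
def relu(z):
    """f(z) = max(0, z)."""
    return max(0.0, z)


def relu_grad(z):
    # Derivative of max(0, z). The value at exactly z == 0 is a convention (0 here).
    return 1.0 if z > 0 else 0.0


# Hypothetical neuron whose large negative bias keeps z below zero.
w, b = 0.5, -10.0
inputs = [0.0, 1.0, 2.0, 5.0]

pre_activations = [w * x + b for x in inputs]
activations = [relu(z) for z in pre_activations]
gradients = [relu_grad(z) for z in pre_activations]

print(activations)  # all zeros: the neuron never fires
print(gradients)    # all zeros: no gradient reaches w or b
```

Because every gradient is zero, a gradient-descent update would leave w and b unchanged, which is what "dead" means here.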

Sigmoid - Classic but Outdated

f(z) = 1 / (1 + e^(-z))

Output range: (0, 1)

Why it was used:

  • Smooth, differentiable
  • Output interpretable as probability

Why we moved away:

  • Vanishing gradients (when the output is near 0 or 1, the gradient ≈ 0, so learning is slow)
  • Costlier to compute than ReLU (needs an exponential instead of a comparison)
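
As a rough illustration of the vanishing-gradient point, the sketch below evaluates the sigmoid and its derivative f(z) * (1 - f(z)) at a few inputs. The branch on the sign of z is an implementation choice made here to avoid overflow in exp for large negative z; it is not part of the formula itself.

```python
import math


def sigmoid(z):
    """f(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_grad(z):
    # Derivative of the sigmoid: f(z) * (1 - f(z)). Its maximum is 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)


for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"z={z:6.1f}  f(z)={sigmoid(z):.5f}  f'(z)={sigmoid_grad(z):.5f}")
```

The derivative peaks at 0.25 and drops toward zero as the output approaches 0 or 1, which is why saturated sigmoid units pass back very little gradient.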

Tanh - Improved Sigmoid

f(z) = (e^z - e^(-z)) / (e^z + e^(-z))

Output range: (-1, 1)

Its zero-centered output usually makes it train better than Sigmoid, but it still saturates for large |z| and is costlier to compute than ReLU.
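
One way to see how tanh relates to the sigmoid is the identity tanh(z) = 2 * sigmoid(2z) - 1: tanh is a rescaled, zero-centered sigmoid. The sketch below checks that identity numerically and prints the derivative 1 - tanh(z)^2, which still shrinks toward zero for large |z|. The helper names are chosen for this example.

```python
import math


def sigmoid(z):
    # Overflow-safe form of 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def tanh_via_sigmoid(z):
    # Identity: tanh(z) = 2 * sigmoid(2z) - 1.
    return 2.0 * sigmoid(2.0 * z) - 1.0


def tanh_grad(z):
    # Derivative of tanh: 1 - tanh(z)^2. Its maximum is 1 at z = 0.
    t = math.tanh(z)
    return 1.0 - t * t


for z in [-5.0, -0.7, 0.0, 0.7, 5.0]:
    print(f"z={z:5.1f}  tanh={math.tanh(z):+.5f}  "
          f"via sigmoid={tanh_via_sigmoid(z):+.5f}  grad={tanh_grad(z):.5f}")
```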

Softmax - Multi-class Classification

f(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ)

Converts raw scores (logits) into probabilities that sum to 1.

Example:

  • Raw output: [2.0, 1.0, 0.1]
  • After softmax: ≈ [0.66, 0.24, 0.10] ← probabilities that sum to 1
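
The example above can be reproduced with a short sketch. Subtracting the largest score before exponentiating is a common numerical-stability trick assumed here; it does not change the result, because the shift cancels between numerator and denominator.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Shift by the maximum so exp() never sees a large positive argument.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 2) for p in probs])
print(sum(probs))
```

The largest raw score receives the largest probability, and the ordering of the scores is preserved.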

When to Use

Task                    Last Layer      Hidden Layers
Binary Classification   Sigmoid         ReLU
Multi-class             Softmax         ReLU
Regression              Linear          ReLU
Sequences               Sigmoid/Tanh    ReLU

Modern best practice: use ReLU in the hidden layers and a task-specific activation in the output layer.
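
A minimal sketch of that pattern is a one-hidden-layer forward pass in plain Python: ReLU in the hidden layer, then an output activation picked by task. The function names, the task labels ("binary", "multiclass", "regression"), and all weights are hypothetical values for illustration, not part of any library.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # Overflow-safe form of 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, biases, inputs):
    """z_i = sum_j weights[i][j] * inputs[j] + biases[i]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x, hidden, output, task):
    """hidden and output are (weights, biases) pairs; task selects the last activation."""
    h = [relu(z) for z in dense(hidden[0], hidden[1], x)]
    z = dense(output[0], output[1], h)
    if task == "binary":
        return [sigmoid(v) for v in z]
    if task == "multiclass":
        return softmax(z)
    if task == "regression":
        return z  # linear output: no activation
    raise ValueError(f"unknown task: {task}")


# Made-up weights for a 2-input, 2-hidden-unit network.
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
single_output = ([[2.0, 1.0]], [0.5])
three_outputs = ([[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]], [0.0, 0.0, 0.0])
x = [1.0, 2.0]

print(forward(x, hidden, single_output, "regression"))
print(forward(x, hidden, single_output, "binary"))
print(forward(x, hidden, three_outputs, "multiclass"))
```

Only the last step changes between tasks; the hidden layer is identical in all three calls.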
