Deep Learning & Neural Networks

Recurrent Neural Networks (RNN, LSTM, GRU) · Page 1 of 2

RNN Basics & Hidden State

Recurrent Neural Networks

Why RNNs for Sequences?

Some data is sequential:

Text: "The cat sat on the mat" (word order matters!)
Time series: Stock prices over time
Audio: Sound waves (time-dependent)

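A dense (feed-forward) network has no mechanism for carrying information from one time step to the next. An RNN keeps a hidden state that summarizes what it has seen so far and updates it at every step.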

Dense: [word1, word2, word3] → No memory of previous
RNN:   [word1] → hidden state → [word2] → hidden state → [word3]
                    (remembers!)

The RNN Equation

hidden_state[t] = activation(W × input[t] + U × hidden_state[t-1] + b)
output[t] = V × hidden_state[t]

Where:

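W: Input-to-hidden weights
U: Hidden-to-hidden weights (recurrent!)
V: Hidden-to-output weights
b: Hidden bias
activation: A nonlinearity, typically tanh
hidden_state[t-1]: Memory from the previous step (the initial hidden state is usually all zeros)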
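The two equations above can be sketched as a single function. This is a minimal illustration in plain Python, not a reference implementation: the helper names (`matvec`, `rnn_step`), the layer sizes, the weight values, and the choice of tanh as the activation are all assumptions made for the example.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, W, U, V, b):
    """One RNN time step.

    hidden_state[t] = tanh(W x input[t] + U x hidden_state[t-1] + b)
    output[t]       = V x hidden_state[t]
    """
    Wx = matvec(W, x_t)       # contribution of the current input
    Uh = matvec(U, h_prev)    # contribution of the previous hidden state
    h_t = [math.tanh(wx + uh + bi) for wx, uh, bi in zip(Wx, Uh, b)]
    y_t = matvec(V, h_t)
    return h_t, y_t

# Hypothetical sizes: 2 input features, 3 hidden units, 1 output.
W = [[0.5, -0.3], [0.1, 0.8], [-0.4, 0.2]]                  # 3x2
U = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]   # 3x3
V = [[1.0, -1.0, 0.5]]                                      # 1x3
b = [0.0, 0.1, -0.1]

h0 = [0.0, 0.0, 0.0]                      # initial hidden state
h1, y1 = rnn_step([1.0, 0.0], h0, W, U, V, b)
print("hidden:", h1)
print("output:", y1)
```

The same `W`, `U`, `V`, and `b` are reused at every time step; only `x_t` and `h_prev` change.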

Example: Text Generation

Training: "The cat sat"

Step 1: Input="The" → hidden = [0.3, -0.2, 0.5]
Step 2: Input="cat", hidden_prev=[0.3, -0.2, 0.5] → hidden = [0.1, 0.4, -0.3]
Step 3: Input="sat", hidden_prev=[0.1, 0.4, -0.3] → hidden = [-0.2, 0.5, 0.1]

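The numbers above are illustrative. Because each hidden state is computed from the current word and the previous hidden state, it acts as a compressed summary of every word seen so far.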
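To see that the final hidden state really depends on earlier words, the sketch below runs a tiny untrained RNN over two sentences that differ only in their first word. The vocabulary, one-hot encoding, weight values, and function names are all hypothetical choices for the demo, so the hidden values will not match the ones listed in the example.

```python
import math

# Hypothetical 4-word vocabulary with one-hot inputs.
vocab = {"The": 0, "A": 1, "cat": 2, "sat": 3}

def one_hot(word):
    v = [0.0] * len(vocab)
    v[vocab[word]] = 1.0
    return v

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Arbitrary fixed weights: 4 inputs, 3 hidden units.
W = [[0.5, -0.5, 0.3, 0.1],
     [0.2, 0.4, -0.6, 0.7],
     [-0.3, 0.6, 0.2, -0.4]]
U = [[0.4, -0.2, 0.1],
     [0.3, 0.5, -0.3],
     [-0.1, 0.2, 0.6]]
b = [0.0, 0.0, 0.0]

def run_rnn(words):
    """Feed the words one at a time and return the final hidden state."""
    h = [0.0, 0.0, 0.0]                       # start with an empty memory
    for word in words:
        Wx = matvec(W, one_hot(word))
        Uh = matvec(U, h)
        h = [math.tanh(wx + uh + bi) for wx, uh, bi in zip(Wx, Uh, b)]
    return h

h_the = run_rnn(["The", "cat", "sat"])
h_a = run_rnn(["A", "cat", "sat"])
print("The cat sat ->", h_the)
print("A cat sat   ->", h_a)
```

Both sentences end with the same two words, yet the final hidden states differ, because the first word's influence is carried forward through the recurrent weights `U`.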

Backpropagation Through Time (BPTT)

Training RNNs is tricky:

dL/dU adds up one contribution per time step. The contribution that reaches back to step 1 is a long chain-rule product:
dL/dh[t] × dh[t]/dh[t-1] × dh[t-1]/dh[t-2] × ... × dh[2]/dh[1] × dh[1]/dU
(one dh[k]/dh[k-1] factor for every step the gradient travels back)

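Problem: If the dh[k]/dh[k-1] factors are mostly smaller than 1 in magnitude, the product vanishes (→ 0)
         If they are mostly larger than 1, the product explodes (→ ∞)

The first case is the vanishing gradient problem; the second is the exploding gradient problem.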
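The effect of multiplying many factors together is easy to check numerically. The sketch below is a deliberate simplification: it replaces every dh[k]/dh[k-1] term with the same scalar, whereas in a real RNN each term is a matrix that changes from step to step. The function name `chain_product` and the factor values are assumptions chosen for illustration.

```python
def chain_product(factor, steps):
    """Multiply the same per-step gradient factor `steps` times."""
    product = 1.0
    for _ in range(steps):
        product *= factor
    return product

# Factor slightly below 1, exactly 1, and slightly above 1.
results = {}
for factor in (0.9, 1.0, 1.1):
    results[factor] = [chain_product(factor, n) for n in (10, 50, 100)]
    print(f"factor={factor}: after 10, 50, 100 steps ->", results[factor])
```

Even a factor as close to 1 as 0.9 or 1.1 shrinks or grows the product by orders of magnitude over 50 to 100 steps, which is why long sequences are hard for a plain RNN.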

The Problem & The Solution

Vanishing Gradients:
- Can't learn long-term dependencies
- Early inputs receive almost no gradient, so the network never learns to use them

Example: "The cat, which was ... sat"
- Can the network remember "cat" was the subject?
- With vanishing gradients: usually not, once the gap is long enough

Solution: LSTM & GRU (gate mechanisms to control information flow)
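The idea of a gate can be sketched in a few lines. The code below is neither a full LSTM nor a full GRU; it is a simplified, hypothetical update rule (the name `gated_update` and all values are assumptions) showing how a gate value between 0 and 1 decides how much of the old hidden state to keep and how much to overwrite with new information.

```python
import math

def sigmoid(a):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def gated_update(h_prev, candidate, gate_logit):
    """Blend the old state with a candidate new state.

    gate near 1 -> keep the old memory almost unchanged
    gate near 0 -> replace it with the candidate
    """
    g = sigmoid(gate_logit)
    return [g * hp + (1.0 - g) * c for hp, c in zip(h_prev, candidate)]

h_prev = [0.3, -0.2, 0.5]        # memory so far
candidate = [0.9, 0.9, -0.9]     # proposed new content

keep = gated_update(h_prev, candidate, gate_logit=10.0)
overwrite = gated_update(h_prev, candidate, gate_logit=-10.0)
print("gate open  (keep):     ", keep)
print("gate closed (overwrite):", overwrite)
```

When the gate stays close to 1, the new state is almost a copy of the old one, so in this simplified form the factor dh[t]/dh[t-1] stays close to 1 and the gradient can travel back many steps without vanishing. In a real LSTM or GRU the gate values are computed from the input and the previous hidden state by learned weights.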
