Recurrent Neural Networks (RNN, LSTM, GRU) · Page 1 of 2

RNN Basics & Hidden State

Recurrent Neural Networks

Why RNNs for Sequences?

Some data is sequential:

  • Text: "The cat sat on the mat" (words in order matter!)
  • Time series: Stock prices over time
  • Audio: Sound waves (time-dependent)

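Dense networks process a fixed-size input in one pass and carry nothing over from one input to the next. RNNs keep a hidden state that carries information from earlier steps forward.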

Dense: [word1, word2, word3] → No memory of previous
RNN:   [word1] → hidden state → [word2] → hidden state → [word3]
                 (remembers!)

The RNN Equation

hidden_state[t] = activation(W × input[t] + U × hidden_state[t-1] + b)
output[t] = V × hidden_state[t]

Where:

  • W: Input-to-hidden weights
  • U: Hidden-to-hidden weights (recurrent!)
  • V: Hidden-to-output weights
  • b: Bias added before the activation
  • hidden_state[t-1]: Memory from previous step
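
The equation above can be sketched as a single function. This is a minimal illustration, not a reference implementation: it assumes tanh as the activation (the equation only says "activation"), uses plain Python lists so it runs without extra libraries, and the sizes and weight values are made up for the example.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, W, U, V, b):
    """One RNN step: h[t] = tanh(W x[t] + U h[t-1] + b), output[t] = V h[t]."""
    Wx = matvec(W, x_t)        # input-to-hidden contribution
    Uh = matvec(U, h_prev)     # recurrent contribution from the previous step
    h_t = [math.tanh(a + c + bias) for a, c, bias in zip(Wx, Uh, b)]
    y_t = matvec(V, h_t)       # hidden-to-output projection
    return h_t, y_t

# Hypothetical sizes: 2 inputs, 3 hidden units, 2 outputs (values are arbitrary).
W = [[0.5, -0.1], [0.2, 0.3], [-0.4, 0.1]]                     # 3 x 2
U = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]      # 3 x 3
V = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]                        # 2 x 3
b = [0.0, 0.0, 0.0]

h0 = [0.0, 0.0, 0.0]                       # no memory before the first step
h1, y1 = rnn_step([1.0, 0.0], h0, W, U, V, b)
h2, y2 = rnn_step([0.0, 1.0], h1, W, U, V, b)   # h1 is fed back in as memory
print(h1, y1)
print(h2, y2)
```

The same W, U, V, and b are reused at every step; only the input and the hidden state change.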

Example: Text Generation

Training: "The cat sat"

Step 1: Input="The" → hidden = [0.3, -0.2, 0.5]
Step 2: Input="cat", hidden_prev=[0.3, -0.2, 0.5] → hidden = [0.1, 0.4, -0.3]
Step 3: Input="sat", hidden_prev=[0.1, 0.4, -0.3] → hidden = [-0.2, 0.5, 0.1]

Each step "remembers" the previous words!
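
The three steps above can be traced in code. This sketch assumes one-hot word vectors, a tanh activation, and small hand-picked weights that were never trained, so its hidden-state values will not match the illustrative numbers in the steps above. It only shows how each hidden state feeds the next.

```python
import math

# Hypothetical 3-word vocabulary, encoded as one-hot vectors.
VOCAB = {"The": 0, "cat": 1, "sat": 2}

# Hand-picked illustrative weights (not trained); hidden size 3.
W = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.5], [-0.2, 0.3, 0.6]]   # input-to-hidden
U = [[0.3, -0.1, 0.2], [0.0, 0.4, -0.3], [-0.2, 0.1, 0.5]]   # hidden-to-hidden
b = [0.0, 0.0, 0.0]

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def one_hot(word):
    vec = [0.0] * len(VOCAB)
    vec[VOCAB[word]] = 1.0
    return vec

def run_rnn(words):
    """Return the hidden state after each word, starting from all zeros."""
    h = [0.0, 0.0, 0.0]
    states = []
    for word in words:
        Wx, Uh = matvec(W, one_hot(word)), matvec(U, h)
        h = [math.tanh(a + c + bias) for a, c, bias in zip(Wx, Uh, b)]
        states.append(h)
    return states

sentence = ["The", "cat", "sat"]
for step, (word, h) in enumerate(zip(sentence, run_rnn(sentence)), start=1):
    print(f"Step {step}: Input={word!r} -> hidden = {[round(v, 3) for v in h]}")

# The state after "sat" depends on what came before it:
print(run_rnn(sentence)[-1] != run_rnn(["sat"])[-1])   # True
```

Swapping the order of the first two words also changes the final hidden state, which is the sense in which the network "remembers" the sequence rather than just the set of words.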

Backpropagation Through Time (BPTT)

Training RNNs is tricky:

dL/dU must flow backward through many steps. It is a sum of terms, one per step;
the term that reaches back to step 1 is:
dL/dh[t] × dh[t]/dh[t-1] × dh[t-1]/dh[t-2] × ... × dh[1]/dU
└────────────────────────────┬────────────────────────────┘
               Chain rule through many steps!

Problem: If the repeated factors dh[k]/dh[k-1] have magnitude < 1, the product vanishes (→ 0)
         If they have magnitude > 1, the product explodes (→ ∞)

These are the vanishing and exploding gradient problems.
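
A small numeric check makes the effect concrete. The sketch below assumes a scalar RNN with a single recurrent weight u and no input, which is a deliberate simplification: with a linear activation every factor dh[k]/dh[k-1] equals u, so the product over T steps is u raised to the power T. The function name and parameters are hypothetical.

```python
import math

def gradient_through_time(u, steps, h0=0.1, use_tanh=False):
    """Return dh[steps]/dh[0] for the scalar recurrence h[t] = f(u * h[t-1]).

    f is the identity by default, or tanh when use_tanh=True.
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        pre = u * h
        if use_tanh:
            h = math.tanh(pre)
            local = u * (1.0 - h * h)   # derivative of tanh(u * h_prev) w.r.t. h_prev
        else:
            h = pre
            local = u                   # derivative of u * h_prev w.r.t. h_prev
        grad *= local                   # chain rule: multiply the per-step factors
    return grad

print(gradient_through_time(0.9, 50))                 # about 0.005: vanishing
print(gradient_through_time(1.1, 50))                 # about 117: exploding
print(gradient_through_time(0.9, 50, use_tanh=True))  # no larger than the linear case
```

With tanh, each factor is u × (1 - h[t]²), which is never larger in magnitude than u, so a saturating activation shrinks the product further.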

The Problem & The Solution

Vanishing Gradients:
- Can't learn long-term dependencies
- Network forgets early inputs

Example: "The cat, which was ... sat"
- Can the network remember "cat" was the subject?
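- With vanishing gradients: usually not, since the gradient from the loss at "sat" has shrunk to almost nothing by the time it reaches "cat"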

Solution: LSTM & GRU (gate mechanisms to control information flow)