Recurrent Neural Networks (RNN, LSTM, GRU) · Page 1 of 2
RNN Basics & Hidden State
Recurrent Neural Networks
Why RNNs for Sequences?
Some data is sequential:
- Text: "The cat sat on the mat" (word order matters!)
- Time series: Stock prices over time
- Audio: Sound waves (time-dependent)
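Dense networks take one fixed-size input per pass and keep no state between inputs. RNNs carry a hidden state from one step to the next, so they remember what came before.
Dense: [word1, word2, word3] → no memory of previous inputs
RNN:   [word1] → hidden state → [word2] → hidden state → [word3]   (remembers!)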
The RNN Equation
hidden_state[t] = activation(W × input[t] + U × hidden_state[t-1] + b)
output[t] = V × hidden_state[t]
Where:
- W: Input-to-hidden weights
- U: Hidden-to-hidden weights (recurrent!)
- V: Hidden-to-output weights
- hidden_state[t-1]: Memory from the previous step
- b: Hidden bias vector
- activation: A nonlinearity, typically tanh
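A minimal pure-Python sketch of one RNN step, assuming tanh as the activation and small hand-picked weights. The helper names `matvec`, `rnn_step`, `rnn_output` and all the numbers are hypothetical, chosen only for illustration; matrices are plain lists of rows so no ML library is needed.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(x_t, h_prev, W, U, b):
    """hidden_state[t] = tanh(W × input[t] + U × hidden_state[t-1] + b).

    tanh is an assumption here; the equation only says "activation".
    """
    wx = matvec(W, x_t)      # input-to-hidden contribution
    uh = matvec(U, h_prev)   # hidden-to-hidden (recurrent) contribution
    return [math.tanh(a + r + c) for a, r, c in zip(wx, uh, b)]


def rnn_output(h_t, V):
    """output[t] = V × hidden_state[t] (no output activation, as in the equation)."""
    return matvec(V, h_t)


# Hypothetical sizes: 2-dim input, 3-dim hidden state, 2-dim output.
W = [[0.5, -0.3], [0.1, 0.8], [-0.4, 0.2]]                 # 3x2
U = [[0.5, 0.0, 0.1], [0.2, 0.4, 0.0], [0.0, -0.3, 0.6]]   # 3x3
b = [0.0, 0.1, -0.1]
V = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]                    # 2x3

if __name__ == "__main__":
    h0 = [0.0, 0.0, 0.0]                     # initial hidden state
    h1 = rnn_step([1.0, 0.0], h0, W, U, b)   # one time step
    print("hidden:", h1)
    print("output:", rnn_output(h1, V))
```

Because h0 is all zeros, the U term contributes nothing on the first step; from the second step on, U is what carries the previous hidden state forward.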
Example: Text Generation
Training: "The cat sat"
Step 1: Input="The" β hidden = [0.3, -0.2, 0.5]
Step 2: Input="cat", hidden_prev=[0.3, -0.2, 0.5] β hidden = [0.1, 0.4, -0.3]
Step 3: Input="sat", hidden_prev=[0.1, 0.4, -0.3] β hidden = [-0.2, 0.5, 0.1]
Each step "remembers" the previous words!
Backpropagation Through Time (BPTT)
Training RNNs is tricky: the same U is used at every step, so the gradient dL/dU must flow backward through many time steps:
dL/dU = dL/dh[t] × dh[t]/dh[t-1] × dh[t-1]/dh[t-2] × ... × dh[1]/dU
(Chain rule through many steps! This is the path back to step 1; the full gradient adds up one such term for every step where U is used.)
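Problem: If the factors dh[k]/dh[k-1] are smaller than 1 in magnitude, their product shrinks toward 0 (vanishing gradients).
If they are larger than 1, the product grows toward ∞ (exploding gradients).
The first case is the vanishing gradient problem; the second is the exploding gradient problem.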
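To see the effect numerically, here is a sketch using a deliberately simplified scalar RNN with no activation (an assumption made so the math is exact): h[t] = u × h[t-1] + w × x[t]. Every chain-rule factor dh[k]/dh[k-1] then equals u, so the gradient back through T steps is u multiplied T times. The function names are hypothetical, and a finite-difference check confirms the chain-rule result.

```python
def linear_rnn(u, w, xs, h0):
    """Scalar RNN without activation: h[t] = u*h[t-1] + w*x[t]. Returns h[T]."""
    h = h0
    for x in xs:
        h = u * h + w * x
    return h


def grad_through_time(u, steps):
    """dh[T]/dh[0] via the chain rule: one factor dh[k]/dh[k-1] = u per step."""
    grad = 1.0
    for _ in range(steps):
        grad *= u
    return grad


def finite_difference(u, w, xs, h0, eps=1e-6):
    """Numerical check of dh[T]/dh[0] by nudging the initial state both ways."""
    return (linear_rnn(u, w, xs, h0 + eps) - linear_rnn(u, w, xs, h0 - eps)) / (2 * eps)


if __name__ == "__main__":
    xs = [1.0] * 50                  # 50 time steps of constant input
    for u in (0.9, 1.0, 1.1):
        print(f"u={u}: analytic={grad_through_time(u, 50):.6g}, "
              f"numeric={finite_difference(u, 0.5, xs, 0.0):.6g}")
```

With u = 0.9 the gradient after 50 steps is roughly 0.005; with u = 1.1 it is roughly 117. In a real RNN each factor is a Jacobian matrix that also includes the activation's derivative, but the repeated multiplication is the same.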
The Problem & The Solution
Vanishing Gradients:
- Can't learn long-term dependencies
- Network forgets early inputs
Example: "The cat, which was ... sat"
- Can the network remember that "cat" was the subject?
- With vanishing gradients: often not, because the gradient signal from "sat" back to "cat" has shrunk to nearly zero
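A small sketch of why early words fade, assuming a scalar tanh RNN with hypothetical weights u = 0.5 and w = 1.0. The first input plays the role of "cat", and constant filler inputs stand in for the words of the "which was ..." clause. The sensitivity dh[T]/dx[1] is accumulated step by step with the chain rule, and it shrinks rapidly as more filler steps are added. The function `run` and its parameters are illustrative only.

```python
import math


def run(first, fillers, u=0.5, w=1.0, filler=0.2):
    """Scalar tanh RNN: h[t] = tanh(u*h[t-1] + w*x[t]).

    Returns (h[T], dh[T]/dx[1]) where x[1] = first and the remaining
    `fillers` inputs all equal `filler`.
    """
    h = math.tanh(w * first)          # step 1 (h[0] = 0)
    dh_dx1 = w * (1.0 - h * h)        # tanh'(a) = 1 - tanh(a)^2
    for _ in range(fillers):
        h = math.tanh(u * h + w * filler)
        dh_dx1 *= u * (1.0 - h * h)   # chain-rule factor dh[t]/dh[t-1]
    return h, dh_dx1


if __name__ == "__main__":
    for gap in (1, 5, 20):
        _, sens = run(first=1.0, fillers=gap)
        print(f"{gap:2d} words later: dh[T]/dx[1] = {sens:.2e}")
```

Each factor here is at most u = 0.5 because tanh' never exceeds 1, so the first word's influence decays geometrically with the length of the gap.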
Solution: LSTM & GRU (gate mechanisms to control information flow)