Recurrent Neural Networks (RNN, LSTM, GRU) · Page 2 of 2
LSTM (Long Short-Term Memory) & GRU
LSTM (Long Short-Term Memory)
Solution to vanishing gradients: Use gates to control information flow!
The LSTM Cell
Three gates and a candidate update control what gets remembered:
1. Forget Gate: f_t = sigmoid(W_f × [h_{t-1}, x_t] + b_f)
"Should I forget this info?"
2. Input Gate: i_t = sigmoid(W_i × [h_{t-1}, x_t] + b_i)
"Should I learn this new info?"
3. Candidate: C̃_t = tanh(W_c × [h_{t-1}, x_t] + b_c)
"What new info should I learn?"
4. Output Gate: o_t = sigmoid(W_o × [h_{t-1}, x_t] + b_o)
"What info should I output?"
Cell state update:
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t (add new, forget old)
Hidden state:
h_t = o_t ⊙ tanh(C_t)
(⊙ = element-wise multiplication)
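The equations above translate almost line for line into NumPy. The following is a minimal sketch of a single forward step, not a reference implementation: the helper names (`lstm_step`, `init_params`), the parameter dictionary layout, and the small random initialization are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step, following the gate equations above."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])   # candidate
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde                          # cell state update
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initializer: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "c", "o"):
        params[f"W_{name}"] = rng.normal(0.0, 0.1, (hidden_size, hidden_size + input_size))
        params[f"b_{name}"] = np.zeros(hidden_size)
    return params

input_size, hidden_size = 3, 4
params = init_params(input_size, hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)

# Run the cell over a toy sequence of 5 random input vectors.
sequence = np.random.default_rng(1).normal(size=(5, input_size))
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, params)

print(h.shape, c.shape)
```

Each weight matrix has shape (hidden_size, hidden_size + input_size) because it multiplies the concatenated vector [h_{t-1}, x_t].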
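Key insight: The cell state is updated additively, so gradients flow back through it scaled only by the forget gate f_t. With f_t close to 1, they vanish far more slowly than in a vanilla RNN.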
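To see why this matters, note that along the cell-state path the per-step gradient factor is just f_t (ignoring the gates' own dependence on h_{t-1}). The sketch below compares how two assumed, hand-picked per-step factors compound over many steps; the values 0.99 and 0.5 are illustrative, not measurements from a trained model.

```python
T = 100  # number of time steps (illustrative)

# Assumed per-step gradient factors; real values depend on the trained model.
forget_gate = 0.99   # LSTM: dC_t/dC_{t-1} = f_t along the cell-state path
rnn_factor = 0.5     # vanilla RNN: |w * tanh'(.)| for a scalar recurrence

lstm_scale = forget_gate ** T
rnn_scale = rnn_factor ** T

print(f"LSTM cell-state path after {T} steps: {lstm_scale:.3f}")
print(f"Vanilla RNN path after {T} steps:     {rnn_scale:.3e}")
```

The LSTM factor still shrinks, but it remains many orders of magnitude larger than the vanilla RNN factor under these assumptions.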
Example: Understanding Context
Sentence: "The cat, which was orange and fluffy, sat"
LSTM can learn to give less weight to less relevant tokens (punctuation, adjectives)
Remembers "cat" as subject
Learns that "sat" is the verb about the cat
Forget gate: "Forget 'orange', 'fluffy', 'and'"
Input gate: "Remember 'cat'"
Output gate: "Output 'sat' is verb of 'cat'"
GRU (Gated Recurrent Unit)
Simpler than LSTM, and often similar in performance.
Only 2 gates, reset and update (the LSTM has 3 plus a separate cell state):
Reset gate: r_t = sigmoid(W_r × [h_{t-1}, x_t] + b_r)
Update gate: z_t = sigmoid(W_z × [h_{t-1}, x_t] + b_z)
Candidate: h̃_t = tanh(W × [r_t ⊙ h_{t-1}, x_t] + b)
Hidden state: h_t = (1 - z_t) ⊙ h̃_t + z_t ⊙ h_{t-1}
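The GRU equations above can be sketched in NumPy as a single forward step. This is an illustrative sketch under assumptions: the function name `gru_step`, the parameter dictionary keys, and the random initialization are not from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU time step, following the equations above."""
    concat = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    r_t = sigmoid(params["W_r"] @ concat + params["b_r"])     # reset gate
    z_t = sigmoid(params["W_z"] @ concat + params["b_z"])     # update gate
    # The reset gate scales the previous hidden state before the candidate is computed.
    concat_reset = np.concatenate([r_t * h_prev, x_t])
    h_tilde = np.tanh(params["W"] @ concat_reset + params["b"])
    # z_t close to 1 keeps the old state; z_t close to 0 takes the candidate.
    return (1.0 - z_t) * h_tilde + z_t * h_prev

input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
params = {}
for w_name, b_name in (("W_r", "b_r"), ("W_z", "b_z"), ("W", "b")):
    params[w_name] = rng.normal(0.0, 0.1, (hidden_size, hidden_size + input_size))
    params[b_name] = np.zeros(hidden_size)

h = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))
for x_t in sequence:
    h = gru_step(x_t, h, params)

print(h.shape)
```

Unlike the LSTM, there is no separate cell state: the hidden state h_t carries all the memory, and the update gate z_t interpolates between the old state and the candidate.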
Advantages of GRU:
- Fewer parameters (3 weight matrices vs LSTM's 4)
- Faster training
- Often similar performance to LSTM
- Good for smaller datasets
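The parameter savings can be estimated by counting weight matrices in the equations: the LSTM has four (W_f, W_i, W_c, W_o) and the GRU three (W_r, W_z, W), each acting on [h_{t-1}, x_t]. The sketch below assumes one bias vector per matrix, as written in the equations; library implementations may count biases differently, so treat the numbers as approximate. The function name and the layer sizes are illustrative.

```python
def recurrent_param_count(input_size, hidden_size, num_blocks):
    """Parameters for num_blocks weight matrices over [h_{t-1}, x_t], one bias each."""
    per_block = hidden_size * (hidden_size + input_size) + hidden_size
    return num_blocks * per_block

input_size, hidden_size = 128, 256  # assumed layer sizes for illustration

counts = {
    "RNN": recurrent_param_count(input_size, hidden_size, 1),
    "GRU": recurrent_param_count(input_size, hidden_size, 3),
    "LSTM": recurrent_param_count(input_size, hidden_size, 4),
}

for name, count in counts.items():
    print(f"{name:4s} {count:>8,d} parameters")
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size.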
LSTM vs GRU vs RNN
| Model | Params | Speed | Long-term memory | Use Case |
|---|---|---|---|---|
| RNN | Few | Fast | Poor | Simple sequences |
| GRU | Medium | Medium | Good | Text, most tasks |
| LSTM | Many | Slow | Excellent | Complex sequences |
Modern practice:
- Default to GRU (good balance)
- Use LSTM for very long sequences
- Avoid vanilla RNN (vanishing gradients)
Bidirectional RNNs
Process the sequence in both directions:
Forward RNN: → (left to right)
Backward RNN: ← (right to left)
Concatenate: [forward_hidden, backward_hidden]
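Advantage: Each position sees both past and future context. This requires the full sequence to be available up front.
Examples: Sequence labeling, machine translation (encoder side)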
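A bidirectional layer can be sketched by running two independent RNNs, one over the sequence and one over its reverse, then concatenating their states position by position. The example below uses a plain tanh RNN for brevity; the function names, sizes, and random weights are assumptions for illustration, and the same wrapper idea applies to LSTM or GRU cells.

```python
import numpy as np

def rnn_pass(inputs, W_x, W_h, b):
    """Run a vanilla tanh RNN over inputs; return the hidden state at every step."""
    h = np.zeros(b.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

def bidirectional_rnn(inputs, fwd_params, bwd_params):
    forward = rnn_pass(inputs, *fwd_params)
    # Process the reversed sequence, then flip back so index t lines up with forward[t].
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
seq_len, input_size, hidden_size = 6, 3, 4

def make_params():
    """Hypothetical initializer: small random weights, zero bias."""
    return (rng.normal(0.0, 0.1, (hidden_size, input_size)),
            rng.normal(0.0, 0.1, (hidden_size, hidden_size)),
            np.zeros(hidden_size))

fwd, bwd = make_params(), make_params()
inputs = rng.normal(size=(seq_len, input_size))
outputs = bidirectional_rnn(inputs, fwd, bwd)

print(outputs.shape)  # (6, 8): one vector of size 2 * hidden_size per position
```

The first hidden_size entries of each output row summarize everything up to that position, and the remaining entries summarize everything from that position onward.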