GRU (Gated Recurrent Unit)
🧠 What is GRU?
GRU (Gated Recurrent Unit) is a type of Recurrent Neural Network (RNN) architecture introduced in 2014.
It was designed to solve the same problems as LSTM (Long Short-Term Memory) but with a simpler structure and fewer parameters.
🚀 Why GRU?
- Handles long-term dependencies.
- Mitigates vanishing gradient problem.
- Simpler & faster than LSTM.
- Performs well on sequence data like text, time series, audio.
🔑 Core Components of GRU:
At each time step t, GRU has:
- Update Gate (z_t): Controls how much of the past information to keep.
- Reset Gate (r_t): Controls how much of the past information to forget.
- Candidate Activation (h̃_t): Computes the new information to add.
- Final Hidden State (h_t): Combines the old state and the new candidate information.
⚙️ Mathematical Equations:
Given input x_t and previous hidden state h_{t-1}:
1. Update Gate (z_t):
z_t = σ(W_z * [h_{t-1}, x_t] + b_z)
Controls how much of the past to keep.
2. Reset Gate (r_t):
r_t = σ(W_r * [h_{t-1}, x_t] + b_r)
Controls how much past info to forget.
3. Candidate Activation (h̃_t):
h̃_t = tanh(W_h * [r_t * h_{t-1}, x_t] + b_h)
Uses the reset gate to scale h_{t-1} before combining it with the current input x_t.
4. Final Hidden State (h_t):
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
Mixes old hidden state & new candidate.
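
To make the equations concrete, here is a minimal single GRU step in NumPy. This is a sketch, not a reference implementation: the weight matrices (W_z, W_r, W_h), the biases, and the toy dimensions are illustrative assumptions.

```python
# A minimal sketch of one GRU step in NumPy, following the equations above.
# Weight names/shapes (W_z, W_r, W_h, biases) and dimensions are illustrative.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    concat = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ concat + b_z)                   # 1. update gate
    r_t = sigmoid(W_r @ concat + b_r)                   # 2. reset gate
    concat_reset = np.concatenate([r_t * h_prev, x_t])  # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(W_h @ concat_reset + b_h)         # 3. candidate activation
    return (1 - z_t) * h_prev + z_t * h_tilde           # 4. final hidden state

# Toy dimensions: 3-dimensional input, 4-dimensional hidden state.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W_z, W_r, W_h = (rng.standard_normal((n_hid, n_hid + n_in)) for _ in range(3))
b_z = b_r = b_h = np.zeros(n_hid)

h_t = gru_step(rng.standard_normal(n_in), np.zeros(n_hid), W_z, W_r, W_h, b_z, b_r, b_h)
print(h_t.shape)  # (4,)
```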
📊 GRU Architecture Diagram:
Simplified Visual:
Input x_t
↓
+--------------------+
| Update Gate | -----> z_t
+--------------------+
↓
+--------------------+
| Reset Gate | -----> r_t
+--------------------+
↓
+----------------------------------+
| Candidate Activation (h̃_t) |
| Combines reset gate & input |
+----------------------------------+
↓
+----------------------------------+
| Final Hidden State (h_t) |
| Combines old state & candidate |
+----------------------------------+
Detailed Visual Flow:
Previous Hidden State (h_{t-1}) ────┐
                                    │
Input (x_t) ────────────────────────┤
                                    │
                                    ▼
                            +-------------+
                            | Update Gate |──────▶ z_t
                            +-------------+
                                    │
                                    ▼
                            +-------------+
                            | Reset Gate  |──────▶ r_t
                            +-------------+
                                    │
                                    ▼
                          ┌──────────────────┐
                          │ Apply Reset Gate │
                          └──────────────────┘
                                    │
                                    ▼
                            +-------------------+
                            |  Candidate h̃_t   |
                            +-------------------+
                                    │
                                    ▼
                +--------------------------------------+
                | Combine with h_{t-1} via Update Gate |
                +--------------------------------------+
                                    │
                                    ▼
                        Final Hidden State (h_t)
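
In a deep learning framework, the whole per-step flow shown in the diagram corresponds to a single GRU-cell call. A minimal sketch, assuming PyTorch's nn.GRUCell and illustrative sizes:

```python
# One GRU step via PyTorch's GRUCell; batch size and dimensions are illustrative.
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=10, hidden_size=20)

x_t = torch.randn(5, 10)     # batch of 5 inputs at time step t
h_prev = torch.zeros(5, 20)  # previous hidden state h_{t-1}
h_t = cell(x_t, h_prev)      # gates, candidate, and mixing all happen inside this call
print(h_t.shape)             # torch.Size([5, 20])
```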
🟢 GRU vs LSTM:
| Feature | GRU | LSTM |
|---|---|---|
| Gates | 2 (Update, Reset) | 3 (Input, Forget, Output) |
| Memory Cell | No separate cell state (uses hidden state) | Separate cell state and hidden state |
| Parameters | Fewer | More (heavier) |
| Computation Speed | Faster | Slightly slower |
| Performance | Similar (depends on dataset/task) | Sometimes better for very long sequences |
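
The "fewer parameters" row can be checked directly. The sketch below (assuming PyTorch; the layer sizes are illustrative) counts trainable parameters: a GRU layer keeps 3 sets of input/recurrent weights while an LSTM layer keeps 4, so the GRU comes out roughly 25% lighter at the same size.

```python
# Comparing parameter counts of same-sized GRU and LSTM layers (sizes are illustrative).
import torch.nn as nn

input_size, hidden_size = 128, 256

gru = nn.GRU(input_size, hidden_size, batch_first=True)
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# GRU:  3 * hidden * (input + hidden + 2) = 296,448
# LSTM: 4 * hidden * (input + hidden + 2) = 395,264
print("GRU parameters: ", count_params(gru))
print("LSTM parameters:", count_params(lstm))
```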
🌟 Key Benefits of GRU:
- Simpler architecture, fewer parameters.
- Efficient for training, faster convergence.
- Good balance between speed & performance.
🚀 Applications of GRU:
- NLP (Language Modeling, Translation)
- Speech Recognition
- Time Series Forecasting
- Stock Market Prediction
- Video Data Analysis
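
As a usage illustration for tasks like time series forecasting, here is a minimal many-to-one GRU model. It is a sketch assuming PyTorch; the class name, layer sizes, and forecast horizon are made up for the example.

```python
# A minimal GRU forecaster: read a sequence, predict the next value(s).
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    def __init__(self, n_features=1, hidden_size=64, horizon=1):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, horizon)

    def forward(self, x):          # x: (batch, seq_len, n_features)
        out, h_n = self.gru(x)     # h_n: (1, batch, hidden_size)
        return self.head(h_n[-1])  # predict from the last hidden state

model = GRUForecaster()
dummy = torch.randn(8, 30, 1)      # batch of 8 sequences, 30 time steps each
print(model(dummy).shape)          # torch.Size([8, 1])
```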