5002 Recurrent Neural Network Memo
Difference from a Standard Neural Network
The output of an RNN cell is also passed forward to the next step of the network as an internal variable (hidden state).
Multilayer
An RNN can have multiple layers and multiple memory units, as the sketch below shows.
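A minimal sketch of a stacked RNN using PyTorch's `nn.RNN`; the sizes (8 input features, 16 memory units per layer, 2 layers) are arbitrary picks for illustration, not values from this memo:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 8 input features, 16 memory units per layer, 2 stacked layers.
rnn = nn.RNN(input_size=8, hidden_size=16, num_layers=2, batch_first=True)

x = torch.randn(4, 10, 8)   # 4 sequences, 10 time steps, 8 features each
y, h_n = rnn(x)             # y: (4, 10, 16) per-step outputs of the top layer
                            # h_n: (2, 4, 16) final hidden state of each layer
```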
Basic RNN
The basic RNN is simple: it uses its internal variable $s$ directly as the output variable $y$. The state is computed as $s_t = \tanh(W [x_t, s_{t-1}] + b)$, $y_t = s_t$
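A minimal NumPy sketch of this step; the toy sizes (3 input features, 4 memory units) and random weights are illustration choices, not from the memo:

```python
import numpy as np

def rnn_step(x_t, s_prev, W, b):
    """One basic RNN step: s_t = tanh(W [x_t, s_{t-1}] + b), y_t = s_t."""
    s_t = np.tanh(W @ np.concatenate([x_t, s_prev]) + b)
    return s_t, s_t  # the state doubles as the output

# Hypothetical sizes: 3 input features, 4 memory units.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3 + 4)), np.zeros(4)
s = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):  # a toy sequence of 5 steps
    s, y = rnn_step(x_t, s, W, b)
```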
LSTM
An LSTM has these components:
- an internal variable that stores the memory
- a forget feature that discards a portion of the internal variable
- an input feature that decides what portion of the input to take and how strongly
- an output feature that decides what portion of the output to emit and how strongly
Key differences
First, the previous output $y_{t-1}$ and the previous internal state $s_{t-1}$ are inputs, too.
Components and Framework
Components:
- Forget gate (portion of memory)
- Input gate (portion of input)
- Input activation gate (weight of input)
- New internal state: the input gate combined with the input activation gate, plus the forget gate combined with the previous internal state
- Output gate (portion of output)
- Final output: multiplication of the tanh of the internal state and the output gate

Formulas:
- Forget gate: $f_t = \sigma(W_f [x_t, y_{t-1}] + b_f)$
- Input gate: $I_t = \sigma(W_i [x_t, y_{t-1}] + b_i)$
- Input activation gate: $a_t = \tanh(W_a [x_t, y_{t-1}] + b_a)$
- New internal state: $s_t = f_t \times s_{t-1} + I_t \times a_t$
- Output gate: $O_t = \sigma(W_o [x_t, y_{t-1}] + b_o)$
- Final output: $y_t = O_t \times \tanh(s_t)$
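A minimal NumPy sketch of one LSTM step following the formulas above; the parameter dictionary `P` and the toy sizes are my own illustration choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, y_prev, s_prev, P):
    """One LSTM step; P holds the weight matrices and biases per gate."""
    v = np.concatenate([x_t, y_prev])        # [x_t, y_{t-1}]
    f = sigmoid(P["Wf"] @ v + P["bf"])       # forget gate
    i = sigmoid(P["Wi"] @ v + P["bi"])       # input gate
    a = np.tanh(P["Wa"] @ v + P["ba"])       # input activation gate
    s_t = f * s_prev + i * a                 # new internal state
    o = sigmoid(P["Wo"] @ v + P["bo"])       # output gate
    y_t = o * np.tanh(s_t)                   # final output
    return y_t, s_t

# Hypothetical sizes: 3 input features, 4 memory units.
rng = np.random.default_rng(0)
n_in, n_mem = 3, 4
P = {f"W{g}": rng.normal(size=(n_mem, n_in + n_mem)) for g in "fiao"}
P.update({f"b{g}": np.zeros(n_mem) for g in "fiao"})
y, s = np.zeros(n_mem), np.zeros(n_mem)
for x_t in rng.normal(size=(5, n_in)):       # a toy sequence of 5 steps
    y, s = lstm_step(x_t, y, s, P)
```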
Gated Recurrent Unit
Advantages
- shorter training time due to the simpler architecture
- fewer data points needed to capture the properties
- no internal variable
Key difference
There is no internal variable here; the GRU just uses the previous prediction $y_{t-1}$.
Components:
- Reset gate: uses the previous prediction as the reference for how much memory to keep (portion of memory)
- Input activation gate: the same input activation gate as in the LSTM, applied to the reset-scaled previous prediction
- Output: combines a portion of the previous prediction with a portion of the processed input (the ratio comes from the update gate)
Summary
- reset component
- input activation component
- update component
- final output component
Formulas
- reset gate: $r_t = \sigma(W_r [x_t, y_{t-1}] + b_r)$
- input activation gate: $a_t = \tanh(W_a [x_t, r_t \times y_{t-1}] + b_a)$
- update gate: $u_t = \sigma(W_u [x_t, y_{t-1}] + b_u)$
- final output: $y_t = (1-u_t) y_{t-1} + u_t a_t$
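A minimal NumPy sketch of one GRU step following these formulas; as before, the parameter dictionary `P` and the toy sizes are illustration choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, y_prev, P):
    """One GRU step; note there is no separate internal state variable."""
    v = np.concatenate([x_t, y_prev])                  # [x_t, y_{t-1}]
    r = sigmoid(P["Wr"] @ v + P["br"])                 # reset gate
    a = np.tanh(P["Wa"] @ np.concatenate([x_t, r * y_prev]) + P["ba"])  # input activation gate
    u = sigmoid(P["Wu"] @ v + P["bu"])                 # update gate
    return (1 - u) * y_prev + u * a                    # final output

# Hypothetical sizes: 3 input features, 4 units.
rng = np.random.default_rng(0)
n_in, n_u = 3, 4
P = {f"W{g}": rng.normal(size=(n_u, n_in + n_u)) for g in "rau"}
P.update({f"b{g}": np.zeros(n_u) for g in "rau"})
y = np.zeros(n_u)
for x_t in rng.normal(size=(5, n_in)):                 # a toy sequence of 5 steps
    y = gru_step(x_t, y, P)
```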