The encoder takes variable-length inputs and creates encoded representations, a.k.a. hidden states, for them. Mathematically, we can write the recurrence as:

$$h_t = f(h_{t-1}, x_t)$$
which says that the encoded representation at step $t$ is a function of the representation at step $t-1$ and the current input $x_t$. In the general case, the encoder creates a single fixed-dimensional representation from all the hidden states:

$$c = q(h_1, \dots, h_T)$$
In many architectures, the encoded representation at the last time-step is used as the final fixed-dimensional representation:

$$c = h_T$$
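The encoder recurrence above can be sketched in a few lines of plain Python. This is a toy illustration with scalar hidden states and made-up weights `W_H` and `W_X` (not any particular library's API); a real encoder would use vectors and learned parameters:

```python
import math

# Hypothetical scalar weights, for illustration only.
W_H, W_X = 0.5, 0.8

def encode(xs):
    """Run the recurrence h_t = f(h_{t-1}, x_t) over a variable-length input,
    where f here is a vanilla-RNN-style tanh combination."""
    h = 0.0          # initial hidden state h_0
    states = []
    for x in xs:
        h = math.tanh(W_H * h + W_X * x)  # new state from previous state + input
        states.append(h)
    return states

hidden = encode([1.0, -0.5, 2.0])  # works for any input length
c = hidden[-1]                     # fixed-dimensional context: last state h_T
```

Note that however long the input sequence is, the loop produces one hidden state per step, and taking the last one yields a representation of fixed size.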
The decoder hidden state $s_t$ depends on the previous model output $y_{t-1}$, the previous decoder hidden state $s_{t-1}$, and the encoder output $c$:

$$s_t = g(y_{t-1}, s_{t-1}, c)$$
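A single decoder step can be sketched the same way. Again this is a minimal scalar toy with hypothetical weights `U_Y`, `U_S`, `U_C`, just to show that each new state mixes the previous output, the previous state, and the encoder context:

```python
import math

# Hypothetical scalar weights, for illustration only.
U_Y, U_S, U_C = 0.3, 0.6, 0.4

def decoder_step(y_prev, s_prev, c):
    """One step of s_t = g(y_{t-1}, s_{t-1}, c), with g as a tanh combination."""
    return math.tanh(U_Y * y_prev + U_S * s_prev + U_C * c)

# First decoder step: no previous output/state yet, context c from the encoder.
s1 = decoder_step(y_prev=0.0, s_prev=0.0, c=0.9)
```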
Normally, the decoder hidden state is passed to some output operation, such as a linear layer + softmax, in order to get a probability distribution (e.g. for machine translation, we want probabilities over the vocabulary of the target language):

$$P(y_t \mid y_{<t}, c) = \operatorname{softmax}(W_o s_t + b_o)$$
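The linear-layer-plus-softmax step can be sketched as follows, using a toy three-word target vocabulary and hypothetical parameters `W` and `b` (here the decoder state is a scalar, so the "linear layer" is one weight per vocabulary entry):

```python
import math

VOCAB = ["<eos>", "hola", "mundo"]   # toy target-language vocabulary
W = [0.2, 0.5, -0.3]                 # hypothetical per-word weights
b = [0.1, 0.0, -0.1]                 # hypothetical biases

def output_distribution(s):
    """Linear layer + softmax: turn decoder state s into probabilities over VOCAB."""
    logits = [W[i] * s + b[i] for i in range(len(VOCAB))]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = output_distribution(0.7)  # one probability per target-vocabulary word
```

The softmax guarantees the outputs are positive and sum to one, so they can be read as a distribution over the target vocabulary from which the next token is chosen.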