The encoder takes variable-length inputs and creates encoded representations, a.k.a. hidden states, for them. Mathematically, we can write the recurrence as:

$$h_t = f(h_{t-1}, x_t)$$
which says that the encoded representation at step $t$ is a function of the representation at step $t-1$ and the current input $x_t$. In the general case, the encoder creates a single fixed-dimensional representation from all the hidden states:

$$c = q(h_1, \dots, h_T)$$
In many architectures, the encoded representation at the last time-step is used as the final fixed-dimensional representation:

$$c = h_T$$
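The encoder recurrence above can be sketched in a few lines of plain Python. This is a toy illustration with scalar hidden states and made-up weights `W_H` and `W_X` (not any particular library's API); a real encoder would use vectors and learned parameters:

```python
import math

# Hypothetical scalar weights, for illustration only.
W_H, W_X = 0.5, 0.8

def encode(xs):
    """Run the recurrence h_t = f(h_{t-1}, x_t) over a variable-length input,
    where f here is a vanilla-RNN-style tanh combination."""
    h = 0.0          # initial hidden state h_0
    states = []
    for x in xs:
        h = math.tanh(W_H * h + W_X * x)  # new state from previous state + input
        states.append(h)
    return states

hidden = encode([1.0, -0.5, 2.0])  # works for any input length
c = hidden[-1]                     # fixed-dimensional context: last state h_T
```

Note that however long the input sequence is, the loop produces one hidden state per step, and taking the last one yields a representation of fixed size.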
The decoder hidden state $s_t$ depends on the previous model output $y_{t-1}$, the previous decoder hidden state $s_{t-1}$, and the encoder output $c$:

$$s_t = g(y_{t-1}, s_{t-1}, c)$$
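A single decoder step can be sketched the same way. Again this is a minimal scalar toy with hypothetical weights `U_Y`, `U_S`, `U_C`, just to show that each new state mixes the previous output, the previous state, and the encoder context:

```python
import math

# Hypothetical scalar weights, for illustration only.
U_Y, U_S, U_C = 0.3, 0.6, 0.4

def decoder_step(y_prev, s_prev, c):
    """One step of s_t = g(y_{t-1}, s_{t-1}, c), with g as a tanh combination."""
    return math.tanh(U_Y * y_prev + U_S * s_prev + U_C * c)

# First decoder step: no previous output/state yet, context c from the encoder.
s1 = decoder_step(y_prev=0.0, s_prev=0.0, c=0.9)
```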
Normally, the decoder hidden state is passed to some output operation, such as a linear layer + softmax, in order to get a probability distribution (e.g. for machine translation, we want probabilities over the vocabulary of the target language):

$$P(y_t \mid y_{<t}, c) = \operatorname{softmax}(W_o s_t + b_o)$$
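The linear-layer-plus-softmax step can be sketched as follows, using a toy three-word target vocabulary and hypothetical parameters `W` and `b` (here the decoder state is a scalar, so the "linear layer" is one weight per vocabulary entry):

```python
import math

VOCAB = ["<eos>", "hola", "mundo"]   # toy target-language vocabulary
W = [0.2, 0.5, -0.3]                 # hypothetical per-word weights
b = [0.1, 0.0, -0.1]                 # hypothetical biases

def output_distribution(s):
    """Linear layer + softmax: turn decoder state s into probabilities over VOCAB."""
    logits = [W[i] * s + b[i] for i in range(len(VOCAB))]
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = output_distribution(0.7)  # one probability per target-vocabulary word
```

The softmax guarantees the outputs are positive and sum to one, so they can be read as a distribution over the target vocabulary from which the next token is chosen.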