Topics

Traditional encoder-decoder architecture for seq2seq modeling compresses entire input sequence into single fixed-length context vector $c$.

Encoder processes input sequence $x = (x_1, \dots, x_T)$, produces final hidden state $h_T$. This serves as context vector $c$:

$$h_t = f_{\text{enc}}(x_t, h_{t-1}), \qquad c = h_T$$

Decoder uses fixed vector $c$ as initial state or conditioning input at each time step $t$ to generate output $y_t$.
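
To make this concrete, here is a minimal NumPy sketch of an encoder that compresses any input sequence into the single fixed-length vector $c = h_T$. The tanh RNN cell, weight names, and dimensions are illustrative assumptions, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, T = 8, 16, 5                  # hypothetical input dim, hidden dim, length

W_xh = rng.normal(scale=0.1, size=(d_hid, d_in))
W_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))
b_h = np.zeros(d_hid)

def encode(xs):
    """Run a vanilla tanh RNN over the sequence; return only the final state."""
    h = np.zeros(d_hid)
    for x_t in xs:                         # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                               # c = h_T: one fixed-length vector

xs = rng.normal(size=(T, d_in))            # toy input sequence
c = encode(xs)
print(c.shape)                             # (16,) no matter how long the input is
```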

Decoder equations:

$$s_t = f_{\text{dec}}(s_{t-1}, y_{t-1}, c)$$
$$y_t = g(s_t)$$

$s_t$ is decoder hidden state at time $t$. $f_{\text{dec}}$ is decoder RNN cell. $g$ is output layer.
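
A matching sketch of one decoder step under the equations above; the tanh cell, additive conditioning on $c$, and the linear-softmax output layer are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d_hid, d_emb, vocab = 16, 8, 20            # hypothetical sizes

W_s = rng.normal(scale=0.1, size=(d_hid, d_hid))   # previous state s_{t-1}
W_y = rng.normal(scale=0.1, size=(d_hid, d_emb))   # previous output embedding y_{t-1}
W_c = rng.normal(scale=0.1, size=(d_hid, d_hid))   # fixed context c
W_o = rng.normal(scale=0.1, size=(vocab, d_hid))   # output layer g

def decoder_step(s_prev, y_prev_emb, c):
    """s_t = f_dec(s_{t-1}, y_{t-1}, c); y_t distribution = softmax(g(s_t))."""
    s_t = np.tanh(W_s @ s_prev + W_y @ y_prev_emb + W_c @ c)
    logits = W_o @ s_t
    probs = np.exp(logits - logits.max())
    return s_t, probs / probs.sum()

c = rng.normal(size=d_hid)                 # the single context vector from the encoder
s = np.zeros(d_hid)                        # could also be initialised from c
y_emb = np.zeros(d_emb)                    # embedding of a start-of-sequence token
for _ in range(3):
    s, p = decoder_step(s, y_emb, c)       # note: the same c is reused at every step
    y_emb = rng.normal(size=d_emb)         # stand-in for embedding of argmax(p)
```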

Problems with fixed-length context vector:

  • Information Loss: Mapping variable-length input to fixed vector limits information retention. Capacity of $c$ is limited, especially for long inputs
  • Difficulty with Long Sequences: Performance degrades as input length grows. Fixed $c$ struggles to encode all relevant information for long output sequence generation. Decoder relies on single compressed representation for all output steps
  • Inability to Focus: Fixed context vector provides global input summary. Does not allow decoder to focus on specific input parts most relevant for current output generation. All input parts treated equally during compression

The attention mechanism addresses this bottleneck. Introduced in models such as Bahdanau attention and Luong attention, and later made widespread by the Transformer. Allows decoder to compute a new context vector $c_t$ at each decoding step $t$.

$c_t$ is a weighted sum of the encoder hidden states $h_i$:

$$c_t = \sum_{i=1}^{T} \alpha_{t,i} \, h_i$$

Weights $\alpha_{t,i}$ are dynamically computed based on relevance of each encoder state $h_i$ to current decoder hidden state $s_t$, typically as a softmax over alignment scores. This allows decoder to focus on most pertinent input information for each output token, mitigating the information bottleneck.
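
A minimal sketch of this attention step; dot-product scoring is an assumed choice here (Bahdanau attention uses an additive MLP score, Luong attention offers dot/general/concat variants):

```python
import numpy as np

def attention_context(s_t, H):
    """c_t = sum_i alpha_{t,i} h_i, with alpha_t = softmax of relevance scores."""
    scores = H @ s_t                       # e_{t,i} = h_i . s_t (assumed dot-product score)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                   # attention weights over encoder states, sum to 1
    return alpha @ H, alpha                # weighted sum of encoder hidden states

rng = np.random.default_rng(2)
T, d_hid = 5, 16
H = rng.normal(size=(T, d_hid))            # encoder hidden states h_1 ... h_T (kept, not discarded)
s_t = rng.normal(size=d_hid)               # current decoder hidden state
c_t, alpha = attention_context(s_t, H)
print(alpha.round(2), c_t.shape)           # a distribution over input positions; c_t is (16,)
```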