Topics
In contrast to local attention, the global attention mechanism allows each query token to attend to every token in the input sequence.
Mechanism w.r.t. the query-key-value (QKV) framework (see the sketch after this list):
- Query at position i computes attention scores against the keys at all positions j = 1, ..., n
- Context vector is a weighted sum of all value vectors
- Examples: original Bahdanau attention (which used an RNN encoder's hidden states as context), Transformer self-attention
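Below is a minimal NumPy sketch of this mechanism; the function name, weight matrices, and shapes are illustrative assumptions rather than any particular library's API.

```python
# Minimal sketch of global (full) self-attention in the QKV framework.
# Names and shapes are illustrative, not taken from any specific library.
import numpy as np

def global_self_attention(X, Wq, Wk, Wv):
    """X: (n, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_k) projections."""
    Q = X @ Wq                        # one query per position
    K = X @ Wk                        # one key per position
    V = X @ Wv                        # one value per position
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n, n): every query scores every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all positions
    return weights @ V                # each output is a weighted sum of all values

# Toy usage: 5 tokens, d_model = 8, d_k = 4
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 4)) for _ in range(3))
out = global_self_attention(X, Wq, Wk, Wv)   # shape (5, 4)
```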
Computational Complexity:
- Calculating the attention scores takes O(n² · d) operations for sequence length n and key dimension d
- Storing the n × n attention matrix takes O(n²) memory
- Quadratic scaling makes it expensive for very long sequences
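As a rough illustration of this quadratic scaling, here is a back-of-the-envelope sketch (assuming fp32 scores, a single head, and a single sequence; the numbers are illustrative only):

```python
# Back-of-the-envelope cost of the n x n attention matrix at various sequence lengths.
for n in (1_024, 8_192, 65_536):
    entries = n * n            # one score per query-key pair (times d for the dot products)
    mem_bytes = n * n * 4      # fp32: 4 bytes per score
    print(f"n={n:>6}: ~{entries:.2e} score entries, ~{mem_bytes / 2**30:.2f} GiB for the matrix")
```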
Context Capture:
- Strength lies in capturing long-range dependencies
- Provides comprehensive global understanding
- Every token attends to every other token
- Beneficial for tasks needing holistic understanding, like sequence modeling
Note
Global attention tackled the fixed-length context-vector bottleneck of early encoder-decoder models. It allowed the model to dynamically focus on relevant parts of the input sequence during translation or sequence generation, which led to improved performance, especially on longer sequences.