Topics
The concept of local attention shares similarities with convolutional neural networks (CNNs), as both emphasize the processing of local context.
Stacking local attention layers expands the receptive field, analogous to how stacking convolutional layers increases the receptive field in CNNs. This similarity makes local attention particularly well-suited for data modalities with strong locality, such as images or time series, where nearby elements are often highly correlated.
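The receptive-field growth can be made concrete with a small calculation. As a sketch (assuming a symmetric window of size w = 2r + 1, analogous to a 1-D convolutional kernel; the function name is illustrative):

```python
def receptive_field(num_layers: int, window: int) -> int:
    """Receptive field of stacked local-attention layers (sketch).

    Each additional layer lets a position indirectly see (window - 1)
    more neighbors, mirroring the growth rule for stacked 1-D convolutions.
    """
    return num_layers * (window - 1) + 1

for layers in (1, 2, 4):
    print(layers, receptive_field(layers, window=5))
```

With a 5-token window, four stacked layers cover 4 × 4 + 1 = 17 positions, even though each individual layer only ever looks 2 tokens to either side.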
However, a key difference lies in how the weights are applied: CNNs apply static, learned convolutional kernels across the input, whereas local attention dynamically computes attention weights based on the interaction between the query and the keys within the local window (the query-key-value, QKV, framework). This dynamic weighting allows local attention to adapt its focus within the window based on the specific content.
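The content-dependent weighting can be sketched as follows. This is a minimal, unbatched NumPy illustration (not a production implementation); the function name, the `radius` parameter, and the single-head setup are assumptions for clarity:

```python
import numpy as np

def local_attention(q, k, v, radius=2):
    """Windowed (local) self-attention sketch.

    Each query position attends only to keys within `radius` positions
    on either side. q, k, v have shape (seq_len, d). Unlike a static
    convolutional kernel, the weights below are recomputed per position
    from the query-key dot products, so they depend on the content.
    """
    seq_len, d = q.shape
    out = np.zeros_like(v)
    for i in range(seq_len):
        lo, hi = max(0, i - radius), min(seq_len, i + radius + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # content-dependent scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                  # softmax over the window
        out[i] = weights @ v[lo:hi]               # weighted sum of values
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = local_attention(x, x, x, radius=2)
print(y.shape)  # (8, 4)
```

Note that a CNN in the same position would apply the *same* learned kernel at every window, whereas here the softmax weights differ from position to position because they are derived from the inputs themselves.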