Topics

The goal of linear attention (an algorithmic optimization) is different from that of implementation-level optimizations like FlashAttention.

FlashAttention

  • Hardware-aware algorithm optimizes standard attention mechanism
  • Avoids explicit attention matrix materialization in High Bandwidth Memory (HBM)
  • Uses techniques like tiling and recomputation within faster on-chip Static Random-Access Memory (SRAM)
  • Effectively reduces the memory footprint from O(N²) to O(N) during execution
  • Does not change the underlying computational complexity; still performs O(N²) computations (a simplified tiling sketch follows this list)
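
A minimal NumPy sketch of the tiled, online-softmax pattern that FlashAttention builds on: attention is computed one block of K/V at a time, so only an (N × block) slice of scores ever exists, never the full N × N matrix. The function name `tiled_softmax_attention` and the `block_size` default are illustrative assumptions; a real FlashAttention kernel fuses these steps into a single GPU kernel operating in SRAM and also handles masking and the backward pass.

```python
import numpy as np

def tiled_softmax_attention(Q, K, V, block_size=128):
    """Illustrative sketch only (not FlashAttention itself): numerically
    stable attention computed block-by-block over K/V, so the full
    N x N score matrix is never materialized at once."""
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((N, V.shape[1]))

    # Running statistics for the online softmax: per-row max and sum.
    row_max = np.full(N, -np.inf)
    row_sum = np.zeros(N)

    for start in range(0, N, block_size):
        end = min(start + block_size, N)
        Kb, Vb = K[start:end], V[start:end]

        # Scores for this tile only: shape (N, block) instead of (N, N).
        S = (Q @ Kb.T) * scale

        # Online softmax update: rescale previous accumulators whenever
        # a larger row maximum appears in the new tile.
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)
        P = np.exp(S - new_max[:, None])

        out = out * correction[:, None] + P @ Vb
        row_sum = row_sum * correction + P.sum(axis=1)
        row_max = new_max

    return out / row_sum[:, None]
```

Up to floating-point error, `tiled_softmax_attention(Q, K, V)` matches the result of materializing `softmax(Q @ K.T / sqrt(d)) @ V` directly; only the peak memory differs.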

Linear Attention

  • Aims to reduce both computational complexity and memory complexity from O(N²) to O(N)
  • Achieved by reformulating the attention calculation itself
  • Relevant where even O(N²) compute becomes prohibitive (see the kernelized sketch after this list)
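
A minimal NumPy sketch of one common linear-attention formulation: softmax(QKᵀ)V is replaced by φ(Q)(φ(K)ᵀV) with a positive feature map φ, in the style of kernelized attention (e.g. Katharopoulos et al., 2020). The function name, the elu(x) + 1 feature map, and the `eps` stabilizer are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized (linear) attention sketch: reorder the matrix products
    so the N x N attention matrix is never formed."""
    def phi(x):
        # elu(x) + 1 keeps the features strictly positive.
        return np.where(x > 0, x + 1.0, np.exp(x))

    Qf, Kf = phi(Q), phi(K)            # shape (N, d)

    # Compute phi(K)^T V first: a (d, d_v) summary instead of (N, N) scores.
    KV = Kf.T @ V                      # shape (d, d_v)
    normalizer = Qf @ Kf.sum(axis=0)   # shape (N,)

    return (Qf @ KV) / (normalizer[:, None] + eps)
```

The key design choice is the order of multiplication: computing φ(K)ᵀV first produces a fixed-size (d × d_v) summary, so the cost grows linearly with sequence length N rather than quadratically.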