zlacker

[return to "Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation"]
1. thomas+Yc 2026-02-04 15:33:26
>>fheins+(OP)
There's a graveyard of hundreds of papers on "approximate near-linear-time attention."

They always hope the speed increase makes up for the lower quality, but it never does. The quadratic time seems inherent to the problem.

Indeed, there are lower bounds showing that sub-n^2 algorithms can't work: https://arxiv.org/pdf/2302.13214
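
For anyone wondering where the n^2 comes from, here is a minimal sketch of exact softmax attention in plain Python. It is only an illustration, not code from either paper; the function name `softmax_attention` and the `pair_count` counter are made up for this example. Every query is scored against every key, so n tokens cost n * n dot products.

```python
import math

def softmax_attention(Q, K, V):
    """Exact softmax attention over lists of row vectors.

    Returns (output_rows, pair_count), where pair_count is the number of
    query-key dot products computed. Illustrative sketch only.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    pair_count = 0
    for q in Q:
        # One score per key: this inner loop is what makes the cost n * n.
        scores = []
        for k in K:
            scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
            pair_count += 1
        # Numerically stable softmax over this query's scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of the value rows.
        row = [
            sum(w * v[j] for w, v in zip(weights, V)) / total
            for j in range(len(V[0]))
        ]
        out.append(row)
    return out, pair_count
```

With 4 tokens this does 16 dot products, with 8 tokens 64, and so on. The approximate methods all try to avoid materializing that full set of pairwise scores.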

◧◩
2. Whitne+sJ 2026-02-04 17:53:23
>>thomas+Yc
The 2023 paper, even if true, doesn't preclude the 2026 paper from being true; it just sets constraints on how a faster attention solution would have to work.