zlacker

[return to "Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation"]
1. andes3+ai[view] [source] 2026-02-04 15:56:21
>>fheins+(OP)
Linear-time attention doesn't work, in principle. It's a dead-end pursuit. There's plenty of great research on more efficient quadratic-time inference instead.
2. smokel+mw[view] [source] 2026-02-04 16:57:14
>>andes3+ai
What about n log n?