zlacker

1. cubefo+(OP) 2026-02-04 18:03:16
Okay, then let's see whether real linear architectures, like Gated DeltaNet or Mamba-3, show up in some larger models. I don't believe there is a "lower bound" stating that they can never match (or exceed) the real-world performance of quadratic attention. (Perfect recall in unrealistic needle-in-haystack tests doesn't count.)
replies(1): >>andy12+5R
2. andy12+5R 2026-02-04 22:07:47
>>cubefo+(OP)
I'm also sure that some kind of linear architecture is possible. After all, humans don't have N^2 perfect recall either.