zlacker

Okay, then let's see whether real linear architectures, like Gated DeltaNet or Mamba-3, show up in some larger models. I don't believe there is a "lower bound" stating that they can never match (or exceed) the real-world performance of quadratic attention. (Perfect recall in unrealistic needle-in-a-haystack tests doesn't count.)

replies(1): >>andy12+5R

>>cubefo+(OP)
I'm also sure that some kind of linear architecture is possible. After all, humans don't have N^2 perfect recall either.