zlacker

1. storus (OP) 2025-12-06 23:14:54
Linear attention is really bad. It's only good for benchmaxing, and it costs you valuable granularity. You can feel it in the latest DeepSeek, which randomly forgets, ignores, or "corrects" facts that were explicitly stated in the prompt.
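
To make the "loss of granularity" point concrete, here is a minimal sketch of generic kernelized linear attention next to ordinary softmax attention. It is only an illustration under stated assumptions: the `elu(x) + 1` feature map is one common textbook choice, all function names are hypothetical, and nothing here is meant to describe what any DeepSeek model actually does internally.

```python
import math

def softmax_attention(q, keys, values):
    """Softmax attention for one query: every key keeps its own weight."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(d_v)]

def phi(x):
    """Feature map elu(x) + 1 (an assumed choice; always positive)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def linear_attention_state(keys, values):
    """Fold all (key, value) pairs into a fixed-size summary (S, z)."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    for k, v in zip(keys, values):
        fk = phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
    return S, z

def linear_attention(q, S, z):
    """Answer a query from the summary alone; the tokens are no longer visible."""
    fq = phi(q)
    denom = sum(a * b for a, b in zip(fq, z))
    d_v = len(S[0])
    return [sum(fq[i] * S[i][j] for i in range(len(fq))) / denom
            for j in range(d_v)]

# Toy data: three tokens with 2-dim keys and 1-dim values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
S, z = linear_attention_state(keys, values)
print(len(S), len(S[0]), len(z))  # summary size depends on dimensions, not token count
```

In this sketch the softmax version looks at every key individually, while the linear version answers each query from `S` and `z`, whose size stays the same however many tokens are folded in. That fixed-size summary is one plausible reading of where per-token detail could get blurred.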