zlacker
storus (OP)
2025-12-06 23:14:54
Linear attention is really bad. It's only good for benchmaxxing, and it loses valuable granularity. You can feel this in the latest DeepSeek, which randomly forgets, ignores, or "corrects" facts stated explicitly in the prompt.
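For anyone wondering what "loss of granularity" means mechanically, here is a toy pure-Python sketch. It assumes the elu(x)+1 feature map, one common choice in linear-attention papers. It is not DeepSeek's actual attention, and the function names are hypothetical. Softmax attention compares the query against every stored key. Linear attention instead folds the whole history into a fixed-size state (a d_k x d_v matrix plus a normalizer vector), so a single matching token gets blurred together with everything else:

```python
import math

def softmax_attention(q, keys, values):
    """Standard attention: the query is scored against every stored key."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dv = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dv)]

def phi(x):
    """elu(x) + 1 feature map (an assumption; other feature maps exist)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def linear_attention_state(keys, values):
    """Compress the whole history into a fixed-size matrix S and vector z."""
    dk, dv = len(keys[0]), len(values[0])
    S = [[0.0] * dv for _ in range(dk)]
    z = [0.0] * dk
    for k, v in zip(keys, values):
        fk = phi(k)
        for i in range(dk):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
    return S, z

def linear_attention(q, S, z):
    """Read out from the compressed state; individual tokens are no longer stored."""
    fq = phi(q)
    denom = sum(a * b for a, b in zip(fq, z))
    dv = len(S[0])
    return [sum(fq[i] * S[i][j] for i in range(len(fq))) / denom for j in range(dv)]

# One "needle" token (value [1, 0]) among three distractors (value [0, 1]).
keys = [[10.0, 0.0]] + [[0.0, 10.0]] * 3
values = [[1.0, 0.0]] + [[0.0, 1.0]] * 3
q = [10.0, 0.0]  # query that matches only the needle

exact = softmax_attention(q, keys, values)
S, z = linear_attention_state(keys, values)
approx = linear_attention(q, S, z)
# exact[0] is ~1.0; approx[0] is 122/188, about 0.65
```

The softmax output recovers the needle's value almost exactly. The linear readout mixes in the distractors because the state has the same size no matter how many tokens were folded into it. That fixed-size state is what makes linear attention cheap on long contexts, and it is also where precise recall can get lost.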