zlacker

1. mbesto+(OP)[view] [source] 2026-02-03 16:42:44
I think this tweet sums it up correctly, doesn't it?

   A +6 jump on a 0.6B model is actually more impressive than a +2 jump on a 100B model. It proves that 'intelligence' isn't just parameter count; it is context relevance. You are proving that a lightweight model with a cheat sheet beats a giant with amnesia. This is the death of the 'bigger is better' dogma
Which is essentially the bitter lesson that Richard Sutton talks about?
replies(1): >>Der_Ei+qT
2. Der_Ei+qT[view] [source] 2026-02-03 20:26:20
>>mbesto+(OP)
Nice ChatGPT-generated response in that tweet. Anyone too lazy to deslop their tweet shouldn't be listened to.