zlacker

[return to "Qwen3-Coder-Next"]
1. skhame+l9[view] [source] 2026-02-03 16:38:51
>>daniel+(OP)
It’s hard to overstate just how wild this model would be if it performs as claimed. The claim is that it gets close to Sonnet 4.5 on assisted coding (SWE-bench) while using only 3B active parameters. That is obscenely small for the claimed performance.
2. cirrus+ab[view] [source] 2026-02-03 16:45:25
>>skhame+l9
If it sounds too good to be true…
3. FuckBu+z01[view] [source] 2026-02-03 20:12:01
>>cirrus+ab
There have been significant advances in the last year in scaling up deep RL, and their announcement is consistent with a timeline of running enough experiments to figure out how to leverage that in post-training.

Importantly, this isn’t just throwing more data at the problem in an unstructured way. AFAIK companies are collecting as many git histories as they can and doing something along these lines: have an LLM checkpoint pull requests, features, etc. and convert those into plausible input prompts, then run deep RL with passing the acceptance criteria / tests as the reward signal.
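To make the idea concrete, here's a minimal sketch of that loop in Python. All the names here (`PullRequest`, `make_task`, `reward`) are hypothetical illustrations of the described approach, not any lab's actual pipeline; in practice an LLM would rewrite the PR metadata into a realistic user prompt, and the reward would be computed in a sandboxed checkout.

```python
# Sketch: mine merged PRs into RL tasks, reward = "do the tests pass".
from dataclasses import dataclass, field
import subprocess

@dataclass
class PullRequest:
    title: str
    diff: str                      # ground-truth patch, held out from the model
    test_cmd: list = field(default_factory=list)  # tests that gated the merge

def make_task(pr: PullRequest) -> str:
    # Stand-in for the LLM step that converts PR metadata into a
    # plausible input prompt for the coding model.
    return f"Implement the following change so the tests pass: {pr.title}"

def reward(repo_dir: str, test_cmd: list) -> float:
    # Binary reward signal: 1.0 if the acceptance tests pass after the
    # model's patch is applied in repo_dir, 0.0 otherwise.
    result = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0
```

The appeal of this setup is that the reward is verifiable and cheap to compute at scale, which is exactly what deep RL needs to not reward-hack its way through fuzzy objectives.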
