zlacker

1. zamada+a5 2026-02-03 16:22:56
>>daniel+(OP)
Can anyone help me understand the "Number of Agent Turns" vs "SWE-Bench Pro (%)" figure? Specifically, what does the spread of Qwen3-Coder-Next from ~50 to ~280 agent turns represent at a fixed score of 44.3%? Does it mean the model sometimes needs anywhere in that range of agent turns to reach that fixed score?
◧◩
2. yorwba+2g 2026-02-03 17:04:50
>>zamada+a5
SWE-Bench Pro consists of 1865 tasks (https://arxiv.org/abs/2509.16941). Qwen3-Coder-Next solved 44.3% (826 or 827) of these tasks. To solve a single task, it took between ≈50 and ≈280 agent turns, ≈150 on average. In other words, a single pass through the dataset took ≈280,000 agent turns. Kimi-K2.5 solved ≈84 fewer tasks, but took only about a third as many agent turns.
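
For anyone who wants to check the arithmetic, here is a rough sketch. The task count and score are the ones quoted in this comment; the ≈150 average is only an eyeball estimate from the figure, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers in this comment.
TOTAL_TASKS = 1865       # SWE-Bench Pro task count quoted above
SCORE_TENTHS = 443       # reported score of 44.3%, in tenths of a percent
AVG_TURNS = 150          # assumption: approximate average read off the figure

# Every solved-task count that would be reported as 44.3% after rounding
# to one decimal place.
candidates = [
    n for n in range(TOTAL_TASKS + 1)
    if round(1000 * n / TOTAL_TASKS) == SCORE_TENTHS
]

# Estimated agent turns for one pass over the whole dataset.
total_turns = TOTAL_TASKS * AVG_TURNS

print(candidates)
print(total_turns)
```

The rounded percentage alone cannot distinguish between the candidate counts, which is why the solved count is given as "826 or 827".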
◧◩◪
3. regula+6s 2026-02-03 17:53:49
>>yorwba+2g
If this is genuinely better than K2.5 even at a third of the speed, then my OpenRouter credits are going to go unused.