zlacker

[return to "Qwen3-Coder-Next"]
1. zamada+a5[view] [source] 2026-02-03 16:22:56
>>daniel+(OP)
Can anyone help me understand the "Number of Agent Turns" vs "SWE-Bench Pro (%)" figure? Specifically, what does the spread of Qwen3-Coder-Next from ~50 to ~280 agent turns represent at a fixed score of 44.3%? Does it mean the model sometimes needs that range of agent turns to reach that score?
2. yorwba+2g[view] [source] 2026-02-03 17:04:50
>>zamada+a5
SWE-Bench Pro consists of 1865 tasks (https://arxiv.org/abs/2509.16941). Qwen3-Coder-Next solved 44.3% (826 or 827) of these tasks. To solve a single task, it took between ≈50 and ≈280 agent turns, ≈150 on average. In other words, a single pass through the dataset took ≈280,000 agent turns (1865 tasks × ≈150 turns each). Kimi-K2.5 solved ≈84 fewer tasks, but also took only about a third as many agent turns.
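For anyone who wants to check the arithmetic, here is a rough sketch in Python. The task count and the 44.3% score come from the comment above; the ≈150 average is an approximate value read off the figure, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted above.
TOTAL_TASKS = 1865      # SWE-Bench Pro task count, as stated in the comment
REPORTED_SCORE = 44.3   # percent solved, reported to one decimal place
AVG_TURNS = 150         # approximate average turns per task (assumption from the figure)

# Every solved-task count whose percentage rounds to the reported 44.3%.
candidates = [
    n for n in range(TOTAL_TASKS + 1)
    if abs(100 * n / TOTAL_TASKS - REPORTED_SCORE) < 0.05
]

# Approximate number of agent turns for one pass over all tasks.
total_turns = TOTAL_TASKS * AVG_TURNS

print(candidates)
print(total_turns)
```

Because the score is only reported to one decimal place, more than one solved-task count is compatible with it, which is why the comment says "826 or 827".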
3. zamada+KD[view] [source] 2026-02-03 18:36:11
>>yorwba+2g
Ah, a spread of the individual tests makes plenty of sense! Many thanks (same goes to the other comments).