zlacker

SWE-Bench Pro consists of 1865 tasks (https://arxiv.org/abs/2509.16941). Qwen3-Coder-Next solved 44.3% (826 or 827) of these tasks. Solving a single task took between ≈50 and ≈280 agent turns, ≈150 on average. In other words, a single pass through the dataset took ≈280000 agent turns (1865 × ≈150). Kimi-K2.5 solved ≈84 fewer tasks, but took only about a third as many agent turns.
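The arithmetic behind those figures can be checked with a quick sketch. It uses only the numbers quoted above. The variable names are made up for illustration, and the rounding interval assumes the 44.3% figure was rounded to one decimal place.

```python
# Figures quoted in the comment; the names are illustrative only.
total_tasks = 1865
avg_turns_per_task = 150  # approximate average, per the comment

# Assumption: "44.3%" is rounded to one decimal, so the true rate
# lies somewhere in [44.25%, 44.35%).
low = total_tasks * 0.4425
high = total_tasks * 0.4435
candidates = [n for n in range(total_tasks + 1) if low <= n < high]
print(candidates)  # [826, 827]

# One pass through the dataset at the average turn count.
total_turns = total_tasks * avg_turns_per_task
print(total_turns)  # 279750, i.e. roughly 280000

# "About a third as many agent turns" is then on the order of:
print(total_turns // 3)  # 93250
```

Both 826 and 827 round to 44.3% of 1865, which is why the solved count cannot be pinned down from the percentage alone.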


>>yorwba+(OP)
If this is genuinely better than K2.5 even at a third the speed, then my OpenRouter credits are going to go unused.

>>yorwba+(OP)
Ah, a spread of the individual tests makes plenty of sense! Many thanks (same goes to the other comments).