“Computer” used to be a job, and human error rates were on the order of 1-2% per step, no matter the level of training or experience. Work had to be done in triplicate and cross-checked if it mattered.
Digital computers are down to error rates of roughly 10^-15 to 10^-22 per operation and are hence treated as nearly infallible. We regularly write code where a trillion steps have to execute flawlessly in sequence for things not to explode!
AIs can now output maybe 1K to 2K tokens in a sequence before they make a mistake. That’s 99.9% to 99.95% per-token accuracy! Already better than a human.
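A back-of-the-envelope comparison, assuming independent per-step errors (a crude simplification; the rates below are the rough figures above, not measurements):

```python
# Expected number of consecutive error-free steps for a given per-step
# error rate, assuming errors are independent. Rates are the rough
# figures quoted above, not measurements.
error_rates = {
    "human computer (~1.5% per step)": 1.5e-2,
    "digital computer (~1e-15 per op)": 1e-15,
    "current LLM (~1 error per 1-2K tokens)": 1 / 1500,
}

for label, p in error_rates.items():
    accuracy = (1 - p) * 100      # per-step accuracy, in percent
    expected_run = 1 / p          # mean steps before the first error
    print(f"{label}: {accuracy:.4f}% accurate, ~{expected_run:,.0f} error-free steps on average")
```

Crude as it is, it makes the gap concrete: a human tops out at a few dozen flawless steps, silicon at around a quadrillion, and current models already run sequences a couple dozen times longer than ours.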
Don’t believe me?
Write me a 500-line program with pen and paper (not pencil!) and have it work the first time!
I’ve seen Gemini 2.5 Pro do this in a useful way.
As the error rates drop, the length of usefully correct sequences will get to 10K, then 100K, and maybe… who knows?
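The arithmetic behind that projection, under the same independence assumption: if the per-token error rate is e, an n-token sequence comes out flawless with probability (1-e)^n, so every 10x drop in error rate buys roughly a 10x longer sequence at the same odds of success. A quick sketch:

```python
import math

# Length you can generate with even odds (50%) of zero mistakes,
# for a range of hypothetical per-token error rates.
for error_rate in (1e-3, 1e-4, 1e-5, 1e-6):
    n_even_odds = math.log(0.5) / math.log(1 - error_rate)
    print(f"error rate {error_rate:g}: ~{n_even_odds:,.0f} tokens at even odds of being flawless")
```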
There was just a press release today about Gemini Diffusion, which can revise already-generated tokens to correct mistakes.
Error rates will drop.
Useful output length will go up.
The issue seems to be more in the intelligence department. You can't really leave them in an agent-like loop with compiler/shell output and expect them to make meaningful progress on their tasks past some small number of steps.
Improving their initial error-free token length is solving the wrong problem. I would take a model with less initial accuracy than a human but an equal ability to iterate on its solution over time.
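To make that concrete, here's a minimal sketch of the kind of agent-like loop being described: generate code, run the build, feed the compiler/shell output back, retry. `ask_model` and `run_build` are hypothetical callables you'd supply (an LLM call and a build harness), not any real library's API.

```python
from typing import Callable, Tuple

def agent_loop(
    task: str,
    ask_model: Callable[[str], str],               # hypothetical LLM call
    run_build: Callable[[str], Tuple[bool, str]],  # hypothetical build/test harness
    max_steps: int = 10,
) -> str | None:
    """Generate code, then iterate on compiler/shell feedback."""
    code = ask_model(f"Write a program that does: {task}")
    for _ in range(max_steps):
        ok, output = run_build(code)   # compile/run, capture stderr
        if ok:
            return code                # build succeeded
        # The complaint above: past a handful of these retries,
        # the model tends to stop making meaningful progress.
        code = ask_model(f"The build failed with:\n{output}\n\nFix this code:\n{code}")
    return None
```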
Programmers who "iterate" buggy shit for 10 rounds until they get it right are a post-Google push-update phenomenon.