In a stage interview (a bit after the "sparks of agi in gpt4" paper came out) he made 3 statemets:
a) llms can't do math. They can trick us with poems and subjective prose, but at objective math they fail.
b) they can't plan
c) by the nature of their autoregressive architecture, errors compound. so a wrong token will make their output irreversibly wrong, and spiral out of control.
I think we can safely say that all of these turned out to be wrong. It's very possible that he meant something more abstract, and technical at its core, but in the real life all of these things were overcome. So, not a luddite, but also not a seer.
The harnesses have helped in training the models themselves (i.e. every good trace was "baked in" the model) and have improved in enabling test time compute. But at the end of the day this is all put back into the models, and they become better.
The simplest proof of this is on benchmarks like terminalbench and swe-bench with simple agents. The current top models are much better than their previous versions, when put in a loop with just a "bash tool". There's a ~100LoC harness called mini-swe-agent [1] that does just that.
So current models + minimal loop >> previous gen models with human written harnesses + lots of glue.
> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!
This is orthogonal to the issue of whether all ideas are essentially "remixes." For the record I agree that they are.
They can and are improved (papered over) over time. For example by improving and tweaking the training data. Adding in new data sets is the usual fix. A prime example 'count the number of R's in Strawberry' caused quite a debacle at a time where LLM's were meant to be intelligent. Because they aren't they can trip up over simple problems like this. Continue to use an army of people to train them and these edge cases may become smaller over time. Fundamentally the LLM tech hasn't changed.
I am not saying that LLM's aren't amazing, they absolutely are. But WHAT they are is an understood thing so lets not confuse ourselves.