I don't really see how this is different from "LLMs can't multiply 20-digit numbers", which, by the way, most humans can't do either. I tried it once (using pen and paper) and consistently made errors somewhere.
This argument is tired; it keeps getting repeated for every flaw seen in LLMs. The other tired argument is: wait! This is a sigmoid curve, and we haven't seen the inflection point yet. If someone gave me a penny for every comment making one of these claims, I'd be rich by now.
Humans invented machines because they could not do certain things themselves, all the way from the simple machines of physics (Archimedes' lever) to the modern computer.
If your disappointment is that the LLM didn't invent a computer to solve the problem, maybe you need to give it access to physical tools, robots, labs, etc.
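To make the tool point concrete, here is a trivial Python sketch (the operands are made up): the arithmetic that trips up both humans and LLMs is exact and instant for a deterministic tool, which is exactly why tool access changes the picture.

```python
# Exact big-integer arithmetic is trivial for a tool, even where humans
# and LLMs both slip: Python ints are arbitrary-precision.
a = 12345678901234567890  # made-up 20-digit operands
b = 98765432109876543210
print(a * b)  # exact product: no rounding, no per-digit carrying errors
```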
Sure, humans may fail at a 20-digit multiplication problem, but I don't think that's relevant. Most aligned, educated, and well-incentivized humans (such as the ones who build and run labs) will follow complex and probably ill-defined instructions correctly and predictably, instructions harder to follow and interpret than an exact Towers of Hanoi solving algorithm. Don't misinterpret me: human errors do happen in those contexts because, well, we're talking about humans, but not as catastrophically as the errors committed by the LRMs in this paper.
I'm kind of tired of people comparing humans to machines in such simplistic and dishonest ways. Such thinking pollutes the AI field.
*In this case, for some of the problems, the LRMs were given an exact algorithm to follow, and they still didn't follow it. I wouldn't keep my hopes up for an LRM handling a full physical laboratory/factory.
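For context, here is a minimal sketch of the standard recursive Towers of Hanoi solution in Python. This is my own illustration, not the exact prompt from the paper, and the peg labels and function name are mine:

```python
def hanoi(n, source, target, auxiliary, moves):
    # Move n disks from `source` to `target`, using `auxiliary` as scratch.
    if n == 0:
        return
    hanoi(n - 1, source, auxiliary, target, moves)  # clear the top n-1 disks out of the way
    moves.append((source, target))                  # move the largest remaining disk
    hanoi(n - 1, auxiliary, target, source, moves)  # stack the n-1 disks back on top

moves = []
hanoi(10, "A", "C", "B", moves)
print(len(moves))  # 1023, i.e. 2**10 - 1: the move count grows exponentially in n
```

The algorithm itself is three lines of recursion; the failure the footnote points at is executing it faithfully over the exponentially many steps it generates.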