I don't really see how this is different from "LLMs can't multiply 20-digit numbers", which, btw, most humans can't do either. I tried it once (with pen and paper) and consistently made errors somewhere.
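For what it's worth, the procedure I was botching is completely mechanical. Here's a quick Python sketch of schoolbook long multiplication (my own illustration, nothing from the paper), checked against Python's exact built-in integers:

```python
# Schoolbook long multiplication: the exact, mechanical procedure humans
# are taught and still fumble by hand. A machine executes it flawlessly.
def long_multiply(a: str, b: str) -> str:
    da = [int(d) for d in reversed(a)]  # least-significant digit first
    db = [int(d) for d in reversed(b)]
    res = [0] * (len(da) + len(db))
    for i, x in enumerate(da):      # multiply every digit pair,
        for j, y in enumerate(db):  # accumulating into its column
            res[i + j] += x * y
    carry = 0
    for k in range(len(res)):       # one carry pass, like carrying by hand
        total = res[k] + carry
        res[k] = total % 10
        carry = total // 10
    out = "".join(map(str, reversed(res))).lstrip("0")
    return out or "0"

x, y = "12345678901234567890", "98765432109876543210"
assert long_multiply(x, y) == str(int(x) * int(y))  # exact, every time
```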
This argument is tired; it gets repeated for every flaw seen in LLMs. The other tired argument is: wait! This is a sigmoid curve, and we haven't seen the inflection point yet. If someone gave me a penny for every comment saying one of these, I'd be rich by now.
Humans invented machines because they could not do certain things themselves, all the way from the simple machines of physics (Archimedes' lever) to the modern computer.
If your disappointment is that the LLM didn't invent a computer to solve the problem, maybe you need to give it access to physical tools, robots, labs, etc.
Sure, humans may fail at a 20-digit multiplication problem, but I don't think that's relevant. Most aligned, educated, and well-incentivized humans (such as the ones building and running labs) will follow complex and often ill-defined instructions correctly and predictably, instructions harder to follow and interpret than an exact Towers of Hanoi solving algorithm. Don't misunderstand me: human errors do happen in those contexts because, well, we're talking about humans, but not as catastrophically as the errors committed by the LRMs in this paper.
I'm kind of tired of people comparing humans to machines in such simplistic and dishonest ways. Such thoughts pollute the AI field.
*In this case, for some of the problems the LRMs were given an exact algorithm to follow, and they still failed. I wouldn't get my hopes up for an LRM handling a full physical laboratory or factory.
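For reference, the "exact algorithm" in question is just the textbook recursive Tower of Hanoi solution. A sketch in Python (my paraphrase, not the paper's literal prompt):

```python
def hanoi(n: int, source: str, target: str, spare: str, moves: list) -> None:
    """Textbook recursion: emits the optimal 2**n - 1 moves, fully determined."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 discs
    moves.append((source, target))              # move the largest disc
    hanoi(n - 1, spare, target, source, moves)  # restack the n-1 discs

moves = []
hanoi(10, "A", "C", "B", moves)
assert len(moves) == 2**10 - 1  # 1023 moves; no judgment calls anywhere
```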
If your argument is just that LRMs are noisier and more error-prone in their reasoning, then I don't disagree.
> I'm kind of tired of people comparing humans to machines in such simplistic and dishonest ways.
The issue is people who say "see, the AI makes mistakes on very complex reasoning problems, so its 'thinking' is an illusion". That's the title of the paper.
This mistake comes not from people "comparing humans to machines", but from people fundamentally misunderstanding what thinking is. If thinking is what humans do, then errors are expected.
There is this armchair philosophical idea that a human can simulate any Turing machine, and thus our reasoning is "maximally general", and anything that can't do this is not general intelligence. But this is the complete opposite of reality. In our world, everything we know of that can perfectly simulate a Turing machine is not generally intelligent, and vice versa.
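To make "perfectly simulate a Turing machine" concrete: a dozen lines of Python do it, and nobody would call them generally intelligent. A toy sketch (my own example, a unary-increment machine):

```python
# A Turing machine is just a transition table plus a tape; this simulates
# one *perfectly*, yet it is obviously not general intelligence.
def run_tm(transitions, tape, state="start", head=0, max_steps=10_000):
    cells = dict(enumerate(tape))  # sparse tape; missing cells are blank "_"
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, "_")
        state, write, move = transitions[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip("_")

# Toy machine: scan right over 1s, write one more 1, halt (unary increment).
inc = {
    ("start", "1"): ("start", "1", "R"),
    ("start", "_"): ("halt", "1", "R"),
}
assert run_tm(inc, "111") == "1111"  # perfect simulation, zero intelligence
```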