The Illusion of Thinking: Strengths and limitations of reasoning models [pdf]

>>amrrs+(OP)
This is easily explained by accepting that there is no such thing as LRMs. LRMs are just LLMs that iterate on its own answers more (or provides itself more context information of a certain type). The reasoning loop on an "LRM" will be equivalent to asking a regular LLM to "refine" its own response, or "consider" additional context of a certain type. There is no such thing as reasoning basically, as it was always a method to "fix" hallucinations or provide more context automatically, nothing else. These big companies baked in one of the hackiest prompt engineering tricks that your typical enthusiast figured out long ago and managed to brand it and profit off it. The craziest part about this was Deepseek was able to cause a multi billion dollar drop and pump of AI stocks with this one trick. Crazy times.

>>ivape+qw
Is that what "reasoning" means? That sounds pretty ridiculous.

I've thought before that AI is as "intelligent" as your smartphone is "smart," but I didn't think "reasoning" would be just another buzzword.

>>AlienR+jH
I am not too familiar with the latest hype, but "reasoning" has a very straightforward definition in my mind. For example, can the program in question derive new facts from old ones in a logically sound manner. Things like applying modus ponens. (A and A => B) => B. Or, all men are mortal and Socrates is a man, and therefore Socrates is mortal. If the program cannot deduce new facts, then it is not reasoning, at least not by my definition.

>>ngneer+ZW
When people say LLMs can't do X, I like to try it.

    Q: Complete 3 by generating new knowledge:
    1. today is warm
    2. cats likes warm temperatures
    3.

A: Therefore, a cat is likely to be enjoying the weather today.

Q: does the operation to create new knowledge you did have a specific name?

A: ... Deductive Reasoning

Q: does the operation also have a Latin name?

A: ... So, to be precise, you used a syllogismus (syllogism) that takes the form of Modus Ponens to make a deductio (deduction).

https://aistudio.google.com/app/prompts/1LbEGRnzTyk-2IDdn53t...

People then say "of course it could do that, it just pattern matched a Logic text book. I meant in a real example, not an artificially constructed one like this one. In a complex scenario LLMs obviously can't do Modus Ponens.

>>dist-e+og1
I do not know whether the state of the art is able to reason or not. The textbook example you gave is admittedly not very interesting. What you are hearing from people is that parroting is not reasoning, which is true.

I wonder if the state of the art can reason its way through the following:

"Adam can count to 14000. Can Adam count to 13500?"

The response needs to be affirmative for every X1 and X2 such that X2 <= X1. That is reasoning. Anything else is not reasoning.

The response when X2 > X1 is less interesting. But, as a human it might be "Maybe, if Adam has time" or "Likely, since counting up to any number uses the same algorithm" or "I don't know".

Seems ChatGPT can cope with this. Other examples are easy to come up with, too. There must be benchmarks for this.

Input to ChatGPT:

"Adam can lift 1000 pounds of steel. Can Adam lift 1000 pounds of feathers?"

Output from ChatGPT:

"1,000 pounds of feathers would be much easier for Adam to lift compared to 1,000 pounds of steel, because feathers are much lighter and less dense."

So, maybe not there yet...

zlacker