Flash answered correctly in ~2 seconds at most. Pro gave a wildly wrong answer after thinking and elaborating for ~5 minutes.
Flash also used to give a wrong answer for the same string, but it has since improved.
Prompt was the same: "Hey, can you decode $BASE64_string?"
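For reference, base64 decoding is deterministic and trivially checkable outside the model, so there's an exact ground truth to compare against. A minimal sketch in Python (the string below is a hypothetical example, not the one I actually tested):

```python
import base64

# Hypothetical example string; substitute the actual $BASE64_string being tested.
encoded = "SGVsbG8sIHdvcmxkIQ=="
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # -> "Hello, world!"
```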
I have no further comments.
Realistically there are many problems that non-reasoning models do better on, especially when the answer can't be reached through a reasoning process, like recalling internal knowledge.
You can try to teach the model the concept of a problem where thinking will likely steer it away from the right answer, but at some point it becomes like the halting problem... how does the model reliably think its way into the realization that a given problem is too complex to be thought through?