The Illusion of Thinking: Strengths and limitations of reasoning models [pdf]

>>amrrs+(OP)
The study challenges the assumption that more “thinking” or longer reasoning traces necessarily lead to better problem-solving in large reasoning models (LRMs).

>>bicepj+RF
As a test, I asked Gemini 2.5 Flash and Gemini 2.5 Pro to decode a single Base64 string.

Flash answered correctly in ~2 seconds at most. Pro gave a badly wrong answer after thinking and elaborating for ~5 minutes.

Flash also used to give a wrong answer for the same string, but it has since improved.

The prompt was the same for both: "Hey, can you decode $BASE64_string?"
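For anyone who wants to check a model's answer against ground truth, here is a minimal sketch using Python's standard `base64` module. The helper name `decode_base64` and the sample string are made up for illustration; the sample is not the string from the test, and the sketch assumes the payload is UTF-8 text.

```python
import base64


def decode_base64(text: str) -> str:
    """Decode a Base64 string into UTF-8 text (hypothetical helper)."""
    # validate=True raises binascii.Error on characters outside the
    # Base64 alphabet instead of silently discarding them.
    raw = base64.b64decode(text, validate=True)
    # Assumption: the decoded bytes are UTF-8 text, not binary data.
    return raw.decode("utf-8")


if __name__ == "__main__":
    # Made-up sample input, built here so the round trip is visible.
    sample = base64.b64encode("hello, world".encode("utf-8")).decode("ascii")
    print(sample)
    print(decode_base64(sample))
```

Decoding is a fixed table lookup over 4-character groups, so the reference answer is deterministic and takes microseconds, which makes it easy to compare against whatever the model returns.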

I have no further comments.
