zlacker

[return to "The Illusion of Thinking: Strengths and limitations of reasoning models [pdf]"]
1. bicepj+RF[view] [source] 2025-06-06 23:46:11
>>amrrs+(OP)
The study challenges the assumption that more "thinking" or longer reasoning traces necessarily leads to better problem-solving in LRMs (large reasoning models).
2. bayind+oI[view] [source] 2025-06-07 00:14:00
>>bicepj+RF
As a test, I asked Gemini 2.5 Flash and Gemini 2.5 Pro to decode a single BASE64 string.

Flash answered correctly in at most ~2 seconds. Pro answered very wrongly after thinking and elaborating for ~5 minutes.

Flash also used to give a wrong answer for the same string, but it has since improved.

Prompt was the same: "Hey, can you decode $BASE64_string?"

I have no further comments.

3. rafter+KX[view] [source] 2025-06-07 03:44:55
>>bayind+oI
Well, that's not a very convincing argument. That's just a failure to recognize when a tool (a Base64 decoder) is needed, not a reasoning problem at all, right?
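(For context on why this is a tool-use question rather than a reasoning one: Base64 decoding is a deterministic one-line library call in most languages. A minimal Python sketch; the example string below is illustrative, not the one from the test above.)

```python
import base64

# Hypothetical example string; any Base64 input decodes the same way.
encoded = "SGVsbG8sIHdvcmxkIQ=="

# A deterministic one-line call -- exactly the kind of task a tool
# invocation handles trivially, with no step-by-step reasoning needed.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # Hello, world!
```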
4. bayind+Dj1[view] [source] 2025-06-07 10:02:40
>>rafter+KX
I don't know whether Flash uses a tool or not, but it answers pretty quickly. Pro, however, opts to use its own reasoning, not a tool. When I look at the reasoning trace, it pulls in knowledge endlessly, refining that knowledge and drifting away.