I've never seen this question quantified in a really compelling way, and while interesting, I'm not sure this PDF succeeds, at least not well enough to silence dissent. I think AI maximalists will continue to think that the models are in fact getting less dim-witted, while the AI skeptics will continue to think these apparent gains are in fact entirely a byproduct of "increasing" "omniscience." The razor will have to be a lot sharper before people start moving between these groups.
But, anyway, it's still an important question to ask, because omniscient-yet-dim-witted models terminate at "superhumanly assistive" rather than "Artificial Superintelligence", which in turn economically means "another bite at the SaaS apple" instead of "phase shift in the economy." So I hope the authors will eventually succeed.
We keep assigning adjectives to this technology that anthropomorphize the neat tricks we've invented. There's nothing "omniscient" or "dim-witted" about these tools. They have no wit. They do not think or reason.
All Large "Reasoning" Models do is generate data that they then use as context to generate the final answer. In other words, they condition their output, in real time, on synthetic data of their own making.
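A minimal sketch of that two-phase pattern, in Python, assuming a hypothetical generate() completion function (every name here is illustrative, not any particular vendor's API):

    # Stand-in for a call to an LLM text-completion endpoint; any real
    # completion API could be substituted here.
    def generate(prompt: str) -> str:
        raise NotImplementedError  # swap in an actual API call

    def answer_with_reasoning(question: str) -> str:
        # Phase 1: the model emits intermediate "reasoning" tokens.
        reasoning = generate(f"Question: {question}\nThink step by step:")
        # Phase 2: those tokens are fed back in as extra context and the
        # final answer is conditioned on them. No weights are updated;
        # it is still next-token prediction over a longer prompt.
        return generate(
            f"Question: {question}\n"
            f"Reasoning: {reasoning}\n"
            f"Final answer:"
        )

Note that if phase 1 produces garbage, phase 2 dutifully conditions on that garbage, which is exactly the failure mode described next.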
This is a neat trick, but it doesn't solve the underlying problems that plague these models, like hallucination. If the "reasoning" process contains garbage, gets stuck in loops, etc., the final answer will also be garbage. I've seen sessions where the model approximates the correct answer in the first "reasoning" step, but then sabotages it with senseless "But wait!" follow-up steps. The final answer ends up being a mangled mess of all the garbage it generated in the "reasoning" phase.
The only reason we keep anthropomorphizing these tools is because it makes us feel good. It's wishful thinking that markets well, gets investors buzzing, and grows the hype further. In reality, we're as close to artificial intelligence as we were a decade ago. What we do have are very good pattern matchers and probabilistic data generators that can leverage the enormous amount of compute we can throw at the problem. Which isn't to say that this can't be very useful, but ascribing human qualities to it only muddies the discussion.
Computers can't think and submarines can't swim.
Output orientation - Is the output similar to what a human would create if they were thinking?
Process orientation - Is the machine actually thinking when we say it's thinking?
I once met someone who drew a circuit diagram from memory. However, they didn't draw it in logical order, from inputs through operations to outputs. They started drawing from the upper left corner and continued toward the lower right, adding lines, triangles, and rectangles as needed.
Rote learning can help you pass exams. At some point, the difference between "knowing" how engineering works and simply being able to apply methods and produce a result becomes meaningless, at least in terms of utility.
This is very much the confusion at play here, so both points are true.
1) These tools do not "think" in any way that counts as human thinking.
2) The output is often the same as what a thinking human would create.
If you are concerned only with the product, then what's the difference? If you care about the process, then this isn't thought.
To put it in a different context: if you are a consumer, do you care whether the output was handcrafted by an artisan, or do you just need something that works?
If you are a producer in competition with others, you care whether your competition is selling knock-offs at a lower price.