The actual paper [1] says that functional MRI (which measures which parts of the brain are active by sensing blood flow) indicates that different brain hardware is used for non-language and language functions. This has been suspected for years, but now there's an experimental result.
What this tells us for AI is that we need something else besides LLMs. It's not clear what that something else is. But, as the paper mentions, lower mammals and corvids lack language yet have substantial problem-solving capability. That's seen down at squirrel and crow size, where the brains are tiny. So if someone figures out how to do this, it will probably take less hardware than an LLM.
This is the next big piece we need for AI. No idea how to do this, but it's the right question to work on.
[1] https://www.nature.com/articles/s41586-024-07522-w.epdf?shar...
I’d be extremely surprised if AI recapitulates the developmental path humans took; evolution and next-token prediction on an existing corpus are completely different objective functions, with completely different loss landscapes.
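For concreteness, the next-token objective itself is simple to state. A toy sketch in Python (made-up probabilities, nothing to do with any real model's internals) of the per-token loss that pretraining minimizes:

    import math

    # Next-token prediction in miniature: given a context, the model assigns a
    # probability distribution over the vocabulary, and training minimizes the
    # negative log-probability of the token that actually came next.
    def next_token_loss(predicted_probs, actual_next_token):
        return -math.log(predicted_probs[actual_next_token])

    # Hypothetical model output for the context "the cat sat on the":
    probs = {"mat": 0.6, "floor": 0.3, "moon": 0.1}
    print(next_token_loss(probs, "mat"))   # small loss for a likely continuation
    print(next_token_loss(probs, "moon"))  # large loss for an unlikely one

Whatever evolution is optimizing, it isn't a quantity like this.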
I then looked it up, and they had each copy/pasted the same Stack Overflow answer.
Furthermore, the answer was extremely wrong: the language I was using was superficially similar to the source material, but the programming concepts were entirely different.
What this tells me is that there is clearly no “reasoning” happening whatsoever with either model, despite marketing claims to the contrary.
Not true. You yourself have failed at reasoning here.
The problem with your logic is that you failed to account for the instances where LLMs have succeeded at reasoning. If LLMs both fail and succeed, it just means that LLMs are capable of reasoning and also capable of being utterly wrong.
It's almost a cliché at this point. Tons of people see an LLM fail, ignore the successes, and then, from a couple of anecdotal examples, openly claim that LLMs can't reason, period.
How is that even logical? You have evidence pointing both ways, so the LLM must be capable of BOTH failing and succeeding at reasoning. That's the most logical conclusion.
Apple’s recent research summarized here [0] is worth a read. In short, they argue that what LLMs are doing is more akin to advanced pattern recognition than reasoning in the way we typically understand reasoning.
By way of analogy, memorizing mathematical facts and then correctly recalling them does not imply that the person actually understands how to arrive at the answer. This is why “show your work” is a critical part of proving competence in an educational setting.
An LLM providing useful/correct results only proves that it’s good at surfacing relevant information for a given prompt. The fact that it’s trivial to cause bad results by making minor but irrelevant changes to a prompt points to something other than a truly reasoned response; a reasoning machine would not get tripped up so easily.
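To make the fragility point concrete, here is one hedged sketch of the kind of perturbation test that research gestures at. Everything in it is made up for illustration (the template, the names, and the ask_model hook you would wire to a real model); it is not Apple’s actual benchmark:

    import random

    # Probe "reasoning vs. pattern matching": vary the surface details of a word
    # problem (names, quantities) while keeping its structure, recompute the
    # ground truth for each variant, and check whether the answers track it.
    TEMPLATE = ("{name} picks {a} apples in the morning and {b} in the afternoon. "
                "{c} of them turn out to be rotten. How many good apples are left?")

    def make_variant(rng):
        a, b, c = rng.randint(5, 50), rng.randint(5, 50), rng.randint(1, 5)
        prompt = TEMPLATE.format(name=rng.choice(["Sam", "Priya", "Olu", "Mei"]),
                                 a=a, b=b, c=c)
        return prompt, a + b - c  # ground truth, recomputed per variant

    def consistency_score(ask_model, n=20, seed=0):
        rng = random.Random(seed)
        variants = [make_variant(rng) for _ in range(n)]
        return sum(ask_model(p) == truth for p, truth in variants) / n

    # A fake "model" that memorized a single answer scores near zero here:
    print(consistency_score(lambda prompt: 42))

A system that actually carries out the arithmetic should be indifferent to which name or which particular numbers appear; one that matches remembered surface forms typically is not.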
It’s bloody obvious that when I classify something as a success I mean the LLM delivering a correct, unique answer to a novel prompt that doesn’t exist in the original training set. No need to go over the same tired analogies, regurgitated over and over, about LLMs merely reusing memorized answers; it’s a stale point of view. The overall argument has progressed further than that, and we now need a more careful analysis of what’s actually going on inside LLMs.
Sources: https://typeset.io/papers/llmsense-harnessing-llms-for-high-...
https://typeset.io/papers/call-me-when-necessary-llms-can-ef...
And these two are just from a random Google search.
I can find dozens and dozens of papers illustrating both failures and successes of LLMs, which further supports my original point: LLMs both succeed and fail at reasoning.
The main problem right now is that we don’t really understand how LLMs work internally. Everyone who claims to know that LLMs can’t reason is making a huge, irrational leap: not only does that conclusion contradict the actual evidence, but they don’t even know how LLMs work, because nobody does.
We only know how LLMs work at a high level, and we understand them mainly via the analogy of a best-fit curve through a series of data points. Below that abstraction, we don’t understand what’s going on.
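The best-fit-curve analogy can be made literal with a toy example (this is just the picture the analogy invokes, not a claim about actual LLM internals):

    import numpy as np

    # Fit a degree-5 polynomial through six samples of some underlying process.
    xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    ys = np.sin(xs)                      # the "training data"
    coeffs = np.polyfit(xs, ys, deg=5)   # passes (almost) exactly through the samples

    # On the training points the fit looks perfect...
    print(np.polyval(coeffs, xs) - ys)   # residuals ~ 0
    # ...but knowing "it's the curve that minimized the error on these points"
    # says nothing about why it outputs what it does in between or beyond them.
    print(np.polyval(coeffs, [2.5, 7.0]), np.sin([2.5, 7.0]))

At that level of description we can say what the training procedure rewarded, not what mechanism the fitted object actually implements.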