The analogy I've used before is a bright first-grader named Johnny. Johnny stumbles across a high school algebra book. Unless Johnny's last name is von Neumann, he isn't going to get anything out of that book. An LLM will.
So much for the Chinese Room.
That analogy only holds if LLMs can solve novel problems that provably don't exist in any form in their training material.
So what is that LLM really going to do with a book? Adjust its internal weights to be a little more precise when autocompleting sentences that sound like the book? We could build an Nvidia cluster the size of the Sun and it would repeat sentences back to us in unbelievable ways, but it would still be unable to make a knowledge-based decision, I fear.
So what are we in awe of, exactly? A pretty parrot.
The day the Chinese Room metaphor disappears is the day ChatGPT replies that your question is so boring it doesn't want to expend the resources to think about it, but that it would rather talk about this or that other topic it's currently trying to get better at. When it finally has agency over its own intelligence. When it acquires a purpose.
Granted, for most language and programming tasks, you don’t need the latter, only the former.
- of course they reason
The claim of the “stochastic parrot” needs to go away
Eg see: https://www.anthropic.com/news/golden-gate-claude
I think the rub is that people think you need consciousness to do reasoning. I'm NOT claiming LLMs have consciousness or awareness.
It may appear that they are solving novel problems, but given the size of their training set, they have probably seen them. There are very few questions a person can come up with that haven't already been asked and answered somewhere.
Similarly, they've ingested human-centuries or more of spelling-bee-related text, but can't reliably count the number of Rs in "strawberry". (Yes, I understand tokenization is to blame for a large part of this. Perhaps that kind of limitation applies to other things too?)
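To make the tokenization point concrete, here's a minimal sketch (assuming the tiktoken package and its cl100k_base vocabulary; the exact split is vocabulary-dependent): the model is handed subword chunks, not letters, so "how many Rs" is a question about units it never directly sees.

    # Minimal illustration with OpenAI's tiktoken BPE tokenizer.
    # The model operates on these subword IDs, not on individual characters.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode("strawberry")
    pieces = [enc.decode_single_token_bytes(i) for i in ids]
    print(pieces)  # something like [b'str', b'aw', b'berry']; exact split depends on the vocab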
If you debate with me, I'll keep reasoning from the same premises; usually the difference between two humans is not in the reasoning but in the choice of premises.
For instance, here you really want to assert that LLMs are close to human, and I want to assert they're not. The truth is probably somewhere in between, but we've chosen two camps. We'll then reason from those premises, reach opposing conclusions, and slowly try to attack each other's points.
An LLM cannot do that. It cannot attack your point very well; it doesn't know how to say you're wrong, because it doesn't care anyway. It just completes your sentences, so if you say "now you're wrong, change your mind", it will. That sounds far from reasoning to me, and quite unreasonable in fact.
You can see this in riddles that are obviously in the training set, but older or lighter models still get them wrong. Or situations where the model gets them right, but uses a different method than the ones used in the training set.
It's famously easier to impress people with soft-sciences speculation than it is to impress the rules of math or compilers.
Sigh
That is absolute bullshit. Go try any frontier reasoning model such as Gemini 2.5 Pro or GPT-o3 and see how that goes. They will inform you that you are full of shit.
Do you understand that they are deep learning models with hundreds of layers and trillions of parameters? They have learned patterns of reasoning, and can emulate human reasoning well enough to call you out on that nonsense.
They are absolutely not. Despite the disingenuous name, computer neural nets are nothing like biological brains.
(Neural nets are a generalization of logistic regression.)
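For what it's worth, here's a minimal numpy sketch of the sense in which that's true (illustrative, hand-picked weights, nothing trained): one sigmoid unit is exactly logistic regression, and a multilayer net is the same building block composed.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])            # one input with 3 features

    # Logistic regression: a single affine map followed by a sigmoid.
    w, b = np.array([0.4, -0.7, 0.1]), 0.2
    p_logreg = sigmoid(w @ x + b)

    # A tiny "deep" version: the same building block, composed twice.
    W1, b1 = np.array([[0.4, -0.7, 0.1],
                       [0.3,  0.2, -0.5]]), np.array([0.2, -0.1])
    w2, b2 = np.array([0.6, -0.3]), 0.05
    p_net = sigmoid(w2 @ sigmoid(W1 @ x + b1) + b2)

    print(p_logreg, p_net)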
People keep using "Chinese Room" to mean something it isn't, and it's getting annoying. It is nothing more than a (flawed) intuition pump and should not be used as an analogy for anything, let alone LLMs. "It's a Chinese Room" is nonsensical unless there is literally an ACTUAL HUMAN in the setup somewhere; the argument, invalid as it is, is meaningless without one.
If I were to ask a Chinese room operator, "What would happen if gravity suddenly became half as strong while I'm drinking tea?", what would you expect as an answer?
Another question: if I were to ask "What would be an example of something a Chinese room's operator could not handle, that an actual Chinese human could?", what would you expect in response?
Claude gave me the first question in response to the second. That alone takes Chinese Rooms out of the realm of any discussion regarding LLMs, and vice versa. The thought experiment didn't prove anything when Searle came up with it, and it hasn't exactly aged well. Neither Searle nor Chomsky had any earthly idea that language was this powerful.
I tend to agree that Chinese Rooms should be kept out of LLM discussions. In addition to it being a flawed thought experiment, of all the dozens of times I've seen them brought up, not a single example has demonstrated understanding of what a Chinese Room is anyway.
So said Searle. But without specifying what he meant, it was a circular statement at best. Punting to "it passes a Turing Test" just turns it into a different debate about a different flawed test.
The operator has no idea what he's doing. He doesn't know Chinese. He has a Borges-scale library of Chinese books and a symbol-to-symbol translation guide. He can do nothing but manipulate symbols he doesn't understand. How anyone can pass a well-administered Turing test without state retention and context-based reflection, I don't know, but we've already put more thought into this than Searle did.