Chomsky on what ChatGPT is good for (2023)

>>mef+(OP)
The fact that we have figured out how to translate language into something a computer can "understand" should thrill linguists. Taking a word (token) and abstracting its "meaning" as a 1,000-dimension vector seems like something that should revolutionize the field of linguistics. A whole new tool for analyzing and understanding the underlying patterns of all language!

And there's a fact here that's very hard to dispute: this method works. I can give a computer instructions and it "understands" them in a way that wasn't possible before LLMs. The main debate now is over the semantics of words like "understanding" and whether or not an LLM is conscious in the same way as a human being (it isn't).

>>caliba+cd
Restricted to linguistics, LLMs' supposed lack of understanding should be a non-sequitur. If the question is whether LLMs have formed a coherent ability to parse human languages, the answer is obviously yes. And not just human languages: as multimodality shows, the same transformer architecture seems to work well for modeling and generating anything with inherent structure.

I'm surprised that he doesn't mention "universal grammar" once in that essay. Maybe it so happens that humans do have some innate "universal grammar" wired in by instinct, but it's clearly not _necessary_ to be able to parse things. You don't need to set up explicit language rules or generative structure; given enough data, the model learns to produce it. I wonder if anyone has gone back and tried to extract explicit generative rules from the learned representation, though.

Since the "universal grammar" hypothesis isn't really falsifiable, at best you can hope for some generalized equivalent that's isomorphic to the platonic representation hypothesis and claim that all human language is aligned in some given latent representation, and that our brains have been optimized to be able to work in this subspace. That's at least a testable assumption, by trying to reverse engineer the geometry of the space LLMs have learned.

>>kracke+AG
Can LLMs actually parse human languages? Or can they react to stimuli with a trained behavioral response? Dogs can learn to sit when you say "sit", and learn to roll over when you say "roll over". But the dog doesn't parse human language; it reacts to stimuli with a trained behavioral response.

(I'm not that familiar with LLM/ML, but it seems like trained behavioral response rather than intelligent parsing. I believe this is part of why it hallucinates? It doesn't understand concepts, it just spits out words - perhaps a parrot is a better metaphor?)

>>0xbadc+GS
You can train LLMs on the output of very complex CFGs, and they successfully learn the grammar and hierarchy needed to complete any novel prefix. This is a task much more recursive and difficult than human languages, so there's no reason to believe that LLMs aren't able to parse human languages in the formal sense as well.
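
To make "the output of a CFG" concrete, here is a minimal sketch of how such training strings can be produced. The two-bracket grammar, the `sample` and `is_valid` names, and the depth cap are all assumptions for illustration; the grammars used in actual experiments are far larger. The checker is what you would use to score whether a model's completion of a prefix stays inside the language.

```python
import random

# Toy grammar (an assumption for illustration): two kinds of nested brackets.
# S -> "" | "(" S ")" S | "[" S "]" S
GRAMMAR = {
    "S": [[], ["(", "S", ")", "S"], ["[", "S", "]", "S"]],
}

def sample(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` with randomly chosen productions.

    Once max_depth is reached, the empty production is forced so that
    sampling always terminates.
    """
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    options = GRAMMAR[symbol]
    production = options[0] if depth >= max_depth else rng.choice(options)
    out = []
    for sym in production:
        out.extend(sample(sym, rng, depth + 1, max_depth))
    return out

def is_valid(tokens):
    """Stack-based membership check for the toy grammar's language."""
    pairs = {")": "(", "]": "["}
    stack = []
    for tok in tokens:
        if tok in ("(", "["):
            stack.append(tok)
        elif not stack or stack.pop() != pairs[tok]:
            return False
    return not stack

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        tokens = sample("S", rng)
        print("".join(tokens), is_valid(tokens))
```

Every sampled string is in the language by construction, so a training corpus built this way contains only positive examples, which is roughly the setting a language model sees.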

And of course empirically LLMs do generate valid English sentences. They may not necessarily be _correct_ sentences in a propositional truth-value sense (as seen by so-called "hallucinations"), but they are semantically "well-formed" in contrast to Chomsky's famous example of the failure of probabilistic grammar models, "Colorless green ideas sleep furiously."

I'm not a linguist, but I don't think linguistics has ever cared about the truth value of a sentence; that's more the realm of logic.

>>kracke+wU
A “complex” cfg is still a cfg, and, giving credence to Chomsky’s hierarchy, remains computationally less complex than natural, context sensitive, grammars. Even a complex cfg can be parsed by a relatively simple program in ways that context-sensitive grammars cannot.
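
As a rough illustration of the "relatively simple program" point, here is a sketch of the CYK recognizer, which decides membership for any CFG in Chomsky normal form in time cubic in the input length. The grammar below (for strings of the form a^n b^n) and the names `cyk`, `UNARY`, and `BINARY` are assumptions chosen for the example. By contrast, recognition for general context-sensitive grammars is PSPACE-complete.

```python
def cyk(tokens, unary, binary, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.

    unary:  list of (lhs, terminal) rules, e.g. ("A", "a")
    binary: list of (lhs, (left, right)) rules, e.g. ("S", ("A", "B"))
    """
    n = len(tokens)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][j] holds the nonterminals deriving tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = {lhs for lhs, t in unary if t == tok}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, (b, c) in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# Hypothetical CNF grammar for a^n b^n (n >= 1):
# S -> A B | A X,  X -> S B,  A -> "a",  B -> "b"
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]

if __name__ == "__main__":
    for s in ["ab", "aabb", "aab", "abab"]:
        print(s, cyk(list(s), UNARY, BINARY))
```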

My understanding is that context sensitive grammars _can_ allow for recursive structures that are beyond cfgs, which is precisely why cfgs sit below csgs in terms of computational complexity.

I don’t agree or disagree that LLMs might be, or are, capable of parsing (i.e., perception in Chomsky’s terms, or, arguably, “understanding” in any sense). But that they can learn the grammar of a “complex cfg” isn’t a convincing argument for the reasons you indicate.

>>agarre+ag1
I don't think it's clear that human languages are context sensitive. The only consistent claim I can find is that at one point someone examined Swiss German and found that it's weakly context sensitive. Also, empirically, human languages don't have that much recursion. You can artificially construct such examples, but beyond a certain depth people won't be able to parse them either.

I don't know whether the non-existence of papers studying whether LLMs can model context-sensitive grammar is because they can't, or because people haven't tested that hypothesis yet. But again empirically LLMs do seem to be able to reproduce human language just fine. The whole "hallucination" argument is precisely that LLMs are very good at reproducing the structure of language even if those statements don't encode things with the correct truth value. The fact that they successfully learn to parse complex CFGs is thus evidence that they can actually learn underlying generative mechanisms instead of simply parroting snippets of training data as naively assumed, and it's not a huge leap to imagine that they've learned some underlying "grammar" for English as well.

So if one argues that LLMs as a generative model cannot generate novel valid sentences in the English language, then that is an easily falsifiable hypothesis. If we had examples of LLMs producing ill-formed sentences, people would have latched onto that by now instead of "count Rs in strawbery", but I've never seen anyone arguing as such.

>>kracke+Sg1
It’s uncontroversial now that the class of string languages roughly corresponding to “human languages” is mildly context sensitive in a particular sense. This debate was hashed out in the 80s and 90s.

I don’t think formal language classes have much to tell us about the capabilities of LLMs in any case.

>Also, empirically, human languages don't have that much recursion. You can artificially construct such examples, but beyond a certain depth people won't be able to parse them either.

If you limit recursion depth then everything is regular, so the Chomsky hierarchy is of little application.
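
A small sketch of why a depth limit collapses things to regular: balanced parentheses are the textbook non-regular language, but once nesting is capped at some fixed depth, a finite automaton whose states are just "current depth" recognizes them. The function names and the choice of parentheses as the example are assumptions for illustration.

```python
def make_bounded_dyck_dfa(max_depth):
    """Build a DFA transition table for balanced parentheses nested at
    most max_depth deep.

    States are the current nesting depth 0..max_depth. A missing
    transition means the input is rejected.
    """
    delta = {}
    for depth in range(max_depth + 1):
        if depth < max_depth:
            delta[(depth, "(")] = depth + 1
        if depth > 0:
            delta[(depth, ")")] = depth - 1
    return delta

def accepts(delta, s):
    """Run the DFA from state 0; accept only if we end back at depth 0."""
    state = 0
    for ch in s:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state == 0

if __name__ == "__main__":
    dfa = make_bounded_dyck_dfa(2)
    for s in ["", "()", "(())", "((()))", "(()", ")("]:
        print(repr(s), accepts(dfa, s))
```

The automaton needs only `max_depth + 1` states, so no stack is required; the same construction applies to any grammar once recursion depth is bounded, though the state count can grow quickly.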
