And there's a fact here that's very hard to dispute: this method works. I can give a computer instructions and it "understands" them in a way that wasn't possible before LLMs. The main debate now is over the semantics of words like "understanding" and whether or not an LLM is conscious in the same way a human being is (it isn't).
I'm surprised that he doesn't mention "universal grammar" once in that essay. Maybe it so happens that humans do have some innate "universal grammar" wired in by instinct, but it's clearly not _necessary_ in order to parse things. You don't need to set up explicit language rules or a generative structure; give the model enough data and it learns to produce well-formed language. I do wonder, though, whether anyone has gone back and tried to extract explicit generative rules out of the learned representation.
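The closest thing I know of is targeted syntactic evaluation: you don't extract the rule itself, but you can check whether the model behaves as if it had internalized one. A minimal sketch, assuming the HuggingFace transformers library with GPT-2 as a stand-in (the sentence pair is just an illustrative example):

```python
# Sketch of a targeted syntactic evaluation: if the model has picked up an
# implicit agreement rule, it should prefer the grammatical verb form even
# with a distracting clause in between. Sentences are made-up examples.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence's tokens."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # log P(token_i | tokens_<i), summed over positions 1..n
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    return logprobs.gather(2, ids[:, 1:, None]).sum().item()

good = "The keys that the man lost are on the table."
bad = "The keys that the man lost is on the table."
print(sentence_logprob(good) > sentence_logprob(bad))  # typically True
```

That only demonstrates behavior consistent with the rule, not the rule itself; as far as I know, recovering a clean generative grammar from the learned weights is still an open problem.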
Since the "universal grammar" hypothesis isn't really falsifiable, at best you can hope for some generalized equivalent that's isomorphic to the platonic representation hypothesis and claim that all human language is aligned in some given latent representation, and that our brains have been optimized to be able to work in this subspace. That's at least a testable assumption, by trying to reverse engineer the geometry of the space LLMs have learned.
(I'm not that familiar with LLMs/ML, but it seems like a trained behavioral response rather than intelligent parsing. I believe this is part of why it hallucinates? It doesn't understand concepts; it just spits out words. Perhaps a parrot is a better metaphor?)
And of course, empirically, LLMs do generate valid English sentences. They may not necessarily be _correct_ sentences in a propositional truth-value sense (as seen in so-called "hallucinations"), but they are semantically "well-formed", in contrast to Chomsky's famous example of where probabilistic grammar models were supposed to fail: "Colorless green ideas sleep furiously."
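That specific sentence is easy to revisit with a modern probabilistic model. A small sketch, assuming GPT-2 via the HuggingFace transformers library (GPT-2 is just a convenient stand-in, not anything Chomsky was addressing):

```python
# Both sentences are semantically absurd, but a probabilistic model still
# finds the well-formed ordering far less surprising than the scrambled one.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_nll(sentence: str) -> float:
    """Average negative log-likelihood per token (lower = less surprising)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

well_formed = "Colorless green ideas sleep furiously."
scrambled = "Furiously sleep ideas green colorless."

print(avg_nll(well_formed))  # noticeably lower
print(avg_nll(scrambled))
```

Which says nothing about truth, of course: the model is rating well-formedness, not whether ideas can be green.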
I'm not a linguist, but I don't think linguistics has ever cared about the truth value of a sentence; that's more the realm of logic.
The reason languages compress, and don't mark which adjectives can go with which nouns ("green ideas" is impossible because ideas don't have colors; we could have a suffix on every noun marking what can be colored and what can't), is that we need to transmit these messages over the air, and quickly, before the lion jumps on the hunter. It's one of the many attributes of "languages in the wild" (Chinese doesn't really use "tenses"; imagine the compressive value), and that's what Chomsky says here:
Proceeding further with normal science, we find that the internal processes and elements of the language cannot be detected by inspection of observed phenomena. Often these elements do not even appear in speech (or writing), though their effects, often subtle, can be detected. That is yet another reason why restriction to observed phenomena, as in LLM approaches, sharply limits understanding of the internal processes that are the core objects of inquiry into the nature of language, its acquisition and use. But that is not relevant if concern for science and understanding have been abandoned in favor of other goals.
Understand what he means: you can run a million texts through a machine, and it will never infer why we don't label adjectives and nouns to prevent confusion and "green ideas". But for us it's painfully obvious: we don't have time, when we speak, to do all that. And I come from a language where we label every noun with a gender; I can see how stupid and painful it is for foreigners to grasp, because it doesn't make any sense. Why do we do it? Ask ChatGPT: will it tell you that it's because we like how beautiful it all sounds, which is the stupid reason why we actually do it?