Where I'm skeptical of LLM skepticism is that people use the term "stochastic parrot" disparagingly, as if they're not impressed. LLMs are stochastic parrots in the sense that they probabilistically guess sequences of things, but isn't it interesting how far that takes you already? I'd never have guessed. Fundamentally I question the intellectual honesty of anyone who pretends they're not surprised by this.
That's why I'm not too impressed even when he has changed his mind: he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.
Of course, early in training, the first function they model to lower the error will be the probabilities of the next tokens, since that is the simplest function that reduces the loss. Then the gradients push in other directions, and the function the LLM eventually learns is no longer about raw probabilities, but about the meaning of the sentence and what it makes sense to say next.
It's not by chance that the logits often put a huge signal on just two or three tokens, even though the sentence, probabilistically speaking, could continue in many more ways.
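You can check this concentration yourself. Here's a minimal sketch, assuming the Hugging Face transformers library and a small causal LM (gpt2 is used purely as a convenient example; the exact numbers will vary by model and prompt), that prints how much probability mass the top few next tokens carry:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works; gpt2 is just small and widely available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "For breakfast I had"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]               # logits for the token after the prompt
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx]):>12s}  {p.item():.3f}")

print("probability mass in top 5 tokens:", top.values.sum().item())
```

Despite a vocabulary of tens of thousands of tokens, a handful of candidates typically soak up most of the mass.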
So it's not so much about his incorrect predictions, but that these predictions were based on a core belief. And when the predictions turned out to be false, he didn't adjust his core beliefs, but just his predictions.
So it's natural to ask: if none of the predictions you derived from your core belief came true, maybe the core belief itself isn't true.
But the point of my response was just that I find it extremely surprising how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human, and I'm suspicious of anyone who pretends this isn't incredible. Can we agree on this?
But enough data implies probabilities. Consider 2 sentences:
"For breakfast I had oats"
"For breakfast I had eggs"
Training on this data, how do you complete "For breakfast I had..."?
There is no best deterministic answer. The best answer is a 50/50 probability distribution over "oats" and "eggs".
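A toy sketch of that, assuming nothing beyond the two sentences above as the corpus: counting the continuations of the prefix gives exactly that 50/50 split.

```python
from collections import Counter

corpus = [
    "For breakfast I had oats",
    "For breakfast I had eggs",
]

prefix = "For breakfast I had"

# Count every word that follows the prefix in the corpus.
continuations = Counter(
    s[len(prefix):].strip() for s in corpus if s.startswith(prefix)
)

total = sum(continuations.values())
for word, count in continuations.items():
    print(word, count / total)  # -> oats 0.5, eggs 0.5
```

The "best" completion the data supports is the distribution itself, not any single deterministic choice.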
He's done a lot of amazing work, but his stance on LLMs seems continuously off the mark.
If your model of reality makes good predictions and mine makes bad ones, and I want a more accurate model of reality, then I really shouldn’t just make small provisional and incremental concessions gerrymandered around whatever the latest piece of evidence is. After a few repeated instances, I should probably just say “oops, looks like my model is wrong” and adopt yours.
This seems to be a chronic problem with AI skeptics of various sorts. They clearly tell us that their grand model indicates that such-and-such a quality is absolutely required for AI to achieve some particular thing. Then LLMs achieve that thing without having that quality. Then they say something vague about how maybe LLMs have that quality after all, somehow. (They are always shockingly incurious about explaining this part. You would think this would be important to them to understand, as they tend to call themselves “scientists”.)
They never take the step of admitting that maybe they’re completely wrong about intelligence, or that they’re completely wrong about LLMs.
Here’s one way of looking at it: if they had really changed their mind, then they would stop being consistently wrong.
> Heavier-than-air flying machines are impossible. -Lord Kelvin. 1895
> I think there is a world market for maybe five computers. Thomas Watson, IBM. 1943
> On talking films: “They’ll never last.” -Charlie Chaplin.
> This ‘telephone’ has too many shortcomings… -William Orton, Western Union. 1876
> Television won’t be able to hold any market -Darryl Zanuck, 20th Century Fox. 1946
> Louis Pasteur’s theory of germs is ridiculous fiction. -Pierre Pachet, French physiologist.
> Airplanes are interesting toys but of no military value. — Marshal Ferdinand Foch 1911
> There’s no chance the iPhone is going to get any significant market share. — Steve Ballmer, Microsoft CEO. 2007
> Stocks have reached a permanently high plateau. — Irving Fisher, Economist. 1929
> Who the hell wants to hear actors talk? —Harry Warner, Warner Bros. 1927
> By 2005, it will become clear that the Internet’s impact on the economy has been no greater than the fax machine. -Paul Krugman, Economist. 1998
In many cases the folks in question were waaaaay past their best days.
(All things considered, you may be right to be suspicious of me.)
if the "core belief" is that the LLM architecture cannot be the way to AGI, that is more of an "educated bet", which does not get falsified when LLMs improve but still suggest their initial faults. If seeing that LLMs seem constrained in the "reactive system" as opposed to a sought "deliberative system" (or others would say "intuitive" vs "procedural" etc.) was an implicit part of the original "core belief", then it still stands in spite of other improvements.
Which LLMs have shown you "strong summarization abilities"?
Examples of people who failed to see that something was not (in some way) a dead end do not cancel examples of people who correctly saw dead ends. The lists may even overlap ("if it stays that way, it's a dead end").
And the latent-space bit is also true of classical models; it's the basic idea behind any pattern recognition or dimensionality reduction. That doesn't mean it's necessarily "getting the right idea."
Again, I don't want to "think of it as a probability." I'm saying that what you're describing is a probability distribution. Do you have a citation for the "probability to express correctly the sentence/idea" bit? Because merely having a latent space does not imply representing an idea.
Rinse and repeat.
After a while you question whether LLMs are actually a dead end
As I said, it will depend on whether the examples in question were actually a substantial part of the "core belief".
For example: "But can they perform procedures?" // "Look at that now" // "But can they do it structurally? Consistently? Reliably?" // "Look at that now" // "But is that reasoning integrated or external?" // "Look at that now" // "But is their reasoning fully procedurally vetted?" (etc.)
I.e.: is the "progress" (which would be the "anomaly" in scientific prediction) part of the "substance" or part of the "form"?
I'm not saying that they are being bad actors, just saying this is more probable in my mind than an LLM breakthrough.
> he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.
What surprises me is the assumption that there's more than "find patterns in sequences" to "sounding human" i.e. to emitting human-like communication patterns. What else could there be to it? It's a tautology.
>If the recent developments don't surprise you, I just chalk it up to lack of curiosity.
Recent developments don't surprise me in the least. I am, however, curious enough to be absolutely terrified by them. For one, behind the human-shaped communication sequences there could previously be assumed to be an actual human.