Where I'm skeptical of LLM skepticism is that people use the term "stochastic parrot" disparagingly, as if they're not impressed. LLMs are stochastic parrots in the sense that they probabilistically guess sequences of things, but isn't it interesting how far that takes you already? I'd never have guessed. Fundamentally I question the intellectual honesty of anyone who pretends they're not surprised by this.
That's why I'm not too impressed even when he has changed his mind: he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.
Of course, early in training, the first function they model to lower the error will be the probabilities of the next tokens, since this is the simplest function that reduces the loss. Then the gradients agree in other directions, and the function that the LLM eventually learns is no longer related to probabilities, but to the meaning of the sentence and what it makes sense to say next.
It's not by chance that the logits often have a huge signal in just two or three tokens, even if the sentence, probabilistically speaking, could continue in many more ways.
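To make the "huge signal in just two or three tokens" point concrete, here is a minimal sketch. The logit values are made up for illustration and do not come from any real model; the only thing it shows is how a softmax turns a few large logits into nearly all of the probability mass.

```python
import math

def softmax(logits):
    """Convert a dict of raw logits into a probability distribution."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Hypothetical logits for the token after "For breakfast I had":
# two candidates stand out, the rest are plausible but far weaker.
logits = {"oats": 9.0, "eggs": 8.5, "toast": 3.0, "cereal": 2.5, "nothing": 2.0}

probs = softmax(logits)
# Probability mass held by the two strongest candidates.
top2 = sum(sorted(probs.values(), reverse=True)[:2])

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:8s} {p:.4f}")
print(f"top-2 mass: {top2:.4f}")
```

With these invented numbers, the top two tokens take almost all of the mass even though every candidate is a grammatical continuation. Whether that reflects "meaning" or just a sharply peaked distribution is exactly what the thread is arguing about.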
But the point of my response was just that I find it extremely surprising how well an idea as simple as "find patterns in sequences" actually works for the purpose of sounding human, and I'm suspicious of anyone who pretends this isn't incredible. Can we agree on this?
But enough data implies probabilities. Consider 2 sentences:
"For breakfast I had oats"
"For breakfast I had eggs"
Training on this data, how do you complete "For breakfast I had..."?
There is no best deterministic answer. The best answer is a 50/50 probability distribution over "oats" and "eggs".
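A quick sketch of why 50/50 is "best" here, assuming the training objective is the usual average negative log-likelihood (cross-entropy) of the observed next word. The helper name `cross_entropy` and the skewed comparison distribution are just for illustration.

```python
import math
from collections import Counter

corpus = ["For breakfast I had oats", "For breakfast I had eggs"]
prefix = "For breakfast I had"

# Count the word that follows the prefix in each training sentence.
counts = Counter(
    s[len(prefix):].split()[0] for s in corpus if s.startswith(prefix)
)
total = sum(counts.values())
empirical = {word: c / total for word, c in counts.items()}

def cross_entropy(pred):
    """Average negative log-likelihood of the observed next words under pred."""
    return -sum(c * math.log(pred[w]) for w, c in counts.items()) / total

loss_even = cross_entropy({"oats": 0.5, "eggs": 0.5})
loss_skewed = cross_entropy({"oats": 0.99, "eggs": 0.01})

print(empirical)
print(f"50/50 loss:  {loss_even:.4f}")
print(f"99/1 loss:   {loss_skewed:.4f}")
```

The even split gives a loss of ln 2, and any prediction that leans toward one word is penalized harder on the sentence containing the other word. So on this data the loss-minimizing output is a distribution, not a single answer.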
(All things considered, you may be right to be suspicious of me.)
Which LLMs have shown you "strong summarization abilities"?
And on the latent space bit, it's also true for classical models, and the basic idea behind any pattern recognition or dimensionality reduction. That doesn't mean it's necessarily "getting the right idea."
Again, I don't want to "think of it as a probability." I'm saying what you're describing is a probability distribution. Do you have a citation for the "probability to express correctly the sentence/idea" bit? Because just having a latent space does not imply representing an idea.
> he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.
What surprises me is the assumption that there's more than "find patterns in sequences" to "sounding human" i.e. to emitting human-like communication patterns. What else could there be to it? It's a tautology.
> If the recent developments don't surprise you, I just chalk it up to lack of curiosity.
Recent developments don't surprise me in the least. I am, however, curious enough to be absolutely terrified by them. For one, behind the human-shaped communication sequences there could previously be assumed to be an actual human.