The subsequent argument that "LLMs only remix" => "all knowledge is a remix" seems absurd, and I'm surprised to have seen it now more than once here. Humanity didn't get from discovering fire to launching the JWST solely by remixing existing knowledge.
[1] http://bactra.org/notebooks/nn-attention-and-transformers.ht...
[2] Well, smoothing/estimation but the difference doesn't matter for my point.
Even acknowledging it is interpolation, models can extrapolate slightly without making things up, within the range where the model still applies. Who's to say what this range is for an LLM operating in thousand-dimensional space? As far as I can tell, the main limiters on LLM creativity are the guardrails we put in place for safety and usefulness.
And what exactly is your proof that human ingenuity is not just pattern matching? One could plausibly hypothesize that fire was discovered by combining all the facts people of the time knew and stumbling on something that put them together. That sounds like knowledge remix plus slight extrapolation to me.
It's a hypothesis at this stage, but I'm going to have a go at making it more quantitative. It seems like the obvious explanation for "hallucinations", and it should also be rather straightforward to attribute particular inference results to the training data that influenced them. I'm expecting to encounter difficulties, though, since the idea seems so obvious that it's vanishingly unlikely it hasn't already been tried.
> And what exactly is your proof that human ingenuity is not just pattern matching?
Firstly, I'm not the one making a strong claim that needs to be "proved". Secondly, "pattern matching" is ill-defined and not what I'm saying human intelligence isn't. I'm saying human intelligence isn't a kernel smoothing algorithm run over a corpus of text. This seems rather obvious. What's your proof that it is that?
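For reference, here's a toy sketch of what a kernel smoother actually does (a Nadaraya-Watson estimator on 1-D data; not a claim about how any real LLM works). Inside the training range it tracks the underlying function, but outside that range the estimate collapses toward the nearest training values instead of following the trend:

```python
import math

def kernel_smooth(x_train, y_train, x_query, bandwidth=0.5):
    """Nadaraya-Watson estimate: a weighted average of training y's,
    with Gaussian weights based on distance from the query point."""
    weights = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2))
               for x in x_train]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_train)) / total

# Training data sampled from y = x^2 on [0, 2]
xs = [i / 10 for i in range(21)]
ys = [x * x for x in xs]

# Inside the data range the smoother tracks the curve (close to 1.0)...
inside = kernel_smooth(xs, ys, 1.0, bandwidth=0.1)

# ...but far outside it, the estimate flattens toward the boundary
# value (near 4.0, the y at x=2) rather than extrapolating to 25.
outside = kernel_smooth(xs, ys, 5.0, bandwidth=0.1)
print(inside, outside)
```

Whether high-dimensional learned representations escape this flattening behavior is exactly the point under dispute upthread.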