zlacker

[parent] [thread] 14 comments
1. ramraj+(OP)[view] [source] 2026-01-20 05:07:17
The fundamental idea that modern LLMs can only ever remix, even if it's technically true (which I doubt), says to me that all knowledge is only ever a remix, perhaps even mathematically so. Anyone who still keeps implying these are statistical parrots or whatever is just going to regret these decisions in the future.
replies(5): >>heavys+z1 >>mrbung+a2 >>pseudo+T2 >>omnico+l01 >>theshr+Dx4
2. heavys+z1[view] [source] 2026-01-20 05:23:59
>>ramraj+(OP)
Yeah, Yann LeCun is just some luddite lol
replies(2): >>Nitpic+y5 >>Curiou+DR
3. mrbung+a2[view] [source] 2026-01-20 05:30:42
>>ramraj+(OP)
> Anyone who still keeps implying these are statistical parrots or whatever is just going to regret these decisions in the future.

You know this is a false dichotomy, right? You can consider LLMs statistical parrots and at the same time take advantage of them.

replies(1): >>ramraj+cc1
4. pseudo+T2[view] [source] 2026-01-20 05:40:17
>>ramraj+(OP)
But all of my great ideas are purely from my own original inspiration, not from learning or pattern matching. Nothing derivative or remixed. /sarcasm
5. Nitpic+y5[view] [source] [discussion] 2026-01-20 06:07:33
>>heavys+z1
I don't think he's a luddite at all. He's brilliant in what he does, but he can also be wrong in his predictions (as all humans are from time to time). He made 3 main predictions around 2023-24 that turned out to be wrong in hindsight. Why they were wrong is debatable, but yeah.

In a stage interview (a bit after the "Sparks of AGI" GPT-4 paper came out) he made 3 statements:

a) LLMs can't do math. They can trick us with poems and subjective prose, but at objective math they fail.

b) They can't plan.

c) By the nature of their autoregressive architecture, errors compound, so a wrong token will make the output irreversibly wrong and spiral out of control.

I think we can safely say that all of these turned out to be wrong. It's very possible that he meant something more abstract and technical at its core, but in real life all of these things were overcome. So, not a luddite, but also not a seer.

replies(1): >>gjadi+36
6. gjadi+36[view] [source] [discussion] 2026-01-20 06:13:27
>>Nitpic+y5
Have these shortcomings of LLMs been addressed by better models or by better integration with other tools? Like, are they better at coding because the models are truly better, or because the agentic loops are better designed?
replies(2): >>Nitpic+J7 >>encycl+LR
7. Nitpic+J7[view] [source] [discussion] 2026-01-20 06:29:15
>>gjadi+36
100% by better models. Since his talk, models have gained larger context windows (up to a usable 1M tokens), and RL (reinforcement learning) has been amazing at picking out good traces and has taught the LLMs how to backtrack and recover from earlier wrong tokens. On top of that, RLAIF (RL with AI feedback) made earlier models better, and RLVR (RL with verifiable rewards) has made them very good at both math and coding.

The harnesses have helped in training the models themselves (i.e. every good trace gets "baked into" the model) and have improved at enabling test-time compute. But at the end of the day this all gets put back into the models, and they become better.

The simplest proof of this is on benchmarks like terminal-bench and SWE-bench with simple agents. The current top models are much better than their previous versions when put in a loop with just a "bash tool". There's a ~100 LoC harness called mini-swe-agent [1] that does just that (a rough sketch of the idea is at the end of this comment).

So: current models + minimal loop >> previous-gen models with human-written harnesses + lots of glue.

> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!

[1] - https://github.com/SWE-agent/mini-swe-agent
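
For the curious, here's a minimal sketch of what such a "model + bash tool" loop amounts to. The `query_model` stub is hypothetical and stands in for whatever LLM API you use; this illustrates the idea, it is not mini-swe-agent's actual code:

  import subprocess

  def query_model(history):
      # Placeholder: call an LLM with the conversation so far and return its
      # next message, assumed to contain either a shell command to run or the
      # literal string "DONE".
      raise NotImplementedError

  def run_agent(task, max_steps=50):
      history = [{"role": "user", "content": task}]
      for _ in range(max_steps):
          reply = query_model(history)
          history.append({"role": "assistant", "content": reply})
          if "DONE" in reply:
              break
          # Execute the proposed command and feed stdout/stderr back to the model.
          result = subprocess.run(reply, shell=True, capture_output=True,
                                  text=True, timeout=120)
          history.append({"role": "user",
                          "content": result.stdout + result.stderr})
      return history

Everything else (backtracking, planning, recovering from bad commands) has to come from the model itself, which is the point.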

8. Curiou+DR[view] [source] [discussion] 2026-01-20 13:08:38
>>heavys+z1
You don't understand Yann's argument. It's similar to Richard Sutton's, in that these things aren't thinking, they're emulating thinking, and the weak implicit world models that get built in the weights are insufficient for true "AGI."

This is orthogonal to the issue of whether all ideas are essentially "remixes." For the record I agree that they are.

replies(1): >>heavys+su3
9. encycl+LR[view] [source] [discussion] 2026-01-20 13:09:23
>>gjadi+36
Fundamentally these shortcomings cannot be addressed.

They can be and are improved (papered over) over time, for example by improving and tweaking the training data; adding new data sets is the usual fix. A prime example: 'count the number of R's in strawberry' caused quite a debacle at a time when LLMs were supposed to be intelligent. Because they aren't, they can trip up on simple problems like this. Keep using an army of people to train them and these edge cases may shrink over time, but fundamentally the LLM tech hasn't changed.

I am not saying that LLMs aren't amazing; they absolutely are. But WHAT they are is an understood thing, so let's not confuse ourselves.

10. omnico+l01[view] [source] 2026-01-20 14:09:52
>>ramraj+(OP)
Why doubt? Transformers are a form of kernel smoothing [1]. It's literally interpolation [2]. That doesn't mean they can only echo the exact items in their training data - generating new data points is the entire point of interpolation - but it does mean they're "remixing" (literally forming a weighted sum of) those items, and we would expect them to lose fidelity when moving outside the area covered by those points, i.e. where they attempt to extrapolate (a toy sketch of the weighted-sum view is at the end of this comment). And indeed we do see that, and for some reason we call it "hallucinating".

The subsequent argument that "LLMs only remix" => "all knowledge is a remix" seems absurd, and I'm surprised to have seen it now more than once here. Humanity didn't get from discovering fire to launching the JWST solely by remixing existing knowledge.

[1] http://bactra.org/notebooks/nn-attention-and-transformers.ht...

[2] Well, smoothing/estimation but the difference doesn't matter for my point.
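
To make the weighted-sum point concrete, here's a toy numpy sketch of single-head attention as a kernel smoother (illustrative only, not any production transformer code): each output row is a convex combination of the value rows, i.e. an interpolation within their convex hull.

  import numpy as np

  def softmax_attention(Q, K, V):
      # Kernel weights: similarity of each query to each key, normalized so
      # each row sums to 1 (a Nadaraya-Watson-style smoother).
      scores = Q @ K.T / np.sqrt(Q.shape[-1])
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)
      return weights @ V  # each output is a weighted sum ("remix") of the values

  rng = np.random.default_rng(0)
  Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
  out = softmax_attention(Q, K, V)  # every row of out lies in the convex hull of V's rows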

replies(1): >>ramraj+Ob1
11. ramraj+Ob1[view] [source] [discussion] 2026-01-20 15:19:01
>>omnico+l01
It's not clear to me that LLMs hallucinate because they are extrapolating beyond their training data. Is that proven? Or are you extrapolating?

Even acknowledging it is interpolation, models can extrapolate slightly without making things up, within the range where the model still applies. Who's to say what that range is for an LLM operating in thousand-dimensional space? As far as I can tell, the main limiters on LLM creativity are the guardrails we put in place for safety and usefulness.

And what exactly is your proof that human ingenuity is not just pattern matching? I'm sure a hypothesis could be put forward that fire was discovered by just adding up all the facts people of the time knew and stumbling onto something that put it all together. Sounds like knowledge remix + slight extrapolation to me.

replies(1): >>omnico+To1
◧◩
12. ramraj+cc1[view] [source] [discussion] 2026-01-20 15:20:37
>>mrbung+a2
Yes, but the immediate analogy that comes to mind is how people treated other people as slaves, merely using them like machines. Sure, you got use out of them, but was that the best use?
13. omnico+To1[view] [source] [discussion] 2026-01-20 16:12:32
>>ramraj+Ob1
> It's not clear to me that LLMs hallucinate because they are extrapolating beyond their training data. Is that proven? Or are you extrapolating?

It's a hypothesis at this stage, but I'm going to have a go at making it more quantitative. It seems like the obvious explanation for "hallucinations", and it seems like it should also be rather straightforward to attribute particular inference results to the training data that influenced them. I'm expecting to encounter difficulties, though, since the idea seems so obvious that it's vanishingly unlikely it hasn't been tried.

> And what exactly is your proof that human ingenuity is not just pattern matching?

Firstly, I'm not the one making a strong claim that needs to be "proved". Secondly, "pattern matching" is ill-defined and not what I'm saying human intelligence isn't. I'm saying human intelligence isn't a kernel smoothing algorithm run over a corpus of text. This seems rather obvious. What's your proof that it is that?

14. heavys+su3[view] [source] [discussion] 2026-01-21 05:37:18
>>Curiou+DR
I agree with Yann
15. theshr+Dx4[view] [source] 2026-01-21 13:40:39
>>ramraj+(OP)
There are musicians who "remix" (sample) other artists' music and make massive hits themselves.

Not every solution needs to be original; in many cases, "remixing" existing solutions in a unique way is better and faster.
