zlacker

Yann LeCun, Pioneer of AI, Thinks Today's LLMs Are Nearly Obsolete

submitted by alphad+(OP) on 2025-04-02 22:59:55 | 124 points 140 comments

NOTE: showing posts with links only
2. gsf_em+4d8[view] [source] 2025-04-05 15:45:25
>>alphad+(OP)
Recent talk: https://www.youtube.com/watch?v=ETZfkkv6V7Y

LeCun, "Mathematical Obstacles on the Way to Human-Level AI"

Slide ("Why autoregressive models suck"):

https://xcancel.com/ravi_mohan/status/1906612309880930641

◧◩
7. sho_hn+2f8[view] [source] [discussion] 2025-04-05 16:03:54
>>csdvrx+Be8
Re the "medical imaging" reference, many of those are built on top of one famous study recording movement before conscious realization that isn't as clear-cut as it entered popular knowledge as: https://www.theatlantic.com/health/archive/2019/09/free-will...

I know there are other examples, and I'm not attacking your post; mainly it's a great opportunity to link this (IMHO) interesting article, which bears on many debates here on HN.

◧◩
11. gessha+8g8[view] [source] [discussion] 2025-04-05 16:13:35
>>csdvrx+Be8
When I took cognitive science courses some years ago, one of the studies we looked at involved patients with damage to the emotion-processing parts of the brain. The result was a reduced or complete inability to make decisions.

https://pmc.ncbi.nlm.nih.gov/articles/PMC3032808/

◧◩◪
13. csdvrx+og8[view] [source] [discussion] 2025-04-05 16:15:53
>>gibson+Xd8
Intransitive preferences are well known to experimental economists, but they are a hard pill to swallow for many, as they break a lot of algorithms (which depend on transitivity) and call for more robust tools like https://en.wikipedia.org/wiki/Paraconsistent_logic

> just one of the many tools of reason.

Read https://en.wikipedia.org/wiki/Preference_(economics)#Transit... then read https://pmc.ncbi.nlm.nih.gov/articles/PMC7058914/ and you will see there's a lot of data suggesting that indeed, it's just one of the many tools!

I think it's similar to how many people dislike the non-deterministic output of LLMs: when you use statistical tools, non-deterministic output is a VERY nice feature for exploring conceptual spaces with abductive reasoning: https://en.wikipedia.org/wiki/Abductive_reasoning

It's a tool I was using at a previous company, mixing LLMs, statistics and formal tools. I'm surprised there aren't more startups mixing LLMs with z3 or even just Prolog.
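To give a flavor of the kind of mixing I mean, here is a minimal sketch I wrote just for this comment (not the actual pipeline; the preference "claims" are made-up stand-ins for LLM output): the LLM proposes, z3 disposes.

  # "LLM proposes, solver disposes": an LLM suggests candidate preference
  # claims, and z3 checks whether they are jointly consistent, i.e. whether
  # any transitive ranking can satisfy them. The claims below are hypothetical.
  from z3 import Solver, Int, sat

  claims = [("A", "B"), ("B", "C"), ("C", "A")]  # "X is preferred to Y"

  s = Solver()
  items = {x for pair in claims for x in pair}
  rank = {name: Int("rank_" + name) for name in items}
  for better, worse in claims:
      s.add(rank[better] > rank[worse])  # encode each claim as a strict order

  if s.check() == sat:
      print("consistent ranking found:", s.model())
  else:
      print("no transitive ranking satisfies these preferences")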

◧◩
14. kadush+wg8[view] [source] [discussion] 2025-04-05 16:16:54
>>csdvrx+Be8
There are LLMs which do not generate one token at a time: https://arxiv.org/abs/2502.09992

They do not reason significantly better than autoregressive LLMs, which makes me question "one token at a time" as the bottleneck.

Also, LeCun has been pushing his JEPA idea for years now, with not much to show for it. With his resources, one would hope we would see its benefits over the current state-of-the-art models by now.

15. GMorom+8h8[view] [source] 2025-04-05 16:22:50
>>alphad+(OP)
I remember reading Douglas Hofstadter's Fluid Concepts and Creative Analogies [https://en.wikipedia.org/wiki/Fluid_Concepts_and_Creative_An...]

He wrote about Copycat, a program for understanding analogies ("abc is to 123 as cba is to ???"). The program worked at the symbolic level, in the sense that it hard-coded a network of relationships between words and characters. I wonder how close he was to "inventing" an LLM? The insight he needed was that instead of hard-coding patterns, he should have just trained on a vast set of patterns.
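To make the "symbolic level" concrete, here is a cartoon version of that kind of hard-coded mapping (my own toy illustration; the real Copycat, with its slipnet and codelets, is far more elaborate):

  # Toy sketch of a hard-coded symbolic analogy: map each letter to its
  # alphabet position, so the rule behind "abc -> 123" applies to new strings.
  def analogy(target: str) -> str:
      return "".join(str(ord(c) - ord("a") + 1) for c in target.lower())

  print(analogy("abc"))  # 123
  print(analogy("cba"))  # 321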

Hofstadter focused on Copycat because he saw pattern-matching as the core ability of intelligence. Unlocking that, in his view, would unlock AI. And, of course, pattern-matching is exactly what LLMs are good for.

I think he's right. Intelligence isn't about logic. In the early days of AI, people thought that a chess-playing computer would necessarily be intelligent, but that was clearly a dead-end. Logic is not the hard part. The hard part is pattern-matching.

In fact, pattern-matching is all there is: That's a bear, run away; I'm in a restaurant, I need to order; this is like a binary tree, I can solve it recursively.

I honestly can't come up with a situation that calls for intelligence that can't be solved by pattern-matching.

In my opinion, LeCun is moving the goal-posts. He's saying LLMs make mistakes and therefore they aren't intelligent and aren't useful. Obviously that's wrong: humans make mistakes and are usually considered both intelligent and useful.

I wonder if there is a necessary relationship between intelligence and mistakes. If you can solve a problem algorithmically (e.g., long-division) then there won't be mistakes, but you don't need intelligence (you just follow the algorithm). But if you need intelligence (because no algorithm exists) then there will always be mistakes.

◧◩◪◨
27. jagged+rj8[view] [source] [discussion] 2025-04-05 16:42:22
>>antire+Yi8
Here's a fun example of that kind of "I've updated my statements but not examined the underlying lack of understanding that produced them" - it's a bad look for any kind of scientist.

https://x.com/AukeHoekstra/status/1507047932226375688

42. djoldm+Dl8[view] [source] 2025-04-05 16:58:53
>>alphad+(OP)
The idolatry and drama surrounding LeCun, Hinton, Schmidhuber, etc. is likely a distraction. This includes their various predictions.

More interesting is their research work. JEPA is what LeCun is betting on:

https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-jo...

◧◩◪
53. SpicyL+Wn8[view] [source] [discussion] 2025-04-05 17:18:50
>>George+7i8
Cancer eradication seems like a clear example of where highly effective pattern matching could be a game changer. That's where cancer research starts: pattern matching to sift through the incredibly large space of potential drugs and find the ones worth starting clinical trials for. If you could get an LLM to pattern-match whether a new compound is likely to work as a BTK inhibitor (https://en.wikipedia.org/wiki/Bruton%27s_tyrosine_kinase), or screen them for likely side effects before even starting synthesis, that would be a big deal.
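(Not an LLM, but a minimal sketch of the pattern-matching step I have in mind: rank candidate compounds by structural similarity to a known active. The SMILES strings here are arbitrary placeholders, and RDKit is just one convenient way to do the matching.)

  # Toy "screen by similarity to a known active" sketch using RDKit.
  # The reference and candidate structures are illustrative placeholders,
  # not a real screening library.
  from rdkit import Chem
  from rdkit.Chem import AllChem, DataStructs

  known_active = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # placeholder
  candidates = {
      "cand_1": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
      "cand_2": "c1ccccc1O",
  }

  ref_fp = AllChem.GetMorganFingerprintAsBitVect(known_active, 2, nBits=2048)
  for name, smiles in candidates.items():
      mol = Chem.MolFromSmiles(smiles)
      fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
      score = DataStructs.TanimotoSimilarity(ref_fp, fp)
      print(name, round(score, 3))  # higher = more like the known active
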
◧◩◪
67. gwd+ms8[view] [source] [discussion] 2025-04-05 17:59:29
>>falcor+ek8
Or perhaps:

“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”

― Edsger W. Dijkstra, in https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD867...

◧◩
83. ksec+Xv8[view] [source] [discussion] 2025-04-05 18:25:12
>>antire+5i8
>Btw, other researchers that were in the LeCun side, changed side recently, saying that now "is different" because of CoT, that is the symbolic reasoning they were blabling before. But CoT is still regressive next token without any architectural change, so, no, they were wrong, too.

Sorry, I am a little lost reading the last part about regressive next-token generation and why they were still wrong. Could someone explain a little bit? Edit: Explained further down the thread. ( >>43594813 )

I personally went from AI skeptic (it won't ever replace all humans, at least not in the next 10-20 years) to scared of AI, simply because of the reasoning capability it has gained. It is not perfect, far from it, but I can immediately infer where both algorithmic improvements and hardware advances could bring us in 5 years. And that is not including any new breakthrough.

◧◩
85. mbesto+fw8[view] [source] [discussion] 2025-04-05 18:27:22
>>antire+5i8
I wanna believe everything you say (because you generally are a credible person) but a few things don't add up:

1. Weakest ever LLM? This one is really making me scratch my head. For a period of time Llama was considered to be THE best. Furthermore, it's the third most used on OpenRouter (in the past month): https://openrouter.ai/rankings?view=month

2. Ignoring DeepSeek for a moment, Llama 2 and 3 require a special license from Meta if the products or services using the models have more than 700 million monthly active users. OpenAI, Claude and Gemini are not only closed source, but require a license/subscription to even get started.

◧◩
113. thesz+ZI8[view] [source] [discussion] 2025-04-05 19:50:52
>>antire+5i8
> there is no probabilistic link between the words of a text and the gist of the content

Using an n-gram/skip-gram model over the long text, you can predict probabilities of word pairs and/or word triples (effectively collocations [1]) in the summary.

[1] https://en.wikipedia.org/wiki/Collocation

Then, by using (beam search and) an n-gram/skip-gram model of summaries, you can generate the text of a summary, guided by a preference for the word pairs/triples predicted in the first step.
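A very rough sketch of the two steps, with toy data and a greedy stand-in for a real beam search (the scoring is simplified to a pair-preference bonus):

  # Step 1: skip-gram pair counts over the long text predict which word
  # pairs are likely in a summary. Step 2: a bigram model of summaries
  # generates text, biased toward those predicted pairs. Toy data only.
  from collections import Counter

  long_text = "the model predicts the next word and the model compresses text".split()
  summary_corpus = [
      "<s> model predicts next word </s>".split(),
      "<s> model compresses text </s>".split(),
  ]

  # step 1: pair counts within a small window over the long text
  pair_counts = Counter()
  for i, w in enumerate(long_text):
      for v in long_text[i + 1 : i + 5]:
          pair_counts[(w, v)] += 1
  predicted_pairs = set(pair_counts)

  # step 2: bigram model of summaries
  bigrams = Counter()
  for sent in summary_corpus:
      for a, b in zip(sent, sent[1:]):
          bigrams[(a, b)] += 1

  def next_word(prev, produced):
      options = [(b, c) for (a, b), c in bigrams.items() if a == prev]
      def score(opt):
          word, count = opt
          bonus = sum((w, word) in predicted_pairs for w in produced)
          return count + 2 * bonus  # prefer continuations completing predicted pairs
      return max(options, key=score)[0] if options else "</s>"

  out, word = [], "<s>"
  while word != "</s>" and len(out) < 10:
      word = next_word(word, out)
      out.append(word)
  print(" ".join(out))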

◧◩◪◨⬒⬓
117. jagged+YO8[view] [source] [discussion] 2025-04-05 20:43:09
>>darkwa+fl8
Yann LeCun. I think this post is a more elegant summary of what I'm trying to illustrate with this kind of thing:

> he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.

>>43594771

◧◩◪◨
130. gsf_em+iu9[view] [source] [discussion] 2025-04-06 05:50:51
>>csdvrx+og8
Thanks for the links; the "tradeoff" aspect of paraconsistent logic is interesting. I think one way to achieve consensus with your debate partner might be to consider that the language representation is "just" a nondeterministic decompression of "the facts". I'm primed to agree with you, but

>>41892090

(It's very common, esp. with educationally traumatized Americans, e.g., to identify Math with "calculation"/"approved tools" and not "the concepts")

"No amount of calculation will model conceptual thinking" <- sounds more reasonable?? (You said you were ok with nondeterministic outputs? :)

Sorry to come across as patronizing

◧◩◪◨⬒⬓⬔
139. gsf_em+8Fb[view] [source] [discussion] 2025-04-07 04:14:12
>>csdvrx+Wjb
Here is why I think Gibson could in principle still be right (without necessarily summoning religious feelings)

[if we disregard that he said "concepts are key" -- though we can be yet more charitable and assume that he doesn't accept (median) human-level intelligence as the final boss]

  Para-doxxing ">" Under-standing
(I haven't thought this through, just vibe-calculating, as it were, having pondered the necessity of concrete particulars for a split-second)

(More on that "sophistiKated" aspect of "projeKtion": turns out not to be as idiosynKratic as I'd presumed, but I traded bandwidth for immediacy here, so I'll let GP explain why that's interesting, if he indeed finds it is :)

Wolfram (self-styled heir to Leibniz/Galois) seems to be serving himself a fronthanded compliment:

https://writings.stephenwolfram.com/2020/12/combinators-a-ce...

>What I called a “projection” then is what we’d call a function now; a “filter” is what we’d now call an argument )
