LeCun, "Mathematical Obstacles on the Way to Human-Level AI"
Slide (Why autoregressive models suck)
I know there are other examples, and I'm not attacking your post; mainly it's a great opportunity to link this IMHO interesting article that interacts with many debates on HN.
> just one of the many tools of reason.
Read https://en.wikipedia.org/wiki/Preference_(economics)#Transit... then read https://pmc.ncbi.nlm.nih.gov/articles/PMC7058914/ and you will see there's a lot of data suggesting that indeed, it's just one of the many tools!
I think it's similar to how many dislike the non-deterministic output of LLMs: when you use statistical tools, non-deterministic output is a VERY nice feature for exploring conceptual spaces with abductive reasoning: https://en.wikipedia.org/wiki/Abductive_reasoning
It's a tool I was using at a previous company, mixing LLMs, statistics and formal tools. I'm surprised there aren't more startups mixing LLM with z3 or even just prolog.
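As a rough illustration of what mixing an LLM with z3 could look like (this is my own minimal sketch, not the poster's actual tool; the llm_propose_constraints() helper is a hypothetical stand-in for an LLM call, while the z3 calls are the real z3py API):

```python
# Sketch: an LLM proposes candidate constraints as structured text,
# and z3 checks whether the proposal is even satisfiable before we trust it.
from z3 import Int, Solver, sat

def llm_propose_constraints():
    # Hypothetical stand-in for an LLM call that returns candidate bounds.
    return [("x", ">", 3), ("x", "<", 10)]

x = Int("x")
s = Solver()
for name, op, value in llm_propose_constraints():
    s.add(x > value if op == ">" else x < value)

if s.check() == sat:
    print("LLM proposal is consistent; example value:", s.model()[x])
else:
    print("LLM proposal is contradictory; reject it or re-prompt")
```

The point of the split is that the LLM does the fuzzy, abductive part and the formal tool does the part where mistakes are unacceptable.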
They do not reason significantly better than autoregressive LLMs, which makes me question whether "one token at a time" is really the bottleneck.
Also, LeCun has been pushing his JEPA idea for years now - with not much to show for it. With his resources one could hope we would see its benefits over the current state-of-the-art models.
He wrote about Copycat, a program for understanding analogies ("abc is to 123 as cba is to ???"). The program worked at the symbolic level, in the sense that it hard-coded a network of relationships between words and characters. I wonder how close he was to "inventing" an LLM? The insight he needed was that instead of hard-coding patterns, he should have just trained on a vast set of patterns.
Hofstadter focused on Copycat because he saw pattern-matching as the core ability of intelligence. Unlocking that, in his view, would unlock AI. And, of course, pattern-matching is exactly what LLMs are good for.
I think he's right. Intelligence isn't about logic. In the early days of AI, people thought that a chess-playing computer would necessarily be intelligent, but that was clearly a dead-end. Logic is not the hard part. The hard part is pattern-matching.
In fact, pattern-matching is all there is: That's a bear, run away; I'm in a restaurant, I need to order; this is like a binary tree, I can solve it recursively.
I honestly can't come up with a situation that calls for intelligence that can't be solved by pattern-matching.
In my opinion, LeCun is moving the goal-posts. He's saying LLMs make mistakes and therefore they aren't intelligent and aren't useful. Obviously that's wrong: humans make mistakes and are usually considered both intelligent and useful.
I wonder if there is a necessary relationship between intelligence and mistakes. If you can solve a problem algorithmically (e.g., long-division) then there won't be mistakes, but you don't need intelligence (you just follow the algorithm). But if you need intelligence (because no algorithm exists) then there will always be mistakes.
More interesting is their research work. JEPA is what LeCun is betting on:
https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-jo...
“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”
― Edsger W. Dijkstra, in https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD867...
Sorry, I am a little lost reading the last part about autoregressive next-token prediction and why it is still wrong. Could someone explain it a little bit? Edit: Explained here further down the thread. ( >>43594813 )
I personally went from AI skeptic (it won't ever replace all humans, at least not in the next 10-20 years) to AI scared, simply because of the reasoning capability it has gained. It is not perfect, far from it, but I can immediately infer where both algorithmic improvements and hardware advances could bring us in 5 years. And that is not including any new breakthroughs.
1. Weakest ever LLM? This one is really making me scratch my head. For a period of time Llama was considered to be THE best. Furthermore, it's the third most used on OpenRouter (in the past month): https://openrouter.ai/rankings?view=month
2. Ignoring DeepSeek for a moment, Llama 2 and 3 require a special license from Meta if the products or services using the models have more than 700 million monthly active users. OpenAI, Claude and Gemini are not only closed source, but require a license/subscription to even get started.
Using an n-gram/skip-gram model over the long text, you can predict probabilities of word pairs and/or word triples (effectively collocations [1]) in the summary.
[1] https://en.wikipedia.org/wiki/Collocation
Then, by using (beam search and) an n-gram/skip-gram model of summaries, you can generate the text of a summary, guided by a preference for the word pairs/triples predicted by the first step.
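A minimal sketch of that second step, assuming a toy corpus of reference summaries (purely illustrative, not the poster's actual pipeline): estimate bigram probabilities, then beam-search a word sequence that prefers high-probability pairs.

```python
# Toy bigram model + beam search over word pairs.
from collections import Counter, defaultdict

def bigram_model(summaries):
    # Count word pairs, then normalize into conditional probabilities P(next | word).
    counts = defaultdict(Counter)
    for s in summaries:
        words = s.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def beam_search(model, start, length=5, beam=3):
    # Keep the `beam` most probable continuations at each step.
    beams = [([start], 1.0)]
    for _ in range(length):
        candidates = []
        for words, p in beams:
            for nxt, q in model.get(words[-1], {}).items():
                candidates.append((words + [nxt], p * q))
        beams = sorted(candidates, key=lambda x: -x[1])[:beam] or beams
    return " ".join(beams[0][0])

model = bigram_model(["the model predicts word pairs", "the model generates the summary"])
print(beam_search(model, "the"))
```

In the scheme described above, the pair/triple probabilities would come from the first-step prediction over the long text rather than from a toy corpus, but the generation mechanics are the same.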
> he has admitted to individual mistakes, but not to the systemic issues which produced them, which makes for a safe bet that there will be more mistakes in the future.
(It's very common, esp. with educationally traumatized Americans, e.g., to identify Math with "calculation"/"approved tools" and not "the concepts")
"No amount of calculation will model conceptual thinking" <- sounds more reasonable?? (You said you were ok with nondeterministic outputs? :)
Sorry to come across as patronizing
[if we disregard that he said "concepts are key" -- though we can be yet more charitable and assume that he doesn't accept (median) human-level intelligence as the final boss]
Para-doxxing ">" Under-standing
(I haven't thought this through, just vibe-calculating, as it were, having pondered the necessity of concrete particulars for a split-second)(More on that "sophistiKated" aspect of "projeKtion": turns out not to be as idiosynKratic as I'd presumed, but I traded bandwidth for immediacy here, so I'll let GP explain why that's interesting, if he indeed finds it is :)
Wolfram (self-styled heir to Leibniz/Galois) seems to be serving himself a fronthanded compliment:
https://writings.stephenwolfram.com/2020/12/combinators-a-ce...
>What I called a “projection” then is what we’d call a function now; a “filter” is what we’d now call an argument )