Many people believe that "wants" come first, and are then followed by rationalizations. It's also a theory that's supported by medical imaging.
Maybe LLMs are a good emulation of system-2 (their performance suggests they are), and what's missing is system-1, the "reptilian" brain, based on emotions like love, fear, aggression, etc.
For all we know, system-1 could use the same embeddings, work in parallel, and produce tokens that are used to guide system-2.
Personally, I trust my "emotions" and "gut feelings": I believe they are things "not yet rationalized" by my system-2, coming straight from my system-1.
I know it's very unpopular among nerds, but it has worked well enough for me!
I know there are other examples, and I'm not attacking your post; mainly, it's a great opportunity to link this (IMHO) interesting article, which touches on many debates on HN.
They do not reason significantly better than autoregressive LLMs, which makes me question “one token at a time” as the bottleneck.
Also, LeCun has been pushing his JEPA idea for years now, with not much to show for it. With his resources, one would hope to see its benefits over the current state-of-the-art models by now.