
[return to "Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens"]
1. valine+r7 2025-05-23 17:09:04
>>nyrikk+(OP)
I think it's helpful to remember that language models are not producing tokens; they are producing a distribution over possible next tokens. Just because your sampler picks a sequence of tokens containing incorrect reasoning doesn't mean a useful reasoning trace isn't also contained within the latent space.

It's a misconception that transformers reason in token space. Tokens don't attend to other tokens; high-dimensional latents attend to other high-dimensional latents. The final layer of a decoder-only transformer has full access to the latents at every previous position, the same latents you can project into a distribution over next tokens.
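To make the distinction concrete, here's a minimal PyTorch-style sketch (the shapes and names like `lm_head` are illustrative assumptions, not from the paper): the final-layer latent is projected to a full next-token distribution, and the sampler then collapses that distribution to a single token.

```python
import torch

# Hypothetical decoder-only setup: `hidden` stands in for the final-layer
# latents at each position; `lm_head` projects a latent to vocabulary logits.
vocab_size, d_model, seq_len = 50_000, 768, 16

hidden = torch.randn(seq_len, d_model)          # latents after the final layer
lm_head = torch.nn.Linear(d_model, vocab_size)  # projection from latent space to token space

logits = lm_head(hidden[-1])                    # only the last position's latent is projected
probs = torch.softmax(logits, dim=-1)           # full distribution over possible next tokens

sampled = torch.multinomial(probs, num_samples=1)  # the sampler keeps exactly one token...
# ...while `hidden` (and `probs`) still carry everything the model represented at this step.
```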

2. jacob0+3k 2025-05-23 18:44:32
>>valine+r7
So you're saying that the reasoning trace represents sequential connections between the full distribution rather than the sampled tokens from that distribution?