zlacker

[return to "Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens"]
1. valine+r7[view] [source] 2025-05-23 17:09:04
>>nyrikk+(OP)
I think it’s helpful to remember that language models don’t produce tokens; they produce a distribution over possible next tokens. Just because your sampler picks a sequence of tokens that contains incorrect reasoning doesn’t mean a useful reasoning trace isn’t also contained within the latent space.

It’s a misconception that transformers reason in token space. Tokens don’t attend to other tokens; high-dimensional latents attend to other high-dimensional latents. The final layer of a decoder-only transformer has full access to the latents at every previous position, the same latents you can project into a distribution over next tokens.
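To make the "project latents into a distribution" point concrete, here’s a minimal sketch, assuming Hugging Face transformers with GPT-2 as a stand-in decoder-only model (the model choice and the explicit lm_head projection are just for illustration, not anything specific to the paper):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The answer is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# Final-layer latent at the last position: it has attended over the
# latents of every previous position, not over raw tokens.
last_latent = out.hidden_states[-1][:, -1, :]

# Project that latent through the LM head into a full distribution
# over the vocabulary of possible next tokens.
logits = model.lm_head(last_latent)
probs = torch.softmax(logits, dim=-1)

# The sampler then collapses this whole distribution to a single token;
# everything else the latent encoded is discarded at this step.
next_token = torch.multinomial(probs, num_samples=1)
print(tok.decode(next_token[0]))
```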

2. aiiizz+1E1[view] [source] 2025-05-24 11:46:44
>>valine+r7
Is that really true? Anthropic, for example, has said that the model can make decisions about all of the tokens before a single token is produced.