Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens

>>nyrikk+(OP)
I think it’s helpful to remember that language models are not producing tokens, they are producing a distribution of possible next tokens. Just because your sampler picks a sequence of tokens that contain incorrect reasoning doesn't mean a useful reasoning trace isn’t also contained within the latent space.

It’s a misconception that transformers reason in token space. Tokens don’t attend to other tokens. High dimensional latents attend to other high dimensional latents. The final layer of a decoder only transformer has full access to entire latent space of all previous latents, the same latents you can project into a distribution of next tokens.

>>valine+r7
What the model is doing in latent space is auxilliary to anthropomorphic interpretations of the tokens, though. And if the latent reasoning matches a ground-truth procedure (A*), then we'd expect it to be projectable to semantic tokens, but it isn't. So it seems the model has learned an alternative method for solving these problems.

>>x_flyn+de1
You’re thinking about this like the final layer of the model is all that exists. It’s highly likely reasoning is happening at a lower layer, in a different latent space that can’t natively be projected into logits.

zlacker