It's a misconception that transformers reason in token space. Tokens don't attend to other tokens; high-dimensional latents attend to other high-dimensional latents. The final layer of a decoder-only transformer has full access to the entire latent space of all previous positions, the same latents you can project into a distribution over next tokens.
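To make that concrete, here's a toy sketch of one causal attention head in plain Python. It's heavily simplified and everything in it is an assumption for illustration: random weights, a single head, no residual stream, layer norm, or MLP, and hypothetical names like `causal_attention` and `w_unembed`. The point is only that the mixing step operates on latent vectors, and token probabilities show up solely when you apply the output projection at the end.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def causal_attention(latents, w_q, w_k, w_v):
    """One attention head: each position mixes latents from itself and earlier positions."""
    queries = [matvec(w_q, h) for h in latents]
    keys = [matvec(w_k, h) for h in latents]
    values = [matvec(w_v, h) for h in latents]
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for t, q in enumerate(queries):
        # Scores are latent-vs-latent dot products; no token ids appear here.
        scores = [sum(a * b for a, b in zip(q, keys[s])) / scale for s in range(t + 1)]
        weights = softmax(scores)
        outputs.append([
            sum(w * values[s][i] for s, w in enumerate(weights))
            for i in range(len(values[0]))
        ])
    return outputs

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, vocab_size, seq_len = 4, 5, 3

# Stand-ins for the latents at each position (in a real model: embeddings plus earlier layers).
latents = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))
w_unembed = random_matrix(vocab_size, d_model, rng)

mixed = causal_attention(latents, w_q, w_k, w_v)

# Only at the very end is a latent projected into a distribution over next tokens.
probs = softmax(matvec(w_unembed, mixed[-1]))
print(probs)
```

Nothing inside `causal_attention` ever sees a token id. The vocabulary only enters through `w_unembed`, which in this sketch is applied once, to the last position's latent.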
There's a lot that makes this hard to discuss, e.g. by "projectable to semantic tokens" you mean "able to be written down"... right?
Something I do to make an idea really stretch its legs is reword it in the voice of Fat Tony, the Taleb character.
Setting that aside, why do we think this pathfinding can't be written down?
Is Claude/Gemini Plays Pokemon just an iterated A* search?