It's as if you have a variable in a deterministic programming language, except that to get the next state of the machine (program counter + memory + registers) you have to replay the entire history of the program's computation and input.
Producing a token for an LLM is analogous to a tick of the clock for a CPU. It's the crank handle that drives the process.
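Concretely, the loop looks something like this (a minimal sketch; `model` is a hypothetical pure function over a token sequence, not any real API):

```python
def decode(model, prompt_tokens, n_steps):
    """One iteration of the loop == one 'clock tick' == one emitted token."""
    tokens = list(prompt_tokens)   # the entire machine state lives here
    for _ in range(n_steps):
        # `model` is a pure function of the full history; nothing persists
        # between calls except the tokens we append below.
        next_token = model(tokens)
        tokens.append(next_token)
    return tokens
```

(In practice a KV cache memoizes most of the replay, but the cache is itself a deterministic function of the token history, so the picture doesn't change: the tokens are the state.)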
You're fixating on the pseudo-computation within a single token pass. That's very limited compared to genuine hidden-state retention and the introspection it would enable, if we already knew how to train for it and do online learning.
The "reasoning" hack would not be a realistic implementation choice if the models had hidden state and could ruminate on it without showing us output.