How do we know whether the reasoning was correct? Do we have any information about what the model was actually "thinking" beyond the reasoning text it outputs?
CoT started as a prompt engineering technique; reasoning models build on it by using reinforcement learning to train the model to generate its own chain of thought rather than relying on a hand-written prompt. So the chain isn't a literal readout of what the model is "thinking," but all indications are that it does guide the reasoning abilities of LLMs: the generated reasoning tokens become part of the context and shift the output distribution for everything produced after them.
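To make the "output distribution" point concrete, here's a minimal sketch of autoregressive generation with a CoT-style prompt. Everything here is a toy stand-in, not a real API: `ToyModel`, `next_token_logits`, and `sample` are placeholders I'm assuming just to illustrate the conditioning mechanism.

```python
import random

def sample(scores):
    """Stochastic pick over a {token: score} dict (toy stand-in for sampling)."""
    tokens, weights = zip(*scores.items())
    return random.choices(tokens, weights=[max(w, 1e-9) for w in weights])[0]

class ToyModel:
    """Stand-in for an autoregressive LM: scores next tokens given the context."""
    def next_token_logits(self, context):
        # A real model would score its whole vocabulary based on the context.
        return {"step": 1.0, "therefore": 0.6, "408": 0.4, "<eos>": 0.2}

def generate_with_cot(model, question, max_tokens=32):
    # Chain-of-thought prompting: elicit intermediate reasoning before the answer.
    context = f"Q: {question}\nA: Let's think step by step.\n"
    for _ in range(max_tokens):
        token = sample(model.next_token_logits(context))
        if token == "<eos>":
            break
        # The reasoning tokens are ordinary sampled output tokens, but once
        # appended to the context, every later token (including the final
        # answer) is conditioned on them -- that's the sense in which the
        # chain of thought "guides" the output distribution.
        context += token + " "
    return context

print(generate_with_cot(ToyModel(), "What is 17 * 24?"))
```

The point of the sketch is just that the chain of thought isn't a separate internal trace we get to inspect; it's part of the same token stream, and its influence on the answer comes from that conditioning.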