How do we know whether the reasoning was correct? Do we have any information about what the model was actually "thinking" beyond the reasoning text it outputs?
CoT started as a prompt engineering technique; reasoning models build on it by using reinforcement learning to train the model to generate its own chain of thought rather than relying on a hand-written prompt. So the chain isn't a literal readout of what the model is "thinking," but all indications are that it does guide the reasoning abilities of LLMs: the generated reasoning tokens become part of the context and shift the output distribution for everything produced after them.
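To make the "output distribution" point concrete, here's a minimal sketch of autoregressive generation with a CoT-style prompt. Everything here is a toy stand-in, not a real API: `ToyModel`, `next_token_logits`, and `sample` are placeholders I'm assuming just to illustrate the conditioning mechanism.

```python
import random

def sample(scores):
    """Stochastic pick over a {token: score} dict (toy stand-in for sampling)."""
    tokens, weights = zip(*scores.items())
    return random.choices(tokens, weights=[max(w, 1e-9) for w in weights])[0]

class ToyModel:
    """Stand-in for an autoregressive LM: scores next tokens given the context."""
    def next_token_logits(self, context):
        # A real model would score its whole vocabulary based on the context.
        return {"step": 1.0, "therefore": 0.6, "408": 0.4, "<eos>": 0.2}

def generate_with_cot(model, question, max_tokens=32):
    # Chain-of-thought prompting: elicit intermediate reasoning before the answer.
    context = f"Q: {question}\nA: Let's think step by step.\n"
    for _ in range(max_tokens):
        token = sample(model.next_token_logits(context))
        if token == "<eos>":
            break
        # The reasoning tokens are ordinary sampled output tokens, but once
        # appended to the context, every later token (including the final
        # answer) is conditioned on them -- that's the sense in which the
        # chain of thought "guides" the output distribution.
        context += token + " "
    return context

print(generate_with_cot(ToyModel(), "What is 17 * 24?"))
```

The point of the sketch is just that the chain of thought isn't a separate internal trace we get to inspect; it's part of the same token stream, and its influence on the answer comes from that conditioning.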