Meanwhile, things can happen in the latent representation that aren't reflected in the intermediate outputs. You could, instead of using CoT, say "Write a recipe for a vegetarian chili, along with a lengthy biographical story relating to the recipe. Afterwards, I will ask you again about my original question." And the latents produced while generating that unrelated text can still carry computation on the primary problem, yielding a better answer than you would have gotten with the short input alone.
Along these lines, I believe there are chain-of-thought studies which find that the content of the intermediate outputs doesn't actually matter all that much...
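To make the setup concrete, here is a rough sketch of how one might run that comparison — a distractor condition against a direct-answer baseline. It assumes the OpenAI Python client purely for illustration; the model name, the test question, and the distractor prompt are all placeholders, and any chat LLM would do:

```python
# Sketch of the distractor experiment described above. All prompts and the
# model name are illustrative, not from any particular study.
from openai import OpenAI

client = OpenAI()

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

question = "A farmer has 17 sheep; all but 9 run away. How many are left?"

# Condition A: ask directly, no intermediate generation.
direct = ask([{"role": "user", "content": question}])

# Condition B: same question, but the model first generates unrelated
# filler (the chili recipe and story), then is re-asked. The filler tokens
# sit in context for the second call, so their latent states can carry
# computation on the original problem even though their surface content
# is irrelevant to it.
history = [{"role": "user", "content": question + (
    " Before answering, write a recipe for a vegetarian chili, along with"
    " a lengthy biographical story relating to the recipe. Afterwards,"
    " I will ask you again about my original question."
)}]
filler = ask(history)
history.append({"role": "assistant", "content": filler})
history.append({"role": "user", "content": "Now answer my original question."})
distracted = ask(history)

print("Direct answer:    ", direct)
print("Distracted answer:", distracted)
```

If the distracted condition reliably beats the direct one across many questions, the benefit plausibly comes from the extra serial computation the intermediate tokens buy rather than from their content — which is the same conclusion those content-ablation CoT studies point toward.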