sdento (OP) | 2025-07-07 14:41:36
CoT gives the model more time to think and to process the inputs it has. To take an extreme example, suppose you are using next-token prediction to answer 'Is P == NP?' A transformer spends a roughly fixed amount of compute per token processed, so the tiny number of input tokens means there is only a tiny compute budget available for producing an answer. A scratchpad lets us break free of the short-inputs problem.
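
Here's a rough back-of-envelope version of that compute argument. It assumes the common ~2N-FLOPs-per-generated-token rule of thumb for an N-parameter decoder-only model; the model size and token counts below are made up purely for illustration:

    # Rough sketch: compute spent on an answer grows with tokens generated.
    # Assumes the common ~2N FLOPs-per-token estimate for an N-parameter
    # decoder-only transformer; all numbers here are illustrative.

    N_PARAMS = 7e9  # hypothetical 7B-parameter model

    def answer_flops(output_tokens: int, n_params: float = N_PARAMS) -> float:
        """Approximate FLOPs spent generating `output_tokens` tokens."""
        return 2 * n_params * output_tokens

    direct = answer_flops(output_tokens=1)        # bare "yes"/"no" answer
    scratchpad = answer_flops(output_tokens=500)  # 500 reasoning tokens first

    print(f"direct:     {direct:.1e} FLOPs")
    print(f"scratchpad: {scratchpad:.1e} FLOPs ({scratchpad / direct:.0f}x more)")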

Meanwhile, things can happen in the latent representations that aren't reflected in the intermediate outputs. Instead of using CoT, you could say: "Write a recipe for a vegetarian chili, along with a lengthy biographical story relating to the recipe. Afterwards, I will ask you again about my original question." The latents can still help model the primary problem, yielding a better answer than you would have gotten with the short input alone.
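
Here's a minimal sketch of that distractor trick. `generate` is a hypothetical stand-in for whatever completion API you use (not a real library call), and the two-step prompt just mirrors the recipe example above:

    # Sketch of the distractor-task prompt described above. The filler
    # tokens buy the model extra forward passes whose latent states can
    # still carry computation about the real question, even though the
    # visible output is about chili.

    def generate(prompt: str) -> str:
        """Placeholder: call your model of choice, return its completion."""
        raise NotImplementedError("plug in a real model / API client here")

    def answer_with_distractor(question: str) -> str:
        distractor_prompt = (
            f"{question}\n\n"
            "Write a recipe for a vegetarian chili, along with a lengthy "
            "biographical story relating to the recipe. Afterwards, I will "
            "ask you again about my original question."
        )
        filler = generate(distractor_prompt)  # visible output: recipe + story
        # Re-ask with the filler in context; the extra tokens bought compute
        # even though their content is unrelated to the question.
        return generate(f"{distractor_prompt}\n\n{filler}\n\n"
                        f"Now, back to the original question: {question}")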

Along these lines, I believe there are chain-of-thought studies which find that the content of the intermediate outputs doesn't actually matter all that much...
