
Beyond Semantics: Unreasonable Effectiveness of Reasonless Intermediate Tokens
1. modele+T5 2025-05-23 16:56:23
>>nyrikk+(OP)
> we then train models on noisy, corrupted traces which have no relation to the specific problem each is paired with, and find that not only does performance remain largely consistent with models trained on correct data, but in some cases can improve upon it

This is the interesting part. We've probably all had the experience where the model goes off the rails during its thinking process but somehow spits out the right answer at the end. Apparently the reasoning doesn't even need to be correct during training?

I guess it suggests to me that the reason CoT helps is that the model gets more compute to think internally, not that the words it produces are meaningful. I'm surprised nobody has come up with a good scheme for adaptive compute per token yet (rough sketch of what I mean below). Maybe we could skip CoT entirely.
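
To be concrete about "adaptive compute per token": I'm picturing something like an ACT-style halting head (Graves 2016). This is just my own toy sketch, not anything from the paper; the layer names, the 0.99 halting budget, and the 8-step cap are all placeholders:

    import torch
    import torch.nn as nn

    class PonderBlock(nn.Module):
        # One "layer" that lets each token spend a variable number of
        # refinement steps before its state is passed on.
        def __init__(self, d_model, max_steps=8):
            super().__init__()
            self.step = nn.Linear(d_model, d_model)  # one unit of compute
            self.halt = nn.Linear(d_model, 1)        # per-token halting head
            self.max_steps = max_steps

        def forward(self, h):
            # h: (batch, seq, d_model)
            budget = torch.zeros(h.shape[:-1], device=h.device)  # halting mass per token
            out = torch.zeros_like(h)
            for _ in range(self.max_steps):
                p = torch.sigmoid(self.halt(h)).squeeze(-1)  # halt prob, (batch, seq)
                weight = p * (budget < 0.99).float()         # only tokens still running
                out = out + weight.unsqueeze(-1) * h         # mix states by halt weight
                budget = budget + weight
                h = torch.tanh(self.step(h))                 # spend one more step
            return out

The point of the halting head is that easy tokens stop after one step while hard ones keep iterating internally, which is the compute knob CoT currently exposes only by emitting more text.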

2. AlexCo+gE 2025-05-23 21:06:51
>>modele+T5
No, the words are meaningful to it. It's effectively using the CoT text as a "scratch space" for intermediate steps it can't compute in a single pass through the transformer (toy illustration after the links). These papers give examples of how it works:

- https://physics.allen-zhu.com/part-2-grade-school-math/part-...

- https://physics.allen-zhu.com/part-3-knowledge/part-3-3
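
Here's a toy of my own to make the scratch-space point concrete (it's not from either paper, and a Collatz step just stands in for "one bounded unit of serial compute"):

    def one_pass(x):
        # Stands in for a single forward pass: a bounded amount of
        # serial computation (here, one Collatz step).
        return x // 2 if x % 2 == 0 else 3 * x + 1

    def answer_without_cot(x):
        # No intermediate tokens: the model gets exactly one pass,
        # so it can't reach answers that need many sequential steps.
        return one_pass(x)

    def answer_with_cot(x):
        # Each emitted token goes back into the context and is re-read
        # on the next pass, so serial depth grows with trace length.
        trace = [x]
        while trace[-1] != 1:
            trace.append(one_pass(trace[-1]))
        return trace

    print(answer_without_cot(7))  # 22, stuck after one step
    print(answer_with_cot(7))     # [7, 22, 11, 34, ..., 4, 2, 1]

Same fixed-depth function either way; the only thing the trace buys is more serial steps, which is the sense in which the text is a scratch pad.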

3. modele+JG 2025-05-23 21:25:06
>>AlexCo+gE
I mean, this theory is directly contradicted by the paper under discussion. If you want to assert it, you need to argue why the paper is wrong.