1. AlexCo+(OP) 2025-05-23 21:06:51
No, the words are meaningful to the model. It's effectively using the CoT text as a "scratch space" for intermediate steps it can't compute in a single forward pass through the transformer. These papers give examples of how it works:

- https://physics.allen-zhu.com/part-2-grade-school-math/part-...

- https://physics.allen-zhu.com/part-3-knowledge/part-3-3

replies(1): >>modele+t2
2. modele+t2 2025-05-23 21:25:06
>>AlexCo+(OP)
I mean, this theory is directly contradicted by the paper under discussion. If you want to assert it, you need to argue why the paper is wrong.