zlacker

[parent] [thread] 10 comments
1. modele+(OP)[view] [source] 2025-05-23 16:56:23
> we then train models on noisy, corrupted traces which have no relation to the specific problem each is paired with, and find that not only does performance remain largely consistent with models trained on correct data, but in some cases can improve upon it

This is the interesting part. We've probably all had the experience where the model goes off the rails during the thinking process but somehow spits out the right answer at the end. Apparently the reasoning doesn't even need to be correct during training?

I guess it suggests to me that the reason CoT helps is that the model gets more compute to think internally, not that the words it produces are meaningful. I'm surprised nobody has come up with a good scheme for adaptive compute per token yet. Maybe we can skip CoT entirely.
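
If I'm reading the setup right, constructing that kind of corrupted training data is almost trivial. A minimal sketch of what it might look like (the field names and the <think> formatting are my guesses, not necessarily what the paper uses):

    import random

    def build_mismatched_sft_data(examples, seed=0):
        """Pair each problem with a reasoning trace from a *different* problem,
        while keeping the problem's own final answer. Illustrative only; a real
        construction would also avoid accidental self-pairings."""
        rng = random.Random(seed)
        traces = [ex["trace"] for ex in examples]
        shuffled = traces[:]
        rng.shuffle(shuffled)
        out = []
        for ex, wrong_trace in zip(examples, shuffled):
            out.append({
                "prompt": ex["problem"],
                # the trace has nothing to do with this problem, but the answer is correct
                "completion": f"<think>{wrong_trace}</think>\n{ex['answer']}",
            })
        return out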

replies(6): >>trehal+w2 >>kelsey+H2 >>istjoh+jc >>AlexCo+ny >>thomas+HH >>x_flyn+b81
2. trehal+w2[view] [source] 2025-05-23 17:17:05
>>modele+(OP)
> We've probably all had the experience where the model is going off the rails during the thinking process but somehow spits out the right answer at the end. Apparently the reasoning doesn't even need to be correct during training?

How do we know if the reasoning was correct or not? Do we have more information about what the model was thinking besides just what it says it was thinking?

replies(2): >>rickyh+Ve >>modele+NQ1
3. kelsey+H2[view] [source] 2025-05-23 17:18:05
>>modele+(OP)
> I'm surprised nobody has come up with a good scheme for adaptive compute per token yet.

I have one, I just don't have the time or money to research it :(

replies(1): >>golol+69
4. golol+69[view] [source] [discussion] 2025-05-23 18:05:40
>>kelsey+H2
Post it let's go.
5. istjoh+jc[view] [source] 2025-05-23 18:30:45
>>modele+(OP)
Uh... hmmm... uhhh... ummm...
6. rickyh+Ve[view] [source] [discussion] 2025-05-23 18:48:24
>>trehal+w2
It's definitely not explicitly writing out everything it's "thinking": if you consider all the dimensions of the latent space that are connected, that can't really be expressed in a sentence.

CoT builds on existing prompt-engineering techniques by essentially folding them into reinforcement learning, forcing the models to build their own CoT prompt. So it's not what the model is actually thinking, but all indications are that it does guide the reasoning abilities of LLMs through the output distribution.
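
To make that concrete, the reward side of such an RL setup can be as blunt as checking only the final answer, so the trace in between is shaped only indirectly. A minimal sketch, assuming a "Final answer:" convention that I'm inventing for the example:

    import re

    def extract_final_answer(completion):
        """Pull out whatever follows 'Final answer:' (a made-up convention for this sketch)."""
        m = re.search(r"Final answer:\s*(.+)", completion)
        return m.group(1).strip() if m else None

    def reward(completion, gold):
        """Reward depends only on the final answer; the chain of thought itself is
        never inspected, so the model is free to learn whatever trace helps it."""
        return 1.0 if extract_final_answer(completion) == gold else 0.0

    # In a REINFORCE/GRPO-style loop (sketched in comments, not runnable as-is):
    #   completions = policy.sample(prompt, n=8)
    #   advantages  = normalize([reward(c, gold) for c in completions])
    #   loss        = -(advantages * logprobs(completions)).mean()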

7. AlexCo+ny[view] [source] 2025-05-23 21:06:51
>>modele+(OP)
No, the words are meaningful to it. It's effectively using the CoT text as a "scratch space" for intermediate steps it can't calculate on one iteration through the transformer. These papers give examples of how it works:

- https://physics.allen-zhu.com/part-2-grade-school-math/part-...

- https://physics.allen-zhu.com/part-3-knowledge/part-3-3
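
One way to see the "scratch space" point, as a toy illustration of my own rather than anything from those papers: if a single forward pass can only do a bounded amount of computation, writing intermediate results into the visible context turns one deep problem into a chain of shallow ones.

    def one_pass(step):
        """Stand-in for a single forward pass: it can apply exactly one primitive op."""
        op, a, b = step
        return a + b if op == "+" else a * b

    # 23 * 47 + 11, decomposed so each intermediate result lands in the "scratchpad"
    scratchpad = []
    scratchpad.append(one_pass(("*", 23, 47)))              # writes 1081 into the context
    scratchpad.append(one_pass(("+", scratchpad[-1], 11)))  # reads it back: 1092
    print(scratchpad)  # [1081, 1092]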

replies(1): >>modele+QA
8. modele+QA[view] [source] [discussion] 2025-05-23 21:25:06
>>AlexCo+ny
I mean, this theory is directly contradicted by the paper under discussion. If you want to assert this, then you need to argue why the paper is wrong.
9. thomas+HH[view] [source] 2025-05-23 22:30:25
>>modele+(OP)
That sounds to me more like evidence that an LLM is never reasoning at all, even when it looks like it is.

The mock conversation that is written between think tags is not a conversation. It's the collection of tokens that are most likely to be written after a prompt to a model that was trained on example conversations.

Why is that different? In a real conversation, participants use logic to choose what is worth saying next. The next statement is already determined in the speaker's mind to be logically sound. In a mock conversation (the LLM's CoT), there is no logic. The next statement is only determined to be statistically familiar, then written immediately.

The end result of a desirable CoT interaction is text that would have been written by a thoughtful/logical conversationalist. Whether or not the mock conversation itself is logically consistent with the mock conclusion is irrelevant, because the LLM is only concerned with how familiar that mock conclusion is to the prompt, its mock conversation, and its training.

The overall vibe of how something is written behaves as a replacement for actual logic. Logical deduction is replaced with measures of confidence, conversation turns, etc. in writing style. It all works out in the end because we are so consistent in the style in which we write real logical deductions that we have ended up providing an invisible semantics for the LLM to follow.

There is something meaningful that we are entirely blind to. Unfortunately, it doesn't follow rules the way logic does, so it's not a trustworthy replacement. Fortunately, it's useful for more general exploration.
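
The mechanical part of that can be stated very literally: each token is drawn from a distribution over what tends to come next given everything written so far, and committed immediately, with no check that the trace as a whole hangs together. A toy sketch (the stand-in "model" and its probabilities are invented):

    import random

    rng = random.Random(0)

    def next_token_distribution(context):
        """Stand-in for the model: returns made-up (token, probability) pairs."""
        return [("therefore", 0.5), ("however", 0.3), ("thus", 0.2)]

    context = ["The", "answer", "is", "probably", "42", "."]
    for _ in range(3):
        tokens, probs = zip(*next_token_distribution(context))
        tok = rng.choices(tokens, weights=probs, k=1)[0]
        context.append(tok)  # committed immediately; no backtracking, no logic check
    print(" ".join(context))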

10. x_flyn+b81[view] [source] 2025-05-24 04:41:20
>>modele+(OP)
I like to think of the intermediate tokens as low-dimensional hidden states. Also see the Coconut paper.
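
For anyone who hasn't read it, the Coconut idea is roughly: skip decoding the intermediate thought into a token and instead feed the last hidden state straight back in as the next input embedding. A heavily simplified sketch, where a single linear layer stands in for the whole transformer:

    import torch
    import torch.nn as nn

    vocab, d = 100, 32
    embed = nn.Embedding(vocab, d)
    block = nn.Linear(d, d)          # stand-in for the transformer stack
    unembed = nn.Linear(d, vocab)

    x = embed(torch.tensor([7]))     # some input token

    # Standard CoT step: decode a token, then re-embed it (a discrete bottleneck)
    h = block(x)
    tok = unembed(h).argmax(dim=-1)
    x_next_discrete = embed(tok)

    # Coconut-style step: skip decoding and feed the hidden state back directly
    x_next_continuous = h
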
11. modele+NQ1[view] [source] [discussion] 2025-05-24 15:05:01
>>trehal+w2
During fine-tuning the model does not produce reasoning traces; it consumes them. And the researchers presented it with traces deliberately constructed to be wrong except for the answer at the end. That's easy enough to do.