zlacker

[return to "LLMs cannot find reasoning errors, but can correct them"]
1. kromem+UE[view] [source] 2023-11-20 22:29:21
>>koie+(OP)
Stop doing self-correction within the context of the model's own generation.

The previous paper on self-correction told the model "you previously said X - are there errors with this?"

This one has the mistakes statically added to the prompt (a task prompt and a response, with no additional context) immediately before asking if it has any errors.

Think about the training data.

How often does the training data of most of the Internet reflect users identifying issues with their own output?

How often does the training data reflect users identifying issues with someone else's output?

Try doing self-correction by setting up the context of "this was someone else's answer". It is still technically self-correction if a model is reviewing its own output in that context - it just isn't set up as "correct your own answer."
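
A minimal sketch of what I mean, assuming nothing about the API beyond a text-in/text-out call (the llm argument, function names, and exact prompt wording are placeholders, not anything from the paper):

    # Two framings of the same critique request. `llm` is a stand-in for
    # whatever text-completion call you use (text in, text out); the
    # function names and prompt wording are illustrative only.

    def self_framed_critique(llm, task, answer):
        # Framing A: the answer is presented as the model's own.
        prompt = (
            f"Task: {task}\n"
            f"You previously answered:\n{answer}\n\n"
            "Are there any mistakes in your answer? If so, list them."
        )
        return llm(prompt)

    def other_framed_critique(llm, task, answer):
        # Framing B: identical content, attributed to someone else.
        prompt = (
            f"Task: {task}\n"
            f"Someone else answered:\n{answer}\n\n"
            "Are there any mistakes in their answer? If so, list them."
        )
        return llm(prompt)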

This may even be part of why the classifier did a better job at identifying issues - less the fine-tuning and more the context (unfortunately I don't see the training/prompts for the classifier in their GitHub repo).

It really seems like the aversion to anthropomorphizing LLMs is leading people to ignore or overlook relevant patterns in the highly anthropomorphic training data fed into them. We might not want to entertain that an LLM has a concept of self vs. other, or a bias between critiques based on that differentiation, and yet the training data almost certainly reflects such a concept and bias.

I'd strongly encourage future work on self-correction to explicitly define the thing being evaluated as the work of another. (Or ideally even compare self-correction rates between critiques in the context of their own output vs another's output.)
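
Concretely, the comparison could be as simple as the sketch below, reusing the two framing helpers above (flags_error is a placeholder for however you score a critique, e.g. a keyword match or a judge model - not a real API):

    # Sketch of the comparison: same answers with known mistakes, two
    # framings, count how often the mistake gets flagged under each.

    def compare_framings(llm, examples, flags_error):
        hits = {"self": 0, "other": 0}
        for task, answer in examples:  # each `answer` contains a known mistake
            if flags_error(self_framed_critique(llm, task, answer)):
                hits["self"] += 1
            if flags_error(other_framed_critique(llm, task, answer)):
                hits["other"] += 1
        n = len(examples)
        return {framing: count / n for framing, count in hits.items()}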

2. andai+FG[view] [source] 2023-11-20 22:38:52
>>kromem+UE
That's hilarious. Does this imply LLMs inherited the human tendency to get attached to a perspective despite evidence to the contrary? I'll often try to coax the right answer out of GPT-3 when I know it's wrong, and it'll insist it's right several times in a row.
3. jibal+UR1[view] [source] 2023-11-21 07:25:57
>>andai+FG
Everything in the output of LLMs is inherited from human tendencies ... that's the very essence of how they work. But LLMs themselves don't have any of these tendencies ... they are just statistical engines that extract patterns from the training data.
4. jibal+j26[view] [source] 2023-11-22 08:21:01
>>jibal+UR1
P.S. What I said is not "paradoxical". An LLM does not take on the attributes of its training data, any more than a computer screen displaying the pages of books becomes an author. Regardless of what is in the training data, the LLM continues to be the same statistical engine. The notion that an LLM can take on human characteristics is a category mistake, like thinking that there are people inside your TV set. The TV set is not, for instance, a criminal, even if it is tuned to crime shows 24/7. And an LLM does not have a tendency to protect its ego, even if everyone who contributed to the training data does ... the LLM doesn't have an ego. Those are characteristics of its output, not of the LLM itself, and there's a huge difference between the two. Too many people seem to think that if, for instance, they insult the LLM, it feels offended, just because it says it does. But that's entirely an illusion.