zlacker

[parent] [thread] 7 comments
1. andai+(OP)[view] [source] 2023-11-20 22:38:52
That's hilarious. Does this imply LLMs inherited the human tendency to get attached to a perspective despite evidence to the contrary? I'll often try to coax the right answer out of GPT-3 when I know it's wrong, and it'll insist that it's right several times in a row.
replies(3): >>OmarSh+x >>mkl+0a1 >>jibal+fb1
2. OmarSh+x[view] [source] 2023-11-20 22:42:05
>>andai+(OP)
I think it does indeed suggest this, but it may be good news.

Part of what makes humans able to make progress in difficult, vague, and uncertain fields is a willingness to hold onto a point of view in the face of criticism and try to fix it. This is, as a matter of fact, how science progresses, depending on whether you ask scientists or historians of science. See Thomas Kuhn's The Structure of Scientific Revolutions for more on this.

replies(1): >>jibal+Eb1
3. mkl+0a1[view] [source] 2023-11-21 07:11:36
>>andai+(OP)
Getting attached to a perspective despite evidence to the contrary would require having a perspective and being able to distinguish fact from fiction, but just copying humans protesting that they're right (regardless of context) seems plausible, as there's a lot of that to learn from.
replies(1): >>kromem+7U4
4. jibal+fb1[view] [source] 2023-11-21 07:25:57
>>andai+(OP)
Everything in the output of LLMs is inherited from human tendencies ... that's the very essence of how they work. But LLMs themselves don't have any of these tendencies ... they are just statistical engines that extract patterns from the training data.
replies(2): >>kromem+VT4 >>jibal+El5
5. jibal+Eb1[view] [source] [discussion] 2023-11-21 07:30:42
>>OmarSh+x
But LLMs don't do these things ... they just produce text that statistically matches patterns in the training data. Since the humans who authored the training data have personality patterns, the outputs of LLMs show these personality patterns. But LLMs do not internalize such patterns--they have no cognitive functions of their own.
6. kromem+VT4[view] [source] [discussion] 2023-11-22 04:37:54
>>jibal+fb1
What you just said is paradoxical.

If there is a pattern in the training data of people resisting information contrary to their earlier stated position, and an LLM extracts and extends patterns from the training data, then an LLM absolutely should have a tendency to resist information contrary to an earlier stated position.

The difference, and what I think you may have meant to indicate, is that the contributing processes behind that tendency in humans are not necessarily occurring in parallel in the LLM, even if both fall into the tendency in their output.

So the tendencies represented in the data are mirrored, such as "when people are mourning their grandmother dying, I should be extra helpful", even if the underlying processes - such as mirror neurons firing to resonate with grief, or drawing on one's own lived experience of loss to empathize - are not occurring in the LLM.
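
To make that concrete, here's a toy sketch (entirely made up, not drawn from any real model): even a tiny bigram model trained on text where pushback is followed by doubling down will reproduce that "tendency" in its output, with no ego or belief anywhere in the system. The corpus and sampling here are illustrative only.

```python
# Toy illustration: a bigram model "insists it's right" purely because that
# pattern is in its training text. Everything here is made up for the example.
import random
from collections import defaultdict

corpus = (
    "you are wrong . no , i am right . "
    "that is incorrect . no , i am right . "
    "i disagree . no , i am right . "
).split()

# Count bigram transitions word -> next word.
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def sample_next(word):
    nxt = counts[word]
    words, weights = zip(*nxt.items())
    return random.choices(words, weights=weights)[0]

# Prompt with pushback; the model statistically doubles down.
word, out = "wrong", ["wrong"]
for _ in range(6):
    word = sample_next(word)
    out.append(word)
print(" ".join(out))   # e.g. "wrong . no , i am right"
```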

7. kromem+7U4[view] [source] [discussion] 2023-11-22 04:39:30
>>mkl+0a1
> distinguishing fact from fiction

Actually, recent research suggests this part is encoded in LLMs at an abstract level, in a linear representation...

https://arxiv.org/abs/2310.06824
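
For the curious, a minimal sketch of the probing idea from that paper: if truth is encoded as a linear direction in the hidden states, a simple linear classifier over those states can separate true from false statements. The activations below are synthetic stand-ins; in practice you'd extract them from an LLM for a labeled set of statements, and the dimensions and names here are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 512, 2000          # hidden-state width and number of statements (assumed)

# Plant a "truth" direction, as if the model linearly encoded true vs. false.
truth_dir = rng.normal(size=d)
truth_dir /= np.linalg.norm(truth_dir)

labels = rng.integers(0, 2, size=n)                 # 1 = true, 0 = false
acts = rng.normal(size=(n, d))                      # unrelated variation
acts += np.outer(2 * labels - 1, truth_dir) * 2.0   # shift along truth_dir

# A linear probe recovers the direction and classifies held-out statements.
probe = LogisticRegression(max_iter=1000).fit(acts[:1500], labels[:1500])
print("probe accuracy:", probe.score(acts[1500:], labels[1500:]))
```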

8. jibal+El5[view] [source] [discussion] 2023-11-22 08:21:01
>>jibal+fb1
P.S. What I said is not "paradoxical". An LLM does not take on the attributes of its training data, any more than a computer screen displaying the pages of books becomes an author. Regardless of what is in the training data, the LLM continues to be the same statistical engine. The notion that an LLM can take on human characteristics is a category mistake, like thinking that there are people inside your TV set. The TV set is not, for instance, a criminal, even if it is tuned to crime shows 24/7.

And an LLM does not have a tendency to protect its ego, even if everyone who contributed to the training data does ... the LLM doesn't have an ego. Those are characteristics of its output, not of the LLM itself, and there's a huge difference between the two. Too many people seem to think that if, for instance, they insult the LLM, it feels offended, just because it says it does. But that's entirely an illusion.