zlacker

[return to "A Developer Accidentally Found CSAM in AI Data. Google Banned Him for It"]
1. giantg+19[view] [source] 2025-12-11 16:40:23
>>markat+(OP)
This raises an interesting point. Do you need to train a model on CSAM so that it can enforce restrictions on CSAM? If so, I wonder what moral/ethical questions that brings up.
2. boothb+Yl[view] [source] 2025-12-11 17:36:59
>>giantg+19
I know what porn looks like. I know what children look like. I do not need to be shown child porn in order to recognize it if I were to see it. I don't think there's an ethical dilemma here; there's no need for that training data if LLMs have the capabilities we're told to expect.
3. jjk166+Co[view] [source] 2025-12-11 17:48:57
>>boothb+Yl
AI doesn't know what either porn or children are. It finds correlations between aspects of its inputs and the labels "porn" and "children." Even if you did build an AI advanced enough to form a good idea of what porn and children are, how would you ever verify that it is actually capable of recognizing child porn without feeding it samples to flag?
4. boothb+3x[view] [source] 2025-12-11 18:33:01
>>jjk166+Co
LLMs don't "know" anything. But as you say, they can identify correlations between content "porn" and a target image; between content labeled "children" and a target image. If a target image scores high in both, then it can flag child porn, all without being trained on CSAM.
5. jjk166+jZ[view] [source] 2025-12-11 20:43:11
>>boothb+3x
But things correlated with porn != porn, and things correlated with children != children. For example, if no porn in the training set contains children, the model may learn that the presence of a child means an image isn't porn. Likewise, if every image of a child in the set is clothed, it may learn that nudity means the subject isn't a child. You know those inferences are ridiculous because you know things; the AI does not.

Never mind the importance of context, such as distinguishing a partially clothed child playing on a beach from a partially clothed child in a sexual situation.
