zlacker

[return to "A Developer Accidentally Found CSAM in AI Data. Google Banned Him for It"]
1. giantg+19[view] [source] 2025-12-11 16:40:23
>>markat+(OP)
This raises an interesting point. Do you need to train a model on CSAM so that the model can self-enforce restrictions on CSAM? If so, I wonder what moral/ethical questions that brings up.
◧◩
2. boothb+Yl[view] [source] 2025-12-11 17:36:59
>>giantg+19
I know what porn looks like. I know what children look like. I do not need to be shown child porn in order to recognize it if I see it. I don't think there's an ethical dilemma here; there is no need for such training data if LLMs have the capabilities we're told to expect.
◧◩◪
3. jjk166+Co[view] [source] 2025-12-11 17:48:57
>>boothb+Yl
AI doesn't know what either porn or children are. It finds correlations between aspects of its inputs and the labels "porn" and "children". Even if you did build an AI advanced enough to form a good idea of what porn and children are, how would you ever verify that it is actually capable of recognizing child porn without plugging in samples for it to flag?
◧◩◪◨
4. wang_l+Mu[view] [source] 2025-12-11 18:21:58
>>jjk166+Co
So it is able to classify an image as porn and also classify an image as containing children. Seems like it should be able to apply an AND operation to those two results and identify new images that are not part of the training set.
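To make that concrete, here is a minimal sketch of what I mean, assuming two hypothetical classifiers that each return a probability; the names, interface, and thresholds are made up for illustration, not anything a real system uses:

    # Minimal sketch of the "AND the two classifiers" idea, assuming two
    # hypothetical binary classifiers that each map an image to a probability
    # in [0, 1]. Names and thresholds are illustrative only.

    def flag_suspect_image(image, porn_model, child_model,
                           porn_threshold=0.9, child_threshold=0.9):
        """Flag an image only if both classifiers fire above their thresholds."""
        p_porn = porn_model(image)    # estimated P(image is pornographic)
        p_child = child_model(image)  # estimated P(image contains a child)
        return (p_porn >= porn_threshold) and (p_child >= child_threshold)

Whether that works in practice obviously depends on how well each classifier generalizes beyond its training distribution.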
◧◩◪◨⬒
5. jjk166+l11[view] [source] 2025-12-11 20:53:24
>>wang_l+Mu
No, it finds elements in an image that it tends to find in images labelled porn in the training data, and elements that it tends to find in images labelled child in the training data. If the training data is not representative, then the statistical inference is meaningless. Images unlike anything in the training set may not trigger either category if they lack the things the AI expects to find, and those things may be quite irrelevant to what humans care about.