They uploaded the full "widely-used" training dataset, which happened to include CSAM (child sexual abuse material).
While the title of the article is not great, your wording here implies that they purposefully uploaded some independent CSAM pictures, which is not accurate.
There is important additional context around it, of course, which mitigates (and should remove) any criminal legal implications, and which should also result in Google unsuspending his account in a reasonable timeframe. But what happened is also reasonable. Google runs automated scans of all data uploaded to Drive, caught CP images being uploaded (presumably via hashes from something like NCMEC?), and banned the user. Totally reasonable thing. Google should have an appeal process where a reasonable human can look at it and say "oh shit the guy just uploaded 100m AI training images and 7 of them were CP, he's not a pedo, unban him, ask him not to do it again and report this to someone."
The headline frames it like the story was "A developer found CP in AI training data from Google, and Google banned him in retaliation for reporting it." Totally disingenuous framing of the situation.
Indeed, which is why a comment that has infinitely more room to expand on the context should include that context when it criticizes the title for being misleading.
Both the title and the comment I replied to are misleading. One because of the framing, the other because of the deliberate exclusion of extremely important context.
Imagine if someone accused you of "Uploading CSAM to Google Drive" without any other context. It's one of the most serious accusations possible! Adding like five extra words of context to make it clear that you are not a pedophile trafficking CSAM is not that much of an ask.
Make an LLM read the articles behind the links, and then rewrite the headlines (in a browser plugin for instance).
I bet the journalists and editors working for 404 will not correct their intentionally misleading headline. Why hold a random forum post buried in the middle of a large thread to a higher standard than the professionals writing headlines shown in 30-point font on the front page of their publication?
How many times do I need to repeat that I agree the headline is misleading? Yes, the article here has a shit title. You already made that point, I have already agreed to that point.
If I had an easy and direct line to the editor who came up with the title, I would point that out to them. Unfortunately they aren't on HN, as far as I'm aware, or I could write a comment to them similar to yours.
Also, it could be optional. It probably should be, in fact.
The dataset had been online for six years. In my appeal I told Google exactly where the data came from — they ignored it. I was the one who reported it to C3P, and that’s why it finally came down. Even after Google flagged my Drive, the dataset stayed up for another two months.
So this idea that Google “did a good thing” and 404 somehow did something wrong is just absurd.
Google is abusing its monopoly in all kinds of ways, including quietly wiping out independent developers: https://medium.com/@russoatlarge_93541/déjà-vu-googles-using...