zlacker

12 comments
1. jsnell+(OP)[view] [source] 2025-12-11 16:32:00
As a small point of order, they did not get banned for "finding CSAM" like the outrage- and clickbait title claims. They got banned for uploading a data set containing child porn to Google Drive. They did not find it themselves, and them later reporting the data set to an appropriate organization is not why they got banned.
replies(3): >>jeffbe+K >>jfindp+E3 >>markat+Fo
2. jeffbe+K[view] [source] 2025-12-11 16:34:25
>>jsnell+(OP)
Literally every headline that 404 Media has published about subjects I understand first-hand has been false.
replies(1): >>ameliu+9b
3. jfindp+E3[view] [source] 2025-12-11 16:47:51
>>jsnell+(OP)
>They got banned for uploading child porn to Google Drive

They uploaded the full "widely-used" training dataset, which happened to include CSAM (child sexual abuse material).

While the title of the article is not great, your wording here implies that they purposefully uploaded some independent CSAM pictures, which is not accurate.

replies(1): >>AdamJa+l8
◧◩
4. AdamJa+l8[view] [source] [discussion] 2025-12-11 17:07:55
>>jfindp+E3
No, but "They got banned for uploading child porn to Google Drive" is a correct framing and "Google banned a developer for finding child porn" is incorrect.

There is important additional context around it, of course, which mitigates (should remove) any criminal legal implications, and should also result in Google unsuspending his account in a reasonable timeframe, but what happened is also reasonable. Google does automated scans of all data uploaded to Drive, caught CP images being uploaded (presumably via hashes from something like NCMEC?), and banned the user. Totally reasonable thing. Google should have an appeal process where a reasonable human can look at it and say "oh shit the guy just uploaded 100m AI training images and 7 of them were CP, he's not a pedo, unban him, ask him not to do it again and report this to someone."
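
To make the "presumably via hashes" guess concrete, here is a minimal sketch of blocklist matching. Everything in it is an assumption for illustration: the function name `scan_uploads` is hypothetical, and SHA-256 is only a stand-in (production scanners are generally described as using perceptual hashes that survive resizing and re-encoding, which an exact cryptographic hash does not). It says nothing about how Google's scanner actually works.

```python
import hashlib
from typing import Iterable


def scan_uploads(files: Iterable[bytes], known_hashes: set[str]) -> tuple[int, int]:
    """Return (flagged, total) for a batch of uploaded file contents.

    `known_hashes` is a hypothetical blocklist of hex SHA-256 digests.
    Exact hashing is a simplification chosen for this sketch.
    """
    flagged = 0
    total = 0
    for data in files:
        total += 1
        # Exact-match lookup: any change to the bytes changes the digest.
        if hashlib.sha256(data).hexdigest() in known_hashes:
            flagged += 1
    return flagged, total
```

Returning both counts matters for the appeal scenario: a reviewer who sees 7 flagged out of 100m gets a very different picture from one who only sees "flagged".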

The headline frames it like the story was "A developer found CP in AI training data from Google, and Google banned him in retaliation for reporting it." Totally disingenuous framing of the situation.

replies(1): >>jfindp+r9
◧◩◪
5. jfindp+r9[view] [source] [discussion] 2025-12-11 17:13:21
>>AdamJa+l8
"There is important additional context around it, of course,"

Indeed, which is why a comment that has infinitely more room to expand on the context should include that context when it criticizes the title for being misleading.

Both the title and the comment I replied to are misleading. One because of the framing, the other because of the deliberate exclusion of extremely important context.

Imagine if someone accused you of "Uploading CSAM to Google Drive" without any other context. It's one of the most serious accusations possible! Adding like five extra words of context to make it clear that you are not a pedophile trafficking CSAM is not that much of an ask.

replies(1): >>jsnell+wj
◧◩
6. ameliu+9b[view] [source] [discussion] 2025-12-11 17:20:21
>>jeffbe+K
Can we use AI to fix this?

Make an LLM read the articles behind the links, and then rewrite the headlines (in a browser plugin for instance).
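
A rough sketch of that idea, with the model call left abstract. The names `build_prompt` and `rewrite_headline` are hypothetical, and `complete` stands for whatever text-completion function the plugin would wrap; no specific LLM API is assumed. A real browser plugin would be written in JavaScript, so treat this Python version as an outline of the logic only.

```python
from typing import Callable


def build_prompt(article_text: str, original_headline: str, max_chars: int = 4000) -> str:
    """Assemble the instruction sent to the model (wording is an assumption)."""
    excerpt = article_text[:max_chars]  # keep the prompt bounded
    return (
        "Rewrite the headline so it accurately and neutrally summarizes the article.\n"
        f"Original headline: {original_headline}\n"
        f"Article:\n{excerpt}\n"
        "Rewritten headline:"
    )


def rewrite_headline(
    article_text: str,
    original_headline: str,
    complete: Callable[[str], str],
) -> str:
    """Return the model's rewritten headline, or the original if the model returns nothing."""
    candidate = complete(build_prompt(article_text, original_headline)).strip()
    # Keep only the first line in case the model adds commentary.
    first_line = candidate.splitlines()[0].strip() if candidate else ""
    return first_line or original_headline
```

Falling back to the original headline keeps the page usable when the model returns an empty response.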

replies(1): >>add-su+oe
◧◩◪
7. add-su+oe[view] [source] [discussion] 2025-12-11 17:34:14
>>ameliu+9b
HN already needlessly rewrites headlines with automation and it's more annoying to see automation go stupidly wrong than letting the original imperfect situation stand. Having outrage about headlines is a choice.
replies(1): >>ameliu+pl
◧◩◪◨
8. jsnell+wj[view] [source] [discussion] 2025-12-11 17:59:01
>>jfindp+r9
Fair enough. I'd already included the fact about it being a data set in the post once, which seemed clear enough, especially when my actual point was that the author did not "find" the CSAM, and by implication was not aware of it. But I have edited the message and added a repetition of it.

I bet the journalists and editors working for 404 will not correct their intentionally misleading headline. Why hold a random forum post buried in the middle of a large thread to a higher standard than the professionals writing headlines shown in 30-point font on the frontpage of their publication?

replies(1): >>jfindp+0k
◧◩◪◨⬒
9. jfindp+0k[view] [source] [discussion] 2025-12-11 18:01:00
>>jsnell+wj
>Why hold a random forum post buried in the middle of a large thread to a higher standard than the professionals writing headlines shown in 30-point font on the frontpage of their publication?

How many times do I need to repeat that I agree the headline is misleading? Yes, the article here has a shit title. You already made that point, I have already agreed to that point.

If I had an easy and direct line to the editor who came up with the title, I would point that out to them. Unfortunately, as far as I'm aware, they aren't on HN, or I could write them a comment similar to yours.

◧◩◪◨
10. ameliu+pl[view] [source] [discussion] 2025-12-11 18:08:27
>>add-su+oe
I don't think HN's rewrite algorithm uses modern LLM techniques.

Also, it could be optional. It probably should be, in fact.

replies(2): >>jeffbe+ms >>shakna+jL
11. markat+Fo[view] [source] 2025-12-11 18:26:38
>>jsnell+(OP)
I’m the person who got banned. And just to be clear: the only reason I have my account back is because 404 Media covered it. Nobody else would touch the story because it happened to a nobody. There are probably a lot of “nobodies” in this thread who might someday need a reporter like Emanuel Maiberg to actually step in. I’m grateful he did.

The dataset had been online for six years. In my appeal I told Google exactly where the data came from — they ignored it. I was the one who reported it to C3P, and that’s why it finally came down. Even after Google flagged my Drive, the dataset stayed up for another two months.

So this idea that Google “did a good thing” and 404 somehow did something wrong is just absurd.

Google is abusing its monopoly in all kinds of ways, including quietly wiping out independent developers: https://medium.com/@russoatlarge_93541/déjà-vu-googles-using...

◧◩◪◨⬒
12. jeffbe+ms[view] [source] [discussion] 2025-12-11 18:43:03
>>ameliu+pl
My browser integrates an LLM, so I asked it to restate the headline of this one, and it came up with "Developer Suspended by Google After Uploading AI Dataset Containing CSAM" which seems pretty even-handed. Of course, I would want to dial the snark to 11. Many hacker stories can be headlined "Developer discovers that C still sucks" etc.
◧◩◪◨⬒
13. shakna+jL[view] [source] [discussion] 2025-12-11 20:06:22
>>ameliu+pl
If you edit a title after posting, it will not be rewritten again until a human at Y Combinator comes across it.