Social media companies keep using this excuse for not trying. We can filter email spam with a simple naive Bayes classifier, so why don't we just do that with comments? It could easily classify comments that are part of a bandwagon and flag them, either hiding them automatically or queuing them for human review.
We are able to moderate email, but the techniques we use to do so are never applied to comments. I don't know why; this seems like a solved problem.
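A minimal sketch of the kind of classifier described above. It assumes a two-label setup ("bad" for pile-on comments, "ok" otherwise) and a tiny, purely hypothetical hand-labelled training set. The class name, the labels and the `hide_at`/`review_at` thresholds are illustrative assumptions, not any platform's real system. It uses multinomial naive Bayes with add-one (Laplace) smoothing, then routes each comment to hide, human review, or allow based on the estimated probability.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFlagger:
    """Hypothetical two-label multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"bad": Counter(), "ok": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def prob_bad(self, text):
        """Estimate P(bad | text) from the trained counts."""
        vocab = set(self.word_counts["bad"]) | set(self.word_counts["ok"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for word in tokenize(text):
                if word not in vocab:
                    continue  # ignore words never seen in training
                # add-one smoothing so a word unseen in one class doesn't zero it out
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # turn the two log-scores into a probability without overflow
        top = max(log_scores.values())
        exp_bad = math.exp(log_scores["bad"] - top)
        exp_ok = math.exp(log_scores["ok"] - top)
        return exp_bad / (exp_bad + exp_ok)


def route(p_bad, hide_at=0.99, review_at=0.7):
    """Hypothetical policy: auto-hide only when very confident, else ask a human."""
    if p_bad >= hide_at:
        return "hide"
    if p_bad >= review_at:
        return "review"
    return "allow"


# Toy, made-up training data for illustration only.
TRAINING = [
    ("everyone report this loser", "bad"),
    ("pile on this loser everyone", "bad"),
    ("this loser deserves it report", "bad"),
    ("great write up thanks", "ok"),
    ("interesting point about smtp", "ok"),
    ("thanks for the link", "ok"),
]

flagger = NaiveBayesFlagger()
for text, label in TRAINING:
    flagger.train(text, label)

if __name__ == "__main__":
    for comment in ["report this loser", "thanks for the link"]:
        p = flagger.prob_bad(comment)
        print(f"{comment!r}: p_bad={p:.3f} -> {route(p)}")
```

With these toy examples, "report this loser" scores about 0.98 and is sent to review rather than hidden outright; only scores above the stricter `hide_at` threshold are removed without a human looking at them.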
On SMTP servers I've managed for clients, we typically block anywhere from 80% to 99.999% of messages (yes, at the high end that's roughly 100,000 blocked for every one delivered). I'd call that MegaModeration if there were such a term.
And if you think email spam is solved, then I don't believe you read HN often, because a common complaint here is "Gmail is blocking anything I send, and I'm a low-volume, non-commercial sender."
In addition, email filtering is extremely slow to react to new spam techniques, generally taking hours to adapt, depending on the reporting system.
Lastly, you haven't thought about the problem much. How are you going to rapidly tell the difference between a fun meme that spreads virally and an attack against an individual? Far more often, you're going to end up blocking something that isn't harmful.
Spam filters are probably the single most consistently unreliable piece of software I have to use, regardless of the email provider or email client.
I have to check my junk folder like it’s my inbox.
On both Apple Mail and Outlook, with two different email accounts, email money transfers (EMTs) get shoved into my junk folder, despite the dozens of times I have marked those emails as not junk.
I’ll get spam emails, but I don’t get mail from newsletters I’ve actually signed up for.
Like… if you’re trying to hold up spam filtering as an example of success, and even as a model we should follow for… anything else, I’m going to laugh you out of the room and tell you to keep me the hell away from whatever tools you want to build with that technology.
Spam filtering software for email is useless at best and, at its worst, mind-numbingly frustrating. It’s a tool I’ll never trust.
I get that no machine learning model is 100% accurate, which is why its output should be used as an indicator rather than the deciding factor.
I have had issues with Gmail blocking emails, but as you point out, it was always because of IP reputation, not an overzealous naive Bayes filter.