zlacker

[parent] [thread] 3 comments
1. tornat+(OP)[view] [source] 2023-05-31 23:41:47
AI content moderation became a lot more feasible this year. It won't be perfect, of course, but human mods are probably worse.

and yes, I've used content moderation AIs in the past (like Google's Perspective API) and they're not really usable. OpenAI's moderation endpoint, embeddings classification, or even just gpt-3.5-turbo would work marvelously.
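As a rough illustration of the moderation-endpoint approach, here's a sketch of the decision logic one might wrap around it. The category names, scores, and threshold below are made up for illustration; the actual API call is shown only as a comment since the endpoint's exact response shape and the `openai` client should be checked against current docs.

```python
# Hypothetical sketch: flag content based on per-category moderation scores.
# flag_from_scores() is pure decision logic; categories and the 0.5
# threshold are illustrative assumptions, not OpenAI's actual defaults.

def flag_from_scores(category_scores, threshold=0.5):
    """Return the categories whose score meets or exceeds the threshold."""
    return sorted(cat for cat, score in category_scores.items()
                  if score >= threshold)

# In practice the scores would come from the moderation endpoint, e.g.
# (untested, API shape may differ by client version):
#   resp = openai.Moderation.create(input=post_text)
#   scores = resp["results"][0]["category_scores"]

scores = {"harassment": 0.91, "spam": 0.12, "self-harm": 0.01}
print(flag_from_scores(scores))  # ['harassment']
```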

replies(2): >>Michae+Ia >>joseph+bi
2. Michae+Ia[view] [source] 2023-06-01 01:26:51
>>tornat+(OP)
Have you tested this on a small scale?
replies(1): >>Implic+4s
3. joseph+bi[view] [source] 2023-06-01 02:54:52
>>tornat+(OP)
This is a great idea. The first version of this could be a human-assisted AI, where an AI makes moderation choices (with a confidence score) and the choices it makes can be supervised and overridden by human moderators when it gets things wrong. Over time the AI can be retrained to make better choices. Kind of like a spam filter with more knobs.

The hard thing early on might be getting started with good training data. But ChatGPT might already be good enough to make reasonable choices today with a good system prompt.
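The human-assisted loop described above could be sketched as a simple confidence-based router: high-confidence verdicts act automatically, middling ones go to a human queue, and overrides become future training data. The thresholds and labels here are illustrative assumptions.

```python
# Minimal sketch of human-in-the-loop moderation routing.
# Thresholds are made-up values a real system would tune.

AUTO_THRESHOLD = 0.95    # act without review above this confidence
REVIEW_THRESHOLD = 0.60  # queue for a human between the two thresholds

def route(label, confidence):
    """Decide what to do with a model's (label, confidence) verdict."""
    if label != "spam":
        return "allow"
    if confidence >= AUTO_THRESHOLD:
        return "auto_remove"
    if confidence >= REVIEW_THRESHOLD:
        return "human_review"
    return "allow"

print(route("spam", 0.97))  # auto_remove
print(route("spam", 0.75))  # human_review
print(route("ok", 0.99))    # allow
```

Human overrides of the `human_review` and `auto_remove` decisions would then be logged and fed back in as labeled examples for retraining.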

4. Implic+4s[view] [source] [discussion] 2023-06-01 04:59:36
>>Michae+Ia
I have a reddit post database with ~6 million unique post titles and through various manual means I've identified ~100,000 of these post titles that _are_ spam.

First, I parsed the posts for the most common phrases at varying lengths, hand-identifying 3,730 individual strings that I felt indicated spam within the post title, post body, reddit username, reddit user description, or comment bodies.

These strings are then checked against new or updated records and things are flagged as spam as needed.
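The string-check step against new records might look something like this. The field names and spam strings are made up for illustration; the commenter's actual implementation isn't specified.

```python
# Sketch: scan a record's text fields (case-insensitively) against a
# hand-identified list of spam strings. Strings and fields are examples.

SPAM_STRINGS = ["free crypto airdrop", "onlyfans leak", "dm for promo"]

def matched_spam_strings(record):
    """Return which known spam strings appear in any of the record's fields."""
    haystack = " ".join(
        record.get(field, "")
        for field in ("title", "body", "username", "user_description")
    ).lower()
    return [s for s in SPAM_STRINGS if s in haystack]

post = {"title": "FREE crypto AIRDROP inside!!", "body": "click here"}
print(matched_spam_strings(post))  # ['free crypto airdrop']
```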

It's been weeks since I've had to manually intervene and identify more spam strings - that's not to say I won't need to eventually as trends and techniques change (or, as it happens - reddit's api changes), but this was a fantastically successful means for identifying and analyzing obvious spam.

Beyond the above, I used what was a relatively simple approach to identify similar post titles to those that were determined to be spam for a "if you thought that was spam then you'll probably think this is too..." type feature that was very effective.
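One simple way to get that "you'll probably think this is spam too" effect is to rank unlabeled titles by similarity to known-spam titles. The commenter's actual approach isn't specified; stdlib `difflib` is used here as a rough stand-in, and embeddings would likely scale better across 6 million titles.

```python
# Sketch: find the known-spam title most similar to a candidate title,
# using difflib's sequence-matching ratio (0.0-1.0).

from difflib import SequenceMatcher

def most_similar_spam(title, spam_titles):
    """Return (best_match, ratio) for the closest known-spam title."""
    def ratio(s):
        return SequenceMatcher(None, title.lower(), s.lower()).ratio()
    best = max(spam_titles, key=ratio)
    return best, ratio(best)

spam = ["Free iPhone giveaway click now", "Hot singles in your area"]
match, score = most_similar_spam("FREE iphone giveaway - click now!", spam)
print(match)  # Free iPhone giveaway click now
```

Titles whose best-match ratio exceeds some tuned threshold would be surfaced as "probably also spam."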

If reddit's api changes weren't happening, I'd have already started training an ML model/NN (or whatever ChatGPT told me was the best one to use) to classify these objects from the existing data.

Ironically, all of this was in order to offer moderation bots to subreddits to help handle the spam problem.

I started by scraping the API to play with meilisearch as a search engine, but was just awestruck at the amount of _obvious_ spam that was getting through automod and reddit's own spam filtering (if there is any?) before being published and made available via the API. I just didn't want to store all the metadata I was generating for all the spam posts, and couldn't depend on reddit to police the issue on their end.

Now they're still unable to get a handle on spam - but they're also cutting off the developers trying to help them.
