zlacker

PG on trolls
1. Chaita+72 2008-02-17 01:09:25
>>sharps+(OP)
I wonder if trolls can be categorized automatically. Caveats and all, trolls are characterized by their participation in negative-karma two-person conversations and by the down-voting of their comments by a diverse and changing set of users. A simple learning algorithm should zap the predictable ones. Trolls are nourished by attention, and early detection and removal should nip that behavior in the bud.
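
A minimal sketch, in Python, of the kind of heuristic described here; the record layouts (per-comment author, thread, parent author, karma, plus one row per down-vote) and the weights are placeholders of my own, not anything specified in the thread:

    from collections import defaultdict

    def troll_scores(comments, downvotes):
        """Hypothetical inputs:
           comments:  iterable of (author, thread_id, parent_author, karma)
           downvotes: iterable of (voter, comment_author)"""
        karma = defaultdict(int)          # total comment karma per author
        pair_replies = defaultdict(int)   # direct replies to another single user
        downvoters = defaultdict(set)     # distinct users who down-voted each author

        for author, thread_id, parent_author, k in comments:
            karma[author] += k
            if parent_author and parent_author != author:
                pair_replies[author] += 1

        for voter, comment_author in downvotes:
            downvoters[comment_author].add(voter)

        # Arbitrary weighting: negative total karma, many distinct down-voters,
        # and lots of one-on-one back-and-forth all push the score up.
        return {a: max(0, -karma[a]) + 2 * len(downvoters[a]) + pair_replies[a]
                for a in karma}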
2. pg+a2 2008-02-17 01:24:02
>>Chaita+72
I've thought a lot about that. I wouldn't be surprised if current spam filters would work unchanged. There are not enough trolls on News.YC to make it worth investing time in such countermeasures, but it would be an interesting experiment to see if you could use statistical filtering techniques to detect trolls in some large public corpus like Digg or Reddit or Slashdot comment threads.
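
As a rough illustration of that experiment with modern off-the-shelf tools (my choice, not anything pg specifies beyond "statistical filtering"), a bag-of-words naive Bayes classifier could be trained on comments that have somehow been labeled troll/non-troll; collecting and labeling that corpus is the hard part and is simply assumed here:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    def train_troll_filter(comment_texts, labels):
        """labels[i] is 1 if comment_texts[i] was judged trollish, else 0."""
        vectorizer = CountVectorizer(lowercase=True)
        X = vectorizer.fit_transform(comment_texts)   # word-count features
        clf = MultinomialNB()
        clf.fit(X, labels)
        return vectorizer, clf

    def troll_probability(vectorizer, clf, text):
        X = vectorizer.transform([text])
        return clf.predict_proba(X)[0][1]             # probability of label 1 ("troll")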
3. thauma+f6 2008-02-17 09:58:46
>>pg+a2
I don't think that would work; you would essentially require a filter that could actually understand the subject matter in order to determine whether the purpose of the stated opinion was to bait people into responding. For example:

"I've thought a lot about that. I would be surprised if current spam filters could work unchanged. There are not enough trolls on News.YC to make it worth investing time in trying to write it in Lisp, but it would be an interesting experiment to see if you could use C++ programming techniques to detect trolls in some large public corpus like Digg or Reddit or Slashdot's comment threads."

4. iamelg+R6 2008-02-17 12:15:02
>>thauma+f6
Not necessarily.

Spam filters routinely filter out emails based on a common set of words, without needing to understand the subject matter. For example, if an email contains the words "viagra" and "store", the probability of that email being spam goes up tremendously. Paul's "A Plan for Spam" essay explains this in much more detail. And, from what I understand, Bayesian filtering is at the root of most spam blockers out there.

I'd be willing to wager that Bayesian filtering would be pretty powerful at filtering out trolls on social sites as well. For example, if a post contains the words "asshole" and "fucktard", the probability of that post coming from a troll goes up dramatically.

If you trained a Bayesian filter or spam algorithm well enough, it should be able to flag trollish posts fairly easily.
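
For concreteness, here is a from-scratch sketch in the spirit of "A Plan for Spam": estimate, per token, how much more often it appears in troll posts than in ordinary ones, then combine the most telling token probabilities for a new post. The counting, smoothing, and tokenizer are placeholder choices of mine, not Graham's exact formulas:

    import math
    import re

    def token_probs(troll_counts, ok_counts, n_troll_posts, n_ok_posts):
        """Rough P(troll | token) per token, clamped to avoid 0 and 1."""
        probs = {}
        for tok in set(troll_counts) | set(ok_counts):
            t = troll_counts.get(tok, 0) / max(n_troll_posts, 1)
            o = ok_counts.get(tok, 0) / max(n_ok_posts, 1)
            if t + o > 0:
                probs[tok] = min(0.99, max(0.01, t / (t + o)))
        return probs

    def post_troll_probability(text, probs, n=15):
        """Combine the n most 'interesting' token probabilities, naive-Bayes style."""
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        ps = sorted((probs[t] for t in tokens if t in probs),
                    key=lambda p: abs(p - 0.5), reverse=True)[:n]
        if not ps:
            return 0.5
        # P = prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        log_p = sum(math.log(p) for p in ps)
        log_q = sum(math.log(1 - p) for p in ps)
        return math.exp(log_p) / (math.exp(log_p) + math.exp(log_q))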
