For example, if you search for bitwarden, it ranks three comments as negative and all the others as neutral. If I, as a human, look at the actual comments about bitwarden [1], there are lots of comments from people using it and recommending it. I would rate the overall sentiment as very positive, with some "negative" comments mixed in (which are really about specific situations where it's the wrong tool).
I've had some success using LLMs for sentiment analysis. An LLM can understand context and determine that, in the given context, "Bitwarden is the answer" is a glowing recommendation, not a neutral statement. But doing sentiment analysis that way eats a lot of resources, so I can't fault this tool for going with the more established approach, which is incapable of making that leap.
1: https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=tr...
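To make that concrete, here's roughly the shape of it (a minimal sketch assuming the OpenAI Python SDK; the model name and prompt wording are placeholders, not something I'm claiming this tool should use):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def classify_sentiment(comment: str, topic: str) -> str:
        """Ask the model for a one-word sentiment label toward a given topic."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any chat model works
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "You rate the sentiment of a forum comment toward a "
                            "given topic. Answer with exactly one word: "
                            "positive, negative, or neutral."},
                {"role": "user",
                 "content": f"Topic: {topic}\nComment: {comment}"},
            ],
        )
        return response.choices[0].message.content.strip().lower()

    # In context, a model can read this as a recommendation rather than
    # a neutral statement of fact:
    print(classify_sentiment("Bitwarden is the answer.", "bitwarden"))

The catch is that this is one API round trip per comment, which is why I say it eats a lot of resources compared to a lexicon lookup.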
For example, while testing it on "Founder Mode" there were a couple of comments along the lines of "I hate founder mode but I really really like this other thing that is the opposite of founder mode..." that then continued for a couple of paragraphs. It classified the comment as positive. While _technically_ true, that wasn't quite the commenter's intention.
We think there are some ways to work around this and increase the fidelity of these models without using generative AI. Like you said, doing it that way eats a ton of resources.
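One cheap direction, as a rough sketch (this uses the off-the-shelf vaderSentiment package and a naive sentence split; it's an illustration of the idea, not our actual pipeline): score sentence by sentence instead of letting one whole-comment score average everything together.

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    def sentence_scores(comment: str) -> list[tuple[str, float]]:
        """VADER compound score per sentence (naive split on periods)."""
        sentences = [s.strip() for s in comment.split(".") if s.strip()]
        return [(s, analyzer.polarity_scores(s)["compound"]) for s in sentences]

    comment = ("I hate founder mode. But I really really like this other "
               "thing that is the opposite of founder mode.")
    for sentence, score in sentence_scores(comment):
        print(f"{score:+.2f}  {sentence}")

Scored as a whole, the long enthusiastic part drowns out "I hate founder mode"; scored per sentence, the negative signal at least survives. A lexicon scorer still can't work out that "the opposite of founder mode" flips the target, though, and that part seems to need something smarter.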
I don’t think it [1] ever gained traction, probably because people aren’t interested in creating an actual theory of sentiment that matches the real world.
[1]: https://github.com/clips/pattern/wiki/pattern-en#sentiment
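For anyone curious, the whole API was a one-liner (sketch below; the commented values are the kind of output a lexicon-based scorer produces, not something I've re-verified):

    from pattern.en import sentiment

    # Returns (polarity, subjectivity): polarity in [-1, 1], subjectivity
    # in [0, 1], looked up from a lexicon of hand-scored adjectives.
    print(sentiment("Bitwarden is the answer."))
    # With no scored adjectives to latch onto, this lands near (0.0, 0.0),
    # i.e. the same "neutral" failure mode discussed above.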
That's an interesting example, because when I read it, it sounds to me like something slightly positive, or at least unlikely to be negative: if you had a negative opinion of Bitwarden, you probably wouldn't be storing stuff in it.
That's completely changed in the last 18 months. All my colleagues in the industry have switched to LLMs. They're seeing accuracy at scale that's as good as hand coding was getting (and the hand coders were generally college-educated).
Non-LLM sentiment tools were always a bit of a parlor trick that required cherry-picking to survive the demo. In almost every case, drilling into the actual statements revealed they were wrong on anything involving irony, humor, or even complex grammar. That's changed almost overnight.
I think the finding that HN is "neutral" about MBAs says all that needs to be said about the accuracy here.