For example, if you search for bitwarden, it ranks three comments as negative and all the others as neutral. If I, as a human, look at the actual comments about bitwarden [1], there are lots of comments from people using it and recommending it. I would rate the overall sentiment as very positive, with some "negative" comments mixed in (which are really about specific situations where it's the wrong tool).
I've had some success using LLMs for sentiment analysis. An LLM can understand context and determine that, in the given context, "Bitwarden is the answer" is a glowing recommendation, not a neutral statement. But doing sentiment analysis that way eats a lot of resources, so I can't fault this tool for going with the more established approach, which is incapable of making that leap.
1: https://hn.algolia.com/?dateRange=pastMonth&page=0&prefix=tr...
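To make that concrete, here's roughly the shape of it (a minimal sketch assuming the OpenAI Python SDK; the model name and prompt wording are placeholders, not something I'm claiming this tool should use):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def classify_sentiment(comment: str, topic: str) -> str:
        """Ask the model for a one-word sentiment label toward a given topic."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any chat model works
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "You rate the sentiment of a forum comment toward a "
                            "given topic. Answer with exactly one word: "
                            "positive, negative, or neutral."},
                {"role": "user",
                 "content": f"Topic: {topic}\nComment: {comment}"},
            ],
        )
        return response.choices[0].message.content.strip().lower()

    # In context, a model can read this as a recommendation rather than
    # a neutral statement of fact:
    print(classify_sentiment("Bitwarden is the answer.", "bitwarden"))

The catch is that this is one API round trip per comment, which is why I say it eats a lot of resources compared to a lexicon lookup.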
For example, while testing it on "Founder Mode" there were a couple of comments along the lines of "I hate founder mode but I really really like this other thing that is the opposite of founder mode..." that then continued for a couple of paragraphs. It classified the comment as positive. While _technically_ true, that wasn't quite the commenter's intention.
We think there are some ways to work around this and increase the fidelity of these models without using generative AI. Like you said, doing it that way eats a ton of resources.
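One cheap direction, as a rough sketch (this uses the off-the-shelf vaderSentiment package and a naive sentence split; it's an illustration of the idea, not our actual pipeline): score sentence by sentence instead of letting one whole-comment score average everything together.

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    def sentence_scores(comment: str) -> list[tuple[str, float]]:
        """VADER compound score per sentence (naive split on periods)."""
        sentences = [s.strip() for s in comment.split(".") if s.strip()]
        return [(s, analyzer.polarity_scores(s)["compound"]) for s in sentences]

    comment = ("I hate founder mode. But I really really like this other "
               "thing that is the opposite of founder mode.")
    for sentence, score in sentence_scores(comment):
        print(f"{score:+.2f}  {sentence}")

Scored as a whole, the long enthusiastic part drowns out "I hate founder mode"; scored per sentence, the negative signal at least survives. A lexicon scorer still can't work out that "the opposite of founder mode" flips the target, though, and that part seems to need something smarter.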
I don’t think it [1] ever gained traction, probably because people aren’t interested in creating an actual theory of sentiment that matches the real world.
[1]: https://github.com/clips/pattern/wiki/pattern-en#sentiment
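For anyone curious, the whole API was a one-liner (sketch below; the commented values are the kind of output a lexicon-based scorer produces, not something I've re-verified):

    from pattern.en import sentiment

    # Returns (polarity, subjectivity): polarity in [-1, 1], subjectivity
    # in [0, 1], looked up from a lexicon of hand-scored adjectives.
    print(sentiment("Bitwarden is the answer."))
    # With no scored adjectives to latch onto, this lands near (0.0, 0.0),
    # i.e. the same "neutral" failure mode discussed above.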
That's an interesting example, because when I read it, it sounds to me like something slightly positive, or at least unlikely to be negative: if you had a negative opinion of Bitwarden, you probably wouldn't be storing stuff in it.
That's completely changed in the last 18 months. All my colleagues in the industry have switched to LLMs. They're seeing accuracy at scale that's as good as hand coding was getting (and the hand coders were generally college-educated).
Non-LLM sentiment tools were always a bit of a parlor trick that required cherry-picking to survive the demo. In almost every case, drilling into the actual statements revealed they were wrong on anything involving irony, humor, or even complex grammar. That's changed almost overnight.
I think the finding that HN is "neutral" about MBAs says all that needs to be said about the accuracy here.