For example, while testing it on "Founder Mode" there were a couple of comments that said something like "I hate founder mode but I really, really like this other thing that is the opposite of founder mode..." and then continued in that vein for a couple of paragraphs. It classified those comments as positive. While _technically_ true, that wasn't quite the intention.
We think there are some ways around this that can increase the fidelity of these models without using generative AI; a sketch of one such approach is below. Like you said, going the generative route eats a ton of resources.
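To give a concrete flavor of a non-generative workaround: you can condition the classification on the target itself with a zero-shot NLI model, so praise for something _other than_ founder mode doesn't flip the founder-mode label. A minimal sketch, assuming a Hugging Face NLI model; the model choice and hypothesis wording are illustrative, not our actual setup:

```python
# Minimal sketch: target-conditioned sentiment via zero-shot NLI.
# Assumptions: transformers installed, facebook/bart-large-mnli as the
# NLI backbone (any NLI model would do).
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

comment = (
    "I hate founder mode, but I really, really like this other thing "
    "that is the opposite of founder mode..."
)

# The hypothesis template ties the label to the target, so positivity
# about some *other* topic shouldn't read as positivity about founder mode.
result = classifier(
    comment,
    candidate_labels=["positive", "negative", "neutral"],
    hypothesis_template="The author's attitude toward founder mode is {}.",
)
print(result["labels"][0], result["scores"][0])
```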
That's completely changed in the last 18 months. All my colleagues in the industry have switched to LLMs. They're seeing accuracy as good as hand coding was getting (from generally college-educated coders), and at scale.
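In practice the switch usually amounts to one constrained API call per item. A minimal sketch of that pattern, assuming the OpenAI Python client; the model name and prompt are my own illustrative choices, not any particular team's pipeline:

```python
# Minimal sketch: LLM-as-coder with a constrained label set.
# Assumes the openai package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def code_sentiment(text: str) -> str:
    """Return one of positive / negative / neutral for a comment."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable model works here
        messages=[
            {"role": "system",
             "content": "Classify the comment's sentiment. Answer with "
                        "exactly one word: positive, negative, or neutral."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic labels, closer to a coding protocol
    )
    return resp.choices[0].message.content.strip().lower()

print(code_sentiment("I hate founder mode, but I love the opposite of it."))
```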
Non-LLM sentiment tools were always a bit of a parlor trick that required cherry-picking to survive the demo. In almost every case, drilling down to the actual statements revealed they were wrong on anything involving irony, humor, or even complex grammar. That's changed almost overnight.
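You can reproduce the failure mode in a few lines with a lexicon scorer like VADER (not necessarily what the demos used, just a representative pre-LLM tool). Lexicon methods sum word-level polarities, so sarcasm built from positive words tends to score positive:

```python
# Toy illustration of the irony failure mode with a lexicon-based tool.
# Assumes: pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
ironic = "Oh great, another outage. I just love debugging at 3am."
print(analyzer.polarity_scores(ironic))
# "great" and "love" are positive lexicon entries, so the compound score
# will likely come out positive despite the obvious sarcasm.
```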
I think the finding that HN is "neutral" about MBAs says all that needs to be said about the accuracy here.