zlacker

[parent] [thread] 3 comments
1. Mockap+(OP)[view] [source] 2024-09-23 01:18:02
I think this is fair criticism of where it's at, and it mirrors my experience while building the tool. For generative AI at least, the smartest models + a good prompt will waffle stomp our tool in terms of quality.

For example, while testing it on "Founder Mode" there were a couple of comments that said something like "I hate founder mode but I really really like this other thing that is the opposite of founder mode..." and then continued in that vein for a couple of paragraphs. It classified them as positive. While _technically_ true, that wasn't quite the intention.
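For the curious, this failure mode is easy to reproduce with any off-the-shelf overall-polarity classifier. Rough sketch below; the model and the example text are stand-ins, not our actual stack:

    from transformers import pipeline

    # Stock sentiment checkpoint; stand-in for whatever classifier you use.
    clf = pipeline("sentiment-analysis",
                   model="distilbert-base-uncased-finetuned-sst-2-english")

    comment = ("I hate founder mode but I really really like this other "
               "thing that is the opposite of founder mode...")

    # One "hate" gets drowned out by paragraphs praising the *opposite*
    # thing, so the overall label tends to come back POSITIVE even though
    # the stance on founder mode itself is negative.
    print(clf(comment))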

We think there are some ways around this that can increase the fidelity of these models without using generative AI. Like you said, doing it that way eats a ton of resources.

replies(3): >>zzleep+92 >>wongar+Tc >>drc500+Yo5
2. zzleep+92[view] [source] 2024-09-23 01:44:33
>>Mockap+(OP)
BTW, which algo did you use to classify sentiment? BERT or something related?
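For reference, I mean the classic setup, something like this (DistilBERT here as an example, but any BERT-family checkpoint works the same way):

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # A BERT-family model fine-tuned for binary sentiment (SST-2).
    name = "distilbert-base-uncased-finetuned-sst-2-english"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)

    inputs = tok("I hate founder mode", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # id2label maps the argmax index to NEGATIVE / POSITIVE.
    print(model.config.id2label[logits.argmax(-1).item()])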
3. wongar+Tc[view] [source] 2024-09-23 04:10:02
>>Mockap+(OP)
Just spitballing, but maybe a good tradeoff is to use cheap NLP to find candidate comments that are likely to carry sentiment, and then analyse a small number of them with a more expensive model (say, a quantized 3B or 7B LLM with a good prompt). The quality-over-quantity approach.
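Something like this sketch, where a lexicon scorer is the cheap filter and the LLM call is a placeholder for whatever quantized model you'd run locally:

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    def looks_opinionated(comment, threshold=0.3):
        # Cheap stage 1: |compound| near 0 usually means "no real sentiment".
        return abs(analyzer.polarity_scores(comment)["compound"]) >= threshold

    def classify_with_llm(comment):
        # Hypothetical stage 2: a quantized 3B/7B model with a good prompt
        # (llama.cpp, Ollama, whatever is on hand).
        raise NotImplementedError("plug in your local LLM here")

    def analyse(comments):
        # Only comments that look opinionated pay the expensive LLM toll;
        # everything else defaults to neutral.
        return {c: classify_with_llm(c) if looks_opinionated(c) else "neutral"
                for c in comments}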
4. drc500+Yo5[view] [source] 2024-09-25 00:05:31
>>Mockap+(OP)
Almost 10 years ago, I ran a media sentiment analytics product that worked on hand coding. Everything automated that we tested - and that our competitors tried to launch - output garbage. The status quo for media analysis was either averaging a lot of garbage, or paying an insane amount for a very limited scope. We used automated tools to multiply the hand coders, but there was no plausible lights-out solution if accuracy really mattered.

That's completely changed in the last 18 months. All my colleagues in the industry have switched to LLMs. They're seeing accuracy as good as hand coding was getting (these were generally college-educated coders), at scale.

Non-LLM sentiment tools were always a bit of a parlor trick that required cherry-picking to survive the demo. In almost every case, drilling down to the actual statements revealed they were wrong on anything involving irony, humor, or even complex grammar. That's changed almost overnight.

I think the finding that hn is "neutral" about MBAs says all that's needed about accuracy here.
