zlacker

The score has to be in relation to other articles, or else it's too random to have meaning. ChatGPT doesn't even give consistent scores from session to session for the same article.

And the context length limit prevents that relation from extending to more than a few articles, if that's your method.

I.e., your method doesn't actually produce a meaningful score that can be ranked in some linear order with the 1200 other articles.

At most, it would make sense to rank a discrete score in relation to the few other articles it remembers.

Anything beyond that should be placed in 'score ranges', from 5 to 7 for example, not given a discrete score.
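
To illustrate the 'score ranges' idea, here is a minimal Python sketch. It assumes each article is scored several times in separate sessions; the function names and the sample scores are hypothetical and not taken from the actual pipeline.

```python
import math

def score_range(samples):
    """Return (low, high) integer bounds that cover all sampled scores."""
    if not samples:
        raise ValueError("need at least one sampled score")
    return math.floor(min(samples)), math.ceil(max(samples))

def ranges_overlap(a, b):
    """Two ranges overlap when neither one ends before the other begins."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical scores for the same articles, collected in separate sessions.
samples = {
    "article-a": [5.5, 6.8, 6.1],
    "article-b": [6.0, 7.0, 6.4],
    "article-c": [8.2, 9.0, 8.6],
}

ranges = {name: score_range(scores) for name, scores in samples.items()}

for name, (low, high) in ranges.items():
    print(f"{name}: {low} to {high}")
```

Articles whose ranges overlap (here article-a and article-b) can't be reliably ordered against each other, while article-c sits clearly above both.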

replies(1): >>yakhin+2r

>>Michae+(OP)
You are spot on. I use temperature 0, but even then ChatGPT can be unpredictable.

Sometimes I'm very frustrated by the news items that get to the top. When I try to debug one, ChatGPT gives me a completely different score.

I considered using ranges instead of discrete scores, but dropped the idea: it makes it too hard to find the 1-5 articles that should make it into the newsletter (there are 71 articles in this range right now), and it's hard to display that idea clearly in the UI.

I guess my position right now is: it's not perfect, there are obvious errors (like the one you found above), and improvements are definitely possible.

But I hope that some people will find it "good enough" even with these inconsistencies. I also hope that ChatGPT or another LLM will soon make enough progress to solve this problem automatically.