But kudos for the effort, and the idea of keeping news small is a most noble cause.
Also, the second-least-significant article seems to be incorrectly categorized: "Regenerative medicine has come a long way, baby",
which is actually a serious look back at the advancements of the last quarter century, hardly deserving a position that low.
It seems like ChatGPT is ranking them not by actual content significance but by the presumed significance of the headline. (Which would also make sense technically, as ~1200 headlines is about the most that fits in GPT-4's context window.)
I don't think that's fair; I think ChatGPT hallucinated that it's a tabloid.
Not sure how to fix this. I don't want to adjust source credibility manually, as that would introduce too much bias. My hope is that OpenAI will update ChatGPT with newer data so I can rerun the credibility evaluation.
So it's exceedingly unlikely that the actual content, beyond the headline, is processed if you're using the ChatGPT version.
In 99% of cases a single news article fits within the context.
I drop those that don't fit; several examples I saw were announcements of lottery numbers (too many tokens) and articles with broken HTML.
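That filtering step could be sketched roughly as below. This is a guess at the approach, not the project's actual code: the names and the token budget are made up, and token counts are approximated with the common ~4-characters-per-token rule of thumb rather than the model's real tokenizer (which a real implementation would use, e.g. via tiktoken).

```python
# Illustrative sketch of dropping articles that don't fit the context.
# MAX_ARTICLE_TOKENS and the helper names are assumptions, not the
# author's actual implementation.
MAX_ARTICLE_TOKENS = 6000  # hypothetical budget, leaving room for the prompt


def approx_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return len(text) // 4


def fits_context(article: str) -> bool:
    """True if the article's estimated token count fits the budget."""
    return approx_tokens(article) <= MAX_ARTICLE_TOKENS


articles = ["short article", "x" * 100_000]  # second one is far too long
usable = [a for a in articles if fits_context(a)]  # keeps only the first
```

An over-long lottery-numbers page or broken-HTML dump would simply fail the `fits_context` check and be skipped.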
And the context length limit prevents that relation from extending to more than a few articles, if that's your method.
I.e., your method doesn't actually produce a meaningful score that can be ranked in a linear order against the ~1200 other articles.
At most it would make sense to assign a discrete score relative to the few other articles it remembers.
Anything beyond that should be placed into score ranges, 5 to 7 for example, rather than given a discrete score.
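The "score ranges" idea could look something like this. A minimal sketch, assuming a 0-10 scale and an arbitrary bucket width of 3; none of this is from the project itself:

```python
# Hypothetical bucketing of noisy discrete scores into coarse ranges,
# so articles are only ordered by bucket, not by an exact number.
def score_range(score: int, width: int = 3) -> str:
    """Map a 0-10 score into a coarse bucket label like '6-8'."""
    lo = (score // width) * width
    hi = min(lo + width - 1, 10)
    return f"{lo}-{hi}"
```

Ranking would then sort by bucket and treat articles within the same bucket as tied, which is exactly what makes picking a strict top 5 harder.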
Sometimes I'm very frustrated by the news that gets to the top. When I try to debug it, the model gives me a completely different score.
I considered using ranges instead of discrete scores, but dropped the idea: it makes it too hard to pick the 1-5 articles that should make it into the newsletter (there are 71 articles in that range right now), and it's hard to display the idea clearly in the UI.
I guess my position right now is: it's not perfect, there are obvious errors (like the one you found above), and improvements are definitely possible.
But I hope some people will find it "good enough" even with these inconsistencies. I also hope that ChatGPT or another LLM will make big progress soon that solves this problem automatically.