Initially I asked ChatGPT to estimate three things: event scale, event magnitude and event potential. That often resulted in clickbait articles going to the top.
To fix this I started to also ask it to estimate source credibility, so tabloids would get a much lower score than, say, the New York Times.
Now you've noticed another problem: similar articles get very different scores. Ideally I'd do some sort of deduplication, but I don't know how to implement it yet.
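
For anyone curious, here is a minimal sketch of what such a scoring prompt could look like. The `ask_gpt` helper, the field names, and the weighting are illustrative placeholders, not the exact setup:

```python
import json

# Hypothetical helper that wraps whatever chat-completion call you use;
# it sends a prompt to the model and returns the raw text of the reply.
def ask_gpt(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM API of choice")

SCORING_PROMPT = """Rate the following news item on four dimensions, each 0-10:
- event_scale: how many people are affected
- event_magnitude: how severe the impact is
- event_potential: how likely the story is to develop further
- source_credibility: how reliable the publisher is (tabloids low, e.g. NYT high)

Title: {title}
Source: {source}

Reply with JSON only, e.g. {{"event_scale": 5, "event_magnitude": 3, "event_potential": 4, "source_credibility": 8}}."""

def score_article(title: str, source: str) -> float:
    reply = ask_gpt(SCORING_PROMPT.format(title=title, source=source))
    scores = json.loads(reply)
    # Illustrative weighting: low credibility scales the whole event score down,
    # so clickbait from a tabloid can't reach the top on sensational wording alone.
    event = (scores["event_scale"] + scores["event_magnitude"] + scores["event_potential"]) / 3
    return event * scores["source_credibility"] / 10
```
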
Any chance AI could be used to dedup the stories (as in: these are identical, only show the one from the higher-credibility source)?
The problem with deduping is that some news gets posted and reposted by different sources for several days (sometimes even weeks) in a row. That's a huge amount of context I'd have to feed the AI.
1000 news titles * 3 days * 70 characters per title = 210,000 characters ≈ 40,000 words ≈ 53,000 tokens. My current context window is 8,000 tokens, and I think 32,000 tokens is the max GPT-4 allows.
---
Add: now that I think about it, this should be possible to do in several runs. Will keep thinking about it, thanks for the suggestion.
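
A rough sketch of what those runs could look like, assuming the same hypothetical `ask_gpt` wrapper as above and illustrative batch sizes: split the titles into batches that fit the context window, have the model cluster duplicates within each batch, and carry one representative per cluster forward so reposts from different days eventually meet in the same prompt.

```python
import json

# Hypothetical wrapper around the chat-completion API call (same as earlier sketch).
def ask_gpt(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM API of choice")

DEDUP_PROMPT = """Group the numbered headlines below that describe the same story.
Reply with JSON: a list of lists of headline numbers, e.g. [[1, 4], [2], [3, 5]].

{numbered_titles}"""

def dedup_in_runs(titles: list[str], batch_size: int = 150) -> list[list[str]]:
    groups: list[list[str]] = []   # clusters of duplicate headlines found so far
    carry: list[int] = []          # group index for each representative carried into the next run
    for start in range(0, len(titles), batch_size):
        # Each run sees one representative per known group plus a fresh batch of titles.
        batch = [groups[i][0] for i in carry] + titles[start:start + batch_size]
        numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(batch))
        clusters = json.loads(ask_gpt(DEDUP_PROMPT.format(numbered_titles=numbered)))
        new_carry: list[int] = []
        for cluster in clusters:
            members = [batch[n - 1] for n in cluster]
            # If the cluster contains a carried representative, merge into its existing group.
            hit = next((carry[n - 1] for n in cluster if n - 1 < len(carry)), None)
            if hit is None:
                groups.append(members)
                hit = len(groups) - 1
            else:
                groups[hit].extend(m for m in members if m not in groups[hit])
            new_carry.append(hit)
        carry = new_carry
    return groups
```

In practice the carried representatives would keep growing, so you'd probably cap the carry to stories from the last few days, which matches how long a story typically keeps getting reposted.
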
Every time ChatGPT saw words like "World is ending" (not a real example) it gave those articles a very high score.
Estimating source credibility was the only solution I came up with.