>“Today ChatGPT read 1289 top news and gave 13 of them a significance score over 6/10.”
Is an excellent hook.
I wish there were more of a basis for the score it chose. For example:
>“Russia suffers 100,000 casualties in Ukraine conflict, US estimates.” is #2 and ranked 6.8
>“White House estimates Russia has suffered 100,000 casualties in Ukraine since December.” is #237 and ranked 3.8.
Initially I asked ChatGPT to estimate three things: event scale, event magnitude and event potential. That often resulted in clickbait articles going to the top.
To fix this I started to also ask it to estimate source credibility, so tabloids would get a much lower score than, say, the New York Times.
Now you've noticed another problem: similar articles get very different scores. I think ideally I could do some sort of deduplication, but I don't know how to implement it yet.
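To illustrate how a credibility estimate could pull clickbait down, here is a minimal sketch. The weighting (average the three event estimates, then scale by credibility) and the names `Estimates` and `significance` are assumptions for illustration, not the formula the site actually uses.

```python
from dataclasses import dataclass


@dataclass
class Estimates:
    """Per-article estimates, each assumed to be on a 0-10 scale."""
    scale: float
    magnitude: float
    potential: float
    credibility: float


def significance(e: Estimates) -> float:
    # Hypothetical combination: average the event factors,
    # then discount the result by source credibility.
    event = (e.scale + e.magnitude + e.potential) / 3
    return round(event * e.credibility / 10, 1)


tabloid = Estimates(scale=9, magnitude=9, potential=9, credibility=3)
broadsheet = Estimates(scale=6, magnitude=7, potential=8, credibility=9)
print(significance(tabloid), significance(broadsheet))
```

With a multiplicative discount like this, an alarming headline from a low-credibility source ends up below a more modest story from a credible one.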
Any chance AI could be used to dedup the stories (e.g. these are identical, so only show the one from the higher-ranked source)?
The problem with deduping is that some news gets posted and reposted by different sources for several days (sometimes even weeks) in a row. That's a huge amount of context I'd have to feed to the AI.
1,000 news titles * 3 days * 70 symbols per title = 210,000 symbols ≈ 40,000 words ≈ 53,000 tokens. My current context window is 8,000 tokens, and I think 32,000 tokens is the max that GPT-4 allows.
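The back-of-the-envelope estimate above can be reproduced in a few lines. The two ratios (`CHARS_PER_WORD`, `WORDS_PER_TOKEN`) are assumptions inferred from the rounded figures in this comment, not measured values; a real tokenizer would give somewhat different numbers.

```python
import math

TITLES_PER_DAY = 1000
DAYS = 3
CHARS_PER_TITLE = 70

# Assumed ratios, chosen to match the rounded figures in the comment.
CHARS_PER_WORD = 5.25
WORDS_PER_TOKEN = 0.75

chars = TITLES_PER_DAY * DAYS * CHARS_PER_TITLE
words = chars / CHARS_PER_WORD
tokens = words / WORDS_PER_TOKEN

print(f"{chars:,} symbols, about {words:,.0f} words, about {tokens:,.0f} tokens")

# How many separate runs would be needed just to fit the titles
# into each context window (ignoring prompt and response overhead).
for window in (8000, 32000):
    print(window, "token window:", math.ceil(tokens / window), "runs")
```

Even the larger window would not hold three days of titles in one pass, which is why a single-prompt dedup does not fit.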
---
Add: now that I think about it, it should be possible to do this in several runs. Will keep thinking about it, thanks for the suggestion.
100,000 dead Russians is very likely the propaganda number. Maybe it’s right; maybe it’s not, IDK, but I know that there has been zero negative Ukraine news since the conflict started (Ghost of Kiev, anyone?), and the media machine definitely only works one way on this topic.
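A rough sketch of what "several runs" could look like: process titles in batches, keep one representative per story, and carry the survivors into the next batch. Everything here is hypothetical. In particular, `same_story` uses plain string similarity from `difflib` as a stand-in for asking the model whether two titles describe the same event, and the `(title, credibility)` tuples and the 0.6 threshold are assumptions.

```python
from difflib import SequenceMatcher


def same_story(a: str, b: str, threshold: float = 0.6) -> bool:
    # Stand-in for a model call: character-level similarity only,
    # so it will miss rewordings that a model might catch.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def dedup_batch(items):
    """items: list of (title, credibility). Keep the most credible title per story."""
    kept = []
    for title, cred in sorted(items, key=lambda item: -item[1]):
        if not any(same_story(title, kept_title) for kept_title, _ in kept):
            kept.append((title, cred))
    return kept


def dedup_in_runs(items, batch_size):
    """Deduplicate in several runs, carrying survivors into each next batch."""
    survivors = []
    for start in range(0, len(items), batch_size):
        batch = survivors + items[start:start + batch_size]
        survivors = dedup_batch(batch)
    return survivors


titles = [
    ("Quake hits city", 5),
    ("QUAKE HITS CITY", 9),
    ("zzzz", 3),
]
print(dedup_in_runs(titles, batch_size=1))
```

One caveat with this approach: the list of survivors grows with every run, so it would eventually need trimming (for example, dropping stories older than a few days) to stay within the context window.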
I mean, Russia isn’t the good guy here, but I don’t excuse propaganda just because it tells a story I want to hear.
In this case, I don’t care what the US or Ukraine says the number is. That isn’t news. It’s narrative, true or not.
So, I feel like the score should take into account how likely an article is to be narrative vs. purely an event.
The number is news because Russia is waging a large-scale invasion of another country using cannon fodder tactics. Reporting on that invasion is not simply propaganda.
Every time ChatGPT saw the words "World is ending" (not a real example), it gave those articles a very high score.
Estimating source credibility was the only solution I came up with.