zlacker

[parent] [thread] 11 comments
1. KMnO4+(OP)[view] [source] 2023-05-02 23:35:30
Very cool!

>“Today ChatGPT read 1289 top news and gave 13 of them a significance score over 6/10.”

Is an excellent hook.

I wish there was more of a basis for the score it chose. For example:

>“Russia suffers 100,000 casualties in Ukraine conflict, US estimates.” is #2 and ranked 6.8

>“White House estimates Russia has suffered 100,000 casualties in Ukraine since December.” is #237 and ranked 3.8.

replies(3): >>yakhin+u3 >>SV_Bub+sD >>jeegsy+ao1
2. yakhin+u3[view] [source] 2023-05-02 23:57:27
>>KMnO4+(OP)
Yeah, I'm still thinking about the scoring.

Initially I asked ChatGPT to estimate three things: event scale, event magnitude and event potential. That often resulted in clickbait articles going to the top.

To fix this I started to also ask it to estimate source credibility, so tabloids would get a much lower score than, say, the New York Times.

Now you've noticed another problem: similar articles get very different scores. I think ideally I could do some sort of deduplication, but I don't know how to implement it yet.
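
In case it helps picture it, the per-article estimation could look roughly like this. The prompt wording and field names below are invented for illustration, not the actual implementation:

```typescript
// Illustrative only: one request per article, asking for the four estimates
// as JSON. Field names and prompt wording are hypothetical.
interface ArticleEstimate {
  eventScale: number;        // 0-10: how many people are affected
  eventMagnitude: number;    // 0-10: how severe the event is
  eventPotential: number;    // 0-10: how likely it is to develop further
  sourceCredibility: number; // 0-10: tabloid vs. established outlet
}

function buildEstimationPrompt(title: string, source: string): string {
  return [
    "Rate the following news item on four dimensions, each from 0 to 10,",
    'and reply with JSON only, e.g. {"eventScale":7,"eventMagnitude":5,"eventPotential":6,"sourceCredibility":8}.',
    `Title: ${title}`,
    `Source: ${source}`,
  ].join("\n");
}

function parseEstimate(modelReply: string): ArticleEstimate | null {
  try {
    return JSON.parse(modelReply) as ArticleEstimate;
  } catch {
    return null; // the model didn't return valid JSON; skip or retry
  }
}
```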

replies(3): >>csw-00+gh >>jrhizo+TA >>julien+YV
3. csw-00+gh[view] [source] [discussion] 2023-05-03 01:56:38
>>yakhin+u3
How hard would it be to let users tinker with this? Like could I have a set of sliders to go play with (customize) my scoring?

Any chance AI could be used to dedupe the stories (like: these are identical, only show the one from the higher-credibility source)?

replies(1): >>yakhin+6j
4. yakhin+6j[view] [source] [discussion] 2023-05-03 02:10:54
>>csw-00+gh
Oh, sliders for custom scoring is an amazing idea. And should be easy to add — I already have all the ChatGPT estimations for different parts of the score. Added to the TODO list.
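
A rough sketch of how the sliders could plug into that: a weighted average over the stored estimates, where each slider sets one weight. The names and numbers below are made up:

```typescript
// Hypothetical per-article estimates (0-10 each) and per-slider weights.
type Estimates = { scale: number; magnitude: number; potential: number; credibility: number };
type Weights = Estimates; // one slider per estimate

// Weighted average of the stored estimates -> a 0-10 significance score.
function significance(e: Estimates, w: Weights): number {
  const total = w.scale + w.magnitude + w.potential + w.credibility;
  if (total === 0) return 0;
  return (
    (e.scale * w.scale +
      e.magnitude * w.magnitude +
      e.potential * w.potential +
      e.credibility * w.credibility) / total
  );
}

// Example: a reader who mostly cares about big, credible events.
const score = significance(
  { scale: 7, magnitude: 5, potential: 4, credibility: 9 },
  { scale: 2, magnitude: 1, potential: 0.5, credibility: 2 },
);
```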

The problem with deduping is that some news gets posted and reposted by different sources for several days (sometimes even weeks) in a row. That's a huge amount of context I'd have to feed to the AI.

1000 news titles × 3 days × 70 characters per title = 210,000 characters ≈ 40,000 words ≈ 53,000 tokens (at roughly 4 characters per token). My current context window is 8,000 tokens, and I think 32,000 tokens is the max GPT-4 allows.

---

Add: now that I think about it, this should be possible to do in several runs. Will keep thinking about it. Thanks for the suggestion.
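
One way the several-runs idea could work: chunk the titles so each batch fits the context window, ask for duplicate groups per batch, then merge. A sketch, assuming roughly 4 characters per token (the batch budget is arbitrary):

```typescript
// Split titles into batches that fit a token budget (~4 chars per token).
const CHARS_PER_TOKEN = 4;

function batchTitles(titles: string[], maxTokensPerBatch = 6000): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let tokens = 0;

  for (const title of titles) {
    const t = Math.ceil(title.length / CHARS_PER_TOKEN);
    if (tokens + t > maxTokensPerBatch && current.length > 0) {
      batches.push(current);
      current = [];
      tokens = 0;
    }
    current.push(title);
    tokens += t;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
// Each batch then gets its own "group the duplicates in this list" request,
// and the per-batch groupings are merged afterwards.
```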

5. jrhizo+TA[view] [source] [discussion] 2023-05-03 04:24:21
>>yakhin+u3
Would getting a summary from ChatGPT with a prompt to eliminate clickbait/bias be helpful before evaluating the event?
replies(1): >>yakhin+nR
6. SV_Bub+sD[view] [source] 2023-05-03 04:54:20
>>KMnO4+(OP)
Agreed on the point, but your example brings up a different problem for me.

100,000 dead Russians is very likely the propaganda number. Maybe it's right; maybe it's not, IDK, but I know that there has been zero negative Ukraine news since the conflict started (ghost of Kiev, anyone?), and the media machine definitely only works one way on this topic.

I mean, Russia isn’t the good guy here, but I don’t excuse propaganda just because it tells a story I want to hear.

In this case, I don’t care what the US or Ukraine says the number is. That isn’t news. It’s narrative, true or not.

So I feel like the score should take into account how likely an article is to be narrative vs. purely an event.

replies(1): >>amoss+WK
7. amoss+WK[view] [source] [discussion] 2023-05-03 06:08:51
>>SV_Bub+sD
The estimate is 100,000 casualties (20k dead and 80k wounded), and it seems to be corroborated by several different intelligence sources.

The number is news because Russia is waging a large-scale invasion of another country using cannon fodder tactics. Reporting on that invasion is not simply propaganda.

replies(1): >>afterb+HT
8. yakhin+nR[view] [source] [discussion] 2023-05-03 07:00:51
>>jrhizo+TA
Maybe it's my poor prompting skills, but I couldn't make ChatGPT give lower importance to tabloid articles when they claimed that the "world is ending" or something similar.

Every time ChatGPT saw words like "the world is ending" (not a real example), it gave those articles a very high score.

Estimating source credibility was the only solution I came up with.

9. afterb+HT[view] [source] [discussion] 2023-05-03 07:21:40
>>amoss+WK
> The estimate is 100,000 casualties (20k dead and 80k wounded)

Since December

10. julien+YV[view] [source] [discussion] 2023-05-03 07:43:35
>>yakhin+u3
You could deduplicate based on vector similarity within a few days (7 days for big stories, since their news cycle is longer; 24 or 48 hours for smaller stories). Idea: the number of stories within a cluster, weighted by the credibility of the source, could be another element of the rating.
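
For anyone curious, the core of this is just cosine similarity over title embeddings plus a simple grouping pass within a time window. A sketch, with the threshold and window values picked arbitrarily and the embeddings assumed to come from whatever embedding model you choose:

```typescript
// Sketch of embedding-based dedup. Embeddings are assumed to be computed
// elsewhere; threshold and window sizes here are guesses, not tuned values.
type Story = { title: string; publishedAt: Date; credibility: number; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy clustering: each story joins the first cluster whose representative
// is similar enough and recent enough, otherwise it starts a new cluster.
function clusterStories(stories: Story[], threshold = 0.85, windowDays = 3): Story[][] {
  const clusters: Story[][] = [];
  const windowMs = windowDays * 24 * 60 * 60 * 1000;

  for (const story of stories) {
    const home = clusters.find((c) =>
      Math.abs(c[0].publishedAt.getTime() - story.publishedAt.getTime()) <= windowMs &&
      cosineSimilarity(c[0].embedding, story.embedding) >= threshold
    );
    if (home) home.push(story);
    else clusters.push([story]);
  }
  return clusters;
}

// The weighting idea from this comment: cluster size weighted by credibility.
function clusterWeight(cluster: Story[]): number {
  return cluster.reduce((sum, s) => sum + s.credibility, 0);
}
```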
replies(1): >>yakhin+jY
11. yakhin+jY[view] [source] [discussion] 2023-05-03 08:07:54
>>julien+YV
Wow, thank you! I'm just a frontend dev and know almost nothing about this. Will research vector similarity, sounds like the solution I need.
12. jeegsy+ao1[view] [source] 2023-05-03 11:32:24
>>KMnO4+(OP)
Maybe something to do with "US" vs "White House"