zlacker

[parent] [thread] 6 comments
1. yakhin+(OP)[view] [source] 2023-05-02 23:57:27
Yeah, I'm still thinking about the scoring.

Initially I asked ChatGPT to estimate three things: event scale, event magnitude and event potential. That often resulted in clickbait articles going to the top.

To fix this, I also started asking it to estimate source credibility, so tabloids would get a much lower score than, say, the New York Times.

Now you've noticed another problem: similar articles get very different scores. Ideally I'd do some sort of deduplication, but I don't know how to implement it yet.

replies(3): >>csw-00+Md >>jrhizo+px >>julien+uS
2. csw-00+Md[view] [source] 2023-05-03 01:56:38
>>yakhin+(OP)
How hard would it be to let users tinker with this? Like could I have a set of sliders to go play with (customize) my scoring?

Any chance AI could be used to dedup the stories (like these are identical - only show higher source)?

replies(1): >>yakhin+Cf
3. yakhin+Cf[view] [source] [discussion] 2023-05-03 02:10:54
>>csw-00+Md
Oh, sliders for custom scoring is an amazing idea. And it should be easy to add — I already have all the ChatGPT estimates for the different parts of the score. Added to the TODO list.
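Roughly what I have in mind — a minimal sketch, assuming the per-article estimates are 0–10 values from ChatGPT and each slider is a 0–1 weight (the field names here are made up):

```python
# Hypothetical sketch: combine the per-article estimates (already produced
# by ChatGPT) into one score using user-adjustable slider weights.
# The field names (scale, magnitude, potential, credibility) are assumptions.

def score(article: dict, weights: dict) -> float:
    """Weighted sum of 0-10 estimates; weights come from UI sliders (0-1)."""
    return sum(weights[k] * article[k] for k in weights)

article = {"scale": 7, "magnitude": 5, "potential": 6, "credibility": 9}
default = {"scale": 0.25, "magnitude": 0.25, "potential": 0.25, "credibility": 0.25}
print(score(article, default))  # 6.75
```

The nice part is that moving a slider only recomputes the weighted sum — no new ChatGPT calls needed.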

The problem with deduping is that some stories get posted and reposted by different sources for several days (sometimes even weeks) in a row. That's a huge amount of context I'd have to feed to the AI.

1000 news titles × 3 days × 70 characters per title = 210,000 characters ≈ 40,000 words ≈ 53,000 tokens. My current context window is 8,000 tokens, and I think 32,000 tokens is the max GPT-4 allows.
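As a sanity check on that arithmetic (the characters-per-word and tokens-per-word ratios are rough English-text assumptions, not tokenizer-exact):

```python
# Back-of-envelope token estimate for 3 days of headlines.
# Assumed ratios: ~5.25 characters per word, ~1.33 tokens per word
# (rough averages for English text; a real tokenizer will differ a bit).
titles_per_day = 1000
days = 3
chars_per_title = 70

total_chars = titles_per_day * days * chars_per_title  # 210,000
words = total_chars / 5.25                             # 40,000
tokens = words * 1.33                                  # ~53,200
print(total_chars, round(words), round(tokens))
```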

---

Edit: now that I think about it, this should be possible to do in several runs. Will keep thinking about it, thanks for the suggestion.

4. jrhizo+px[view] [source] 2023-05-03 04:24:21
>>yakhin+(OP)
Would getting a summary from ChatGPT with a prompt to eliminate clickbait/bias be helpful before evaluating the event?
replies(1): >>yakhin+TN
5. yakhin+TN[view] [source] [discussion] 2023-05-03 07:00:51
>>jrhizo+px
Maybe it's my poor prompting skills, but I couldn't get ChatGPT to assign lower importance to tabloid articles when they claimed the "world is ending" or something similar.

Every time ChatGPT saw the words "world is ending" (not a real example), it gave those articles a very high score.

Estimating source credibility was the only solution I came up with.

6. julien+uS[view] [source] 2023-05-03 07:43:35
>>yakhin+(OP)
You could deduplicate based on vector similarity within a time window (7 days for big stories, since their news cycle is longer; 24 or 48 hours for smaller stories). Idea: the number of stories within a cluster, weighted by the credibility of the sources, could be another element of the rating.
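A minimal sketch of that, greedily clustering stories whose embedding cosine similarity exceeds a threshold within a recency window. The vectors here are toy placeholders — in practice they'd come from an embedding model, and the threshold and window values are guesses to tune:

```python
# Greedy dedup: a story joins the first existing cluster whose representative
# is both recent enough and similar enough; otherwise it starts a new cluster.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster(stories, threshold=0.9, window_hours=48):
    clusters = []  # each cluster is a list of stories; [0] is the representative
    for s in stories:
        for c in clusters:
            rep = c[0]
            if (abs(s["age_h"] - rep["age_h"]) <= window_hours
                    and cosine(s["vec"], rep["vec"]) >= threshold):
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

stories = [
    {"title": "A",  "vec": [1.0, 0.0],  "age_h": 0},
    {"title": "A'", "vec": [0.99, 0.1], "age_h": 12},  # near-duplicate of A
    {"title": "B",  "vec": [0.0, 1.0],  "age_h": 5},
]
print([len(c) for c in cluster(stories)])  # [2, 1]
```

From there, the cluster-size-weighted-by-credibility rating is just a fold over each cluster, and you'd show only the highest-credibility story per cluster.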
replies(1): >>yakhin+PU
7. yakhin+PU[view] [source] [discussion] 2023-05-03 08:07:54
>>julien+uS
Wow, thank you! I'm just a frontend dev and know almost nothing about this. I'll research vector similarity — sounds like the solution I need.