zlacker

Nice catch. I just checked that article: it actually got a rating of 2.8 based on the news content alone, but the source credibility of 1/10 brought it down to 0.3.

I don't think it's fair, I think ChatGPT hallucinated that it's a tabloid.

Not sure how to fix this. I don't want to adjust source credibility manually, since that would introduce too much bias. My hope is that OpenAI will update ChatGPT with newer data so I can rerun the credibility evaluation.
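
For illustration only: the numbers above (2.8 and 1/10 giving 0.3) are consistent with a simple multiplicative weighting, rounded to one decimal. That formula is an assumption inferred from the figures, not a confirmed description of the scoring; `combined_rating` is a hypothetical name.

```python
def combined_rating(content_score: float, credibility: int,
                    max_credibility: int = 10) -> float:
    """Scale a content-only score by source credibility.

    Assumption: the two are multiplied and the result is rounded
    to one decimal place.
    """
    return round(content_score * credibility / max_credibility, 1)


# The case discussed above: content score 2.8, credibility 1/10.
print(combined_rating(2.8, 1))
```

Under this assumption a single wrong credibility value dominates the result, which is why one misjudged source sinks an otherwise decent article.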

replies(2): >>Michae+R4 >>starkp+Dp

>>yakhin+(OP)
Assuming an average of 20 tokens per headline (~10-14 words), 1200 headlines would be 24000 tokens, which is already near the limit of the API-exclusive GPT-4's window of 32,768 tokens, and way beyond the 8,192 token length of the ChatGPT version.

So it's exceedingly unlikely that the actual content, beyond the headline, is processed if you're using the ChatGPT version.
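
The estimate above, spelled out as a quick sketch. The 20 tokens per headline is the assumed average from this comment, and the two window sizes are the figures quoted here, not values read from any API.

```python
TOKENS_PER_HEADLINE = 20   # assumed average from the comment above
HEADLINES = 1200
WINDOW_32K = 32_768        # larger window quoted above
WINDOW_8K = 8_192          # smaller window quoted above

total = TOKENS_PER_HEADLINE * HEADLINES           # tokens for all headlines
share_32k = total / WINDOW_32K                    # fraction of the 32k window used
headlines_fit_8k = WINDOW_8K // TOKENS_PER_HEADLINE  # headlines that fit in 8k

print(total, round(share_32k, 2), headlines_fit_8k)
```

This leaves no room for the prompt itself or the model's answer, so the real headroom is smaller still.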

replies(1): >>yakhin+ia

>>Michae+R4
I score each article individually, so there's no need to put many news articles in one context window.

In 99% of cases a single news article fits within the context.

I drop those that don't fit, since the examples I saw were announcements of lottery numbers (too many tokens) and articles with broken HTML.
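
A minimal sketch of that "drop what doesn't fit" step, not the actual pipeline. The token count here is a crude stand-in (roughly 4 characters per token, an assumption) rather than a real tokenizer, and `estimate_tokens`, `articles_that_fit`, and `reserved_for_prompt` are hypothetical names.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token for English text.
    # A real tokenizer would give different (more accurate) counts.
    return len(text) // 4 + 1


def articles_that_fit(articles: list[str], context_limit: int,
                      reserved_for_prompt: int = 500) -> list[str]:
    """Keep only articles whose estimated size fits the context window.

    reserved_for_prompt is a guessed allowance for the instructions
    and the model's reply.
    """
    budget = context_limit - reserved_for_prompt
    return [a for a in articles if estimate_tokens(a) <= budget]


articles = ["A short news article.", "7 13 21 34 " * 10_000]
kept = articles_that_fit(articles, context_limit=8_192)
print(len(kept))
```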

replies(1): >>Michae+Uz

>>yakhin+(OP)
So how exactly is the credibility score determined? Is it just asking "On a scale of 0 to 10, how credible is this source?"

replies(1): >>yakhin+J91

>>yakhin+ia
The score has to be in relation to other articles, or else it's too random to have meaning. ChatGPT doesn't even give consistent scores from session to session for the same article.

And the context length limit prevents that relation from extending to more than a few articles, if that's your method.

In other words, your method doesn't actually produce a meaningful score that can be ranked in some linear order with the 1200 other articles.

At most it would make sense to rank a discrete score in relation to the few other articles it remembers.

Anything beyond that should be placed in 'score ranges' from 5 to 7 for example, not given a discrete score.
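
To make the "score ranges" idea concrete, here is a rough sketch that maps a discrete score to a coarse bucket. The bucket width and edges are arbitrary choices for illustration, and `score_range` is a hypothetical name.

```python
def score_range(score: float, width: int = 2,
                low: int = 0, high: int = 10) -> tuple[int, int]:
    """Map a score to the (start, end) bucket that contains it."""
    clamped = min(max(score, low), high)
    start = low + int((clamped - low) // width) * width
    # The top score belongs to the last bucket, not a bucket of its own.
    start = min(start, high - width)
    return (start, start + width)


# Two runs that disagree slightly still land in the same bucket.
print(score_range(5.4), score_range(5.9))
```

Scores that straddle a bucket edge (say 5.9 and 6.1) still end up in different buckets, so this reduces the noise problem rather than removing it.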

replies(1): >>yakhin+W01

>>Michae+Uz
You are spot on. I use temperature 0, but even with it, ChatGPT can be unpredictable.

Sometimes I'm very frustrated about the news that gets to the top. When I try to debug it, it gives me a completely different score.

I considered using ranges instead of discrete scores, but dropped the idea, as it makes it too hard to find the 1-5 articles that should make it into the newsletter (there are 71 articles in this range right now) and it's hard to clearly display that idea in the UI.

I guess my position right now is: it's not perfect, there are obvious errors (like the one you found above), and improvements are definitely possible.

But I hope that some people will find it "good enough" even with these inconsistencies. I also hope that ChatGPT or another LLM will make big progress soon that solves this problem automatically.

>>starkp+Dp
It's a bit longer, but that's the gist of it.

I just realized that for that particular news article about regenerative medicine, it was my mistake all along. I asked ChatGPT to give unknown sources a score of 1 and completely forgot about it. I think that's what it did.

For now it marked only 8 sources as unknown out of 1700.