zlacker

[parent] [thread] 21 comments
1. finnjo+(OP)[view] [source] 2023-05-03 08:58:18
I am European and see another US news oriented site which ironically is not "minimalist" from my perspective. It is (to me) littered with US domestic concerns. My point is that _minimalist_ is highly subjective and a pretty huge promise from a site.

But kudos for the effort; keeping news small is a most noble cause.

replies(4): >>yakhin+21 >>SuoDua+vj >>ninken+3k >>yakhin+g01
2. yakhin+21[view] [source] 2023-05-03 09:04:48
>>finnjo+(OP)
Agreed. As a Canadian, I also find that too much US-centric news made the cut today, but that's often not the case. Here's a recent issue covering different topics and not mentioning the US at all: https://newsletter.newsminimalist.com/p/tuesday-april-25-3-m...
replies(1): >>Michae+Wv
3. SuoDua+vj[view] [source] 2023-05-03 11:24:08
>>finnjo+(OP)
A simple geographic bias for proximity would probably help with that - I'd love a curated news feed that has reporting on both major world events and news local to my hometown.
4. ninken+3k[view] [source] 2023-05-03 11:28:48
>>finnjo+(OP)
The problem is that .com is a de facto US TLD… as an American it doesn’t strike me as odd that a .com site is implicitly USA centric any more than a US news channel or US newspaper.

(Yes I know .us exists but it’s not as common as .com)

If it was newsminimalist.co.uk I don’t think anyone would really complain that it’s UK-specific news, right?

replies(3): >>wohfab+om >>lawn+cs >>yakhin+1Y
◧◩
5. wohfab+om[view] [source] [discussion] 2023-05-03 11:43:25
>>ninken+3k
Obviously it doesn't strike you as odd as an American.

I'd say, .com is just the default. If you go location/topic specific, you go with another TLD.

Compared to the 130 million domains registered in the US (domainnamestat.com), there are something like 500 million domains registered elsewhere. I couldn't find any numbers for how many of those are .com and how many aren't, but you cannot just ignore them. Just because most domains registered in the US are .com domains doesn't make .com a US domain. That's a really egocentric point of view.

replies(2): >>nunuvi+AG >>ninken+r91
◧◩
6. lawn+cs[view] [source] [discussion] 2023-05-03 12:24:10
>>ninken+3k
No, .com isn't implicitly USA centric, it's just the internet default. You mention .us yourself but then dismiss it because reasons.

Your American bias is just shining through, where of course you're not surprised.

replies(2): >>nunuvi+XD >>ninken+M31
◧◩
7. Michae+Wv[view] [source] [discussion] 2023-05-03 12:46:25
>>yakhin+21
Oddly enough in the 'least significant section', with scores <0.5, it's nearly all Daily Mail articles focused on the UK.

Also, the second to last least significant article seems to be incorrectly categorized: "Regenerative medicine has come a long way, baby"

Which is actually a serious look back at the advancements over the last quarter century, hardly deserving the second to last position.

It seems like ChatGPT is ranking them not based on actual content significance but presumed significance of the headline. (Which would also make sense technically as ~1200 headlines is about the max context length of GPT-4).

replies(1): >>yakhin+FJ
◧◩◪
8. nunuvi+XD[view] [source] [discussion] 2023-05-03 13:40:05
>>lawn+cs
> .com isn't implicitly USA centric, it's just the internet default

Those both mean the exact same thing in this context. Genuinely interesting that you're not the only person who understood it a different way and suggested "default" instead. Neither would stand up to such pedantic scrutiny if you want to argue that one of them is wrong.

In such a short sentence about a topic that everyone here surely knows about, the words only reference the relevant aspect of the underlying information. You can't know if they have the wrong or right view without more information.

◧◩◪
9. nunuvi+AG[view] [source] [discussion] 2023-05-03 13:52:43
>>wohfab+om
> I'd say, .com is just the default

That means the exact same thing in this context. Genuinely interesting that you're not the only person who understood it a different way and suggested "default" instead. Neither would stand up to such pedantic scrutiny if you want to argue that one of them is wrong.

In such a short sentence about a topic that everyone here surely knows about, the words only reference the relevant aspect of the underlying information. You can't know if they have the wrong or right view without more information.

◧◩◪
10. yakhin+FJ[view] [source] [discussion] 2023-05-03 14:10:05
>>Michae+Wv
Nice catch. Just checked that article — it actually got rating 2.8 just based on the news content, but the source credibility 1/10 brought it down to 0.3.

I don't think it's fair, I think ChatGPT hallucinated that it's a tabloid.

Not sure how to fix this. I don't want to adjust sources credibility manually, that will introduce too much bias. My hope is that OpenAI will update ChatGPT with newer data and I could rerun the credibility evaluation.
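
The numbers above are consistent with a simple multiplicative adjustment. A minimal sketch, assuming the content rating is scaled by credibility/10 and rounded to one decimal (the formula and the name `adjusted_score` are illustrative assumptions, not necessarily what the site does):

```python
def adjusted_score(content_rating: float, credibility: int) -> float:
    """Scale a 0-10 content rating by a 0-10 source credibility.

    Assumed formula for illustration: rating * credibility / 10.
    """
    if not 0 <= credibility <= 10:
        raise ValueError("credibility must be between 0 and 10")
    return round(content_rating * credibility / 10, 1)


# The regenerative medicine article: rating 2.8, credibility 1/10.
print(adjusted_score(2.8, 1))  # 0.3
```

Under that assumption a single wrong credibility value dominates the result, which would explain why the article sank to the bottom.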

replies(2): >>Michae+wO >>starkp+i91
◧◩◪◨
11. Michae+wO[view] [source] [discussion] 2023-05-03 14:34:49
>>yakhin+FJ
Assuming an average of 20 tokens per headline (~10-14 words), 1200 headlines would be 24000 tokens, which is already near the limit of the API-exclusive GPT-4's window of 32,768 tokens, and way beyond the 8,192 token length of the ChatGPT version.

So it's exceedingly unlikely that the actual content, beyond the headline, is processed if you're using the ChatGPT version.
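
The arithmetic above, spelled out as a quick check (the 20 tokens per headline figure is the rough average assumed in this comment, not a measured value):

```python
TOKENS_PER_HEADLINE = 20   # assumed average, roughly 10-14 words
HEADLINES = 1200
GPT4_32K_WINDOW = 32_768   # API-exclusive variant
GPT4_8K_WINDOW = 8_192     # ChatGPT variant

total_tokens = TOKENS_PER_HEADLINE * HEADLINES
print(total_tokens)                            # 24000
print(total_tokens <= GPT4_32K_WINDOW)         # True: fits, with little room left
print(total_tokens <= GPT4_8K_WINDOW)          # False: far over the 8k window
print(GPT4_8K_WINDOW // TOKENS_PER_HEADLINE)   # 409 headlines fit in 8k
```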

replies(1): >>yakhin+XT
◧◩◪◨⬒
12. yakhin+XT[view] [source] [discussion] 2023-05-03 15:03:49
>>Michae+wO
I score each article individually, so there's no need to put many news articles in one context window.

In 99% of cases a single news article fits within the context.

I drop those that don't fit, since the examples I saw were announcements of lottery numbers (too many tokens) and articles with broken HTML.
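
A rough sketch of that per-article flow. Everything named here is a stand-in: `estimate_tokens` is a crude characters/4 heuristic instead of a real tokenizer, `score_fn` is a placeholder for the actual model call, and the 8,192 limit is an assumption.

```python
from typing import Callable, Iterable

MAX_CONTEXT_TOKENS = 8_192  # assumed limit; depends on the model actually used


def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token), not a real tokenizer."""
    return len(text) // 4 + 1


def score_articles(
    articles: Iterable[str],
    score_fn: Callable[[str], float],
    max_tokens: int = MAX_CONTEXT_TOKENS,
) -> list[tuple[str, float]]:
    """Score each article on its own; skip any that exceed the context budget."""
    scored = []
    for article in articles:
        if estimate_tokens(article) > max_tokens:
            continue  # e.g. lottery number dumps or pages with broken HTML
        scored.append((article, score_fn(article)))
    return scored
```

Because every call sees exactly one article, the context limit only constrains article length, not the number of articles processed per day.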

replies(1): >>Michae+zj1
◧◩
13. yakhin+1Y[view] [source] [discussion] 2023-05-03 15:22:59
>>ninken+3k
I think it's because I currently only analyze news in English. I was thinking of adding other languages to the evaluation and asking GPT to translate them. That should at least partially fix the US bias.
14. yakhin+g01[view] [source] 2023-05-03 15:32:45
>>finnjo+(OP)
In the ChatGPT prompt I emphasize evaluating each article from the perspective of humanity as a whole, not prioritizing any individual country. But still, it's not ideal.

I don't know ChatGPT's logic, but it might be giving higher scores to news about US economic difficulties because they tend to cause ripples all over the world. But I've never seen articles about US internal politics get a score over 6.5 (or maybe there were none in the last month).

◧◩◪
15. ninken+M31[view] [source] [discussion] 2023-05-03 15:50:29
>>lawn+cs

    $ whois com.
    % IANA WHOIS server
    % for more information on IANA, visit http://www.iana.org
    % This query returned 1 object

    domain:       COM

    organisation: VeriSign Global Registry Services
    address:      12061 Bluemont Way
    address:      Reston VA 20190
    address:      United States of America (the)

It absolutely is a US tld, run by a US corporation.

DNS was originally a DARPA project, and was implicitly US focused from the very beginning of the internet, because it was a US project. ".com" carries that legacy because it predates the concept of country-specific TLD's.

A similar idea exists in reddit: /r/news is very US-focused, even though it's not called "US news". Since Reddit is an American site with an (at least initially) predominantly American audience, it's not surprising at all that things are American-biased by default unless explicitly named accordingly.

If we were all using Minitel instead of The Internet, we would have similar bias where services would be biased towards France unless shown otherwise, because Minitel was a French technology.

The point is, it's not explicitly a problem that somebody puts up a website and it's US-centric. Nobody owes the world an international version of whatever project they want to make, and you don't need to get upset that they only bother to cater to a US audience.

◧◩◪◨
16. starkp+i91[view] [source] [discussion] 2023-05-03 16:16:08
>>yakhin+FJ
So how exactly is the credibility score determined? Is it just asking "On a scale of 0 to 10, how credible is this source?"
replies(1): >>yakhin+oT1
◧◩◪
17. ninken+r91[view] [source] [discussion] 2023-05-03 16:16:28
>>wohfab+om
.com is a US domain though. It's run by a US corporation (Verisign). Check whois.

That said corporation lets non-US entities buy domain names doesn't change this fact.

Similarly, if a site with a .tv address published articles explicitly about Tuvalu, in the Tuvaluan language, you probably wouldn't complain about a Tuvaluan bias, right? After all, it's a Tuvaluan TLD. The fact that lots of companies around the world use .tv addresses for other reasons that have nothing to do with Tuvalu, doesn't change the fact that it's a Tuvaluan TLD.

If you want more information on this, the intro in the Wikipedia article on .com is quite informative: https://en.wikipedia.org/wiki/.com ... particularly:

> The domain was originally administered by the United States Department of Defense, but is today operated by Verisign, and remains under ultimate jurisdiction of U.S. law.[2][3][4] Additionally, as the Internet was invented in the United States, most American businesses and enterprises have used the .com domain instead of a more U.S.-specific .us.

replies(1): >>detaro+sa1
◧◩◪◨
18. detaro+sa1[view] [source] [discussion] 2023-05-03 16:21:59
>>ninken+r91
Verisign is contracted to run .com

.tv addresses are different, because ccTLDs are explicitly tied to countries (hence "ccTLD"). generic TLDs (gTLDs) like .com are not.

replies(1): >>ninken+Wc1
◧◩◪◨⬒
19. ninken+Wc1[view] [source] [discussion] 2023-05-03 16:33:44
>>detaro+sa1
Huh? Contracted by who? DARPA originally invented the internet and DNS in the first place (it was originally an American project) and handed control of .com to Network Solutions, which was eventually bought by Verisign. What do you mean "contracted to run"?

.com registrations are ultimately under US jurisdiction. It's what happens when you have a name system that was originally intended for one country's project (The Internet) and said project ended up becoming internationally used. The original TLD's are grandfathered in even after we got country-specific ones.

Nobody complains that .mil is US-centric either. .mil isn't a ccTLD but of course it means US military.

.com is a very popular TLD used all over the world. It doesn't make it non-American. Just as .tv is also very popular outside of Tuvalu, it doesn't make it non-Tuvaluan.

◧◩◪◨⬒⬓
20. Michae+zj1[view] [source] [discussion] 2023-05-03 17:04:06
>>yakhin+XT
The score has to be in relation to other articles, or else it's too random to have meaning. ChatGPT doesn't even give consistent scores from session to session for the same article.

And the context length limit prevents that relation from extending to more than a few articles, if that's your method.

I.e., your method doesn't actually produce a meaningful score that can be ranked in some linear order with the 1200 other articles.

At most it would make sense to rank a discrete score in relation to the few other articles it remembers.

Anything beyond that should be placed in 'score ranges' from 5 to 7 for example, not given a discrete score.
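
A minimal sketch of the score range idea. The band width of 2 and the 0-10 scale are arbitrary choices for illustration, and the bands here start at 0, so the boundaries differ from the 5-to-7 example above:

```python
import math


def score_range(score: float, width: float = 2.0, top: float = 10.0) -> tuple[float, float]:
    """Map a noisy point score to the band that contains it."""
    # Clamp so the maximum score lands in the last band instead of a new one.
    low = min(math.floor(score / width) * width, top - width)
    return (low, low + width)


print(score_range(5.4))   # (4.0, 6.0)
print(score_range(6.5))   # (6.0, 8.0)
print(score_range(10.0))  # (8.0, 10.0)
```

Two runs that return 5.4 and 5.9 for the same article then land in the same band, which hides some of the session-to-session noise.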

replies(1): >>yakhin+BK1
◧◩◪◨⬒⬓⬔
21. yakhin+BK1[view] [source] [discussion] 2023-05-03 19:17:05
>>Michae+zj1
You are spot on. I use temperature 0, but even with it, ChatGPT can be unpredictable.

Sometimes I'm very frustrated by the news that gets to the top. When I try to debug it, it gives me a completely different score.

I considered using ranges instead of discrete scores, but dropped the idea: it makes it too hard to find the 1-5 articles that should make it into the newsletter (there are 71 articles in this range right now), and it's hard to display that idea clearly in the UI.

I guess my position right now is — it's not perfect, there are obvious errors (like the one you found above), and improvements are definitely possible.

But I hope that some people will find it "good enough" even with these inconsistencies. I also hope that ChatGPT or another LLM will soon make big progress that solves this problem automatically.

◧◩◪◨⬒
22. yakhin+oT1[view] [source] [discussion] 2023-05-03 20:05:47
>>starkp+i91
It's a bit longer, but that's the gist of it.

I just realized that for that particular news article about regenerative medicine it was my mistake all along. I asked ChatGPT to give unknown sources a score of 1 and completely forgot about it. I think that's what it did.

For now it marked only 8 sources as unknown out of 1700.
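
The effect of that instruction can be pictured as a lookup with a low default. This is only an illustration: in the real setup ChatGPT assigns the scores, and the source names and values below are made up.

```python
UNKNOWN_SOURCE_CREDIBILITY = 1  # the fallback described above

# Hypothetical credibility table; these entries are invented for the example.
KNOWN_SOURCES = {
    "example-wire.com": 9,
    "example-tabloid.com": 2,
}


def source_credibility(source: str) -> int:
    """Return the stored credibility, or the low default for unknown sources."""
    return KNOWN_SOURCES.get(source.lower(), UNKNOWN_SOURCE_CREDIBILITY)


print(source_credibility("example-wire.com"))        # 9
print(source_credibility("unknown-journal.org"))     # 1
```

An unknown source is therefore treated as worse than a known tabloid, which matches what happened to that article.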
