But kudos for the effort; keeping news small is a most noble cause.
(Yes I know .us exists but it’s not as common as .com)
If it was newsminimalist.co.uk I don’t think anyone would really complain that it’s UK-specific news, right?
I'd say, .com is just the default. If you go location/topic specific, you go with another TLD.
Against the 130 million domains registered in the US (per domainnamestat.com), there are something like 500 million registered somewhere else. I couldn't find any numbers for how many of those are .com and how many aren't, but you cannot just ignore them. Just because most domains registered in the US are .com domains doesn't make .com a US domain. That's a really egocentric point of view.
Your American bias is just shining through, where of course you're not surprised.
Also, the second-to-last article in the significance ranking seems to be incorrectly categorized: "Regenerative medicine has come a long way, baby"
It is actually a serious look back at the advancements of the last quarter century, hardly deserving the second-to-last position.
It seems like ChatGPT is ranking them not based on actual content significance but presumed significance of the headline. (Which would also make sense technically as ~1200 headlines is about the max context length of GPT-4).
Those both mean the exact same thing in this context. Genuinely interesting that you're not the only person who understood it a different way and suggested "default" instead. Neither would stand up to such pedantic scrutiny if you want to argue that one of them is wrong.
In such a short sentence about a topic that everyone here surely knows about, the words only reference the relevant aspect of the underlying information. You can't know if they have the wrong or right view without more information.
I don't think it's fair, I think ChatGPT hallucinated that it's a tabloid.
Not sure how to fix this. I don't want to adjust source credibility manually, since that would introduce too much bias. My hope is that OpenAI will update ChatGPT with newer data and I can rerun the credibility evaluation.
So it's exceedingly unlikely the actual content, beyond the headline, is processed if you're using the ChatGPT version.
In 99% of cases a single news article fits within the context.
I drop those that don't fit, since the examples I saw were mostly announcements of lottery numbers (too many tokens) and articles with broken HTML.
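Roughly, a filter like that could look like the sketch below. The function names, the 4-characters-per-token heuristic, the 8000-token budget and the prompt overhead are all illustrative assumptions, not the actual pipeline; a real tokenizer would give more accurate counts.

```python
# Illustrative sketch only: names and numbers are assumptions, not the site's real code.
MAX_CONTEXT_TOKENS = 8000   # hypothetical context budget
CHARS_PER_TOKEN = 4         # crude heuristic for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(article_text: str, prompt_overhead: int = 500) -> bool:
    """True if the article plus the (assumed) prompt overhead fits the budget."""
    return estimate_tokens(article_text) + prompt_overhead <= MAX_CONTEXT_TOKENS


def drop_oversized(articles: list[str]) -> list[str]:
    """Keep only the articles that fit within the context budget."""
    return [a for a in articles if fits_in_context(a)]
```

With these numbers, a normal-length article passes, while something like a 40,000-character lottery table or a page of broken markup gets dropped.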
I don't know ChatGPT's logic, but it may be giving higher scores to news about US economic difficulties because they tend to cause ripples all over the world. But I've never seen articles about US internal politics getting a score over 6.5 (or maybe there were none in the last month).
$ whois com.
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
domain: COM
organisation: VeriSign Global Registry Services
address: 12061 Bluemont Way
address: Reston VA 20190
address: United States of America (the)
It absolutely is a US TLD, run by a US corporation. DNS was originally a DARPA project, and was implicitly US-focused from the very beginning of the internet, because it was a US project. ".com" carries that legacy because it predates the concept of country-specific TLDs.
A similar idea exists on Reddit: /r/news is very US-focused, even though it's not called "US news". Since Reddit is an American site with an (at least initially) predominantly American audience, it's not surprising at all that things are American-biased by default unless explicitly named otherwise.
If we were all using Minitel instead of The Internet, we would have similar bias where services would be biased towards France unless shown otherwise, because Minitel was a French technology.
The point is, it's not explicitly a problem that somebody puts up a website and it's US-centric. Nobody owes the world an international version of whatever project they want to make, and you don't need to get upset that they only bother to cater to a US audience.
The fact that said corporation lets non-US entities buy domain names doesn't change this.
Similarly, if a site with a .tv address published articles explicitly about Tuvalu, in the Tuvaluan language, you probably wouldn't complain about a Tuvaluan bias, right? After all, it's a Tuvaluan TLD. The fact that lots of companies around the world use .tv addresses for other reasons that have nothing to do with Tuvalu, doesn't change the fact that it's a Tuvaluan TLD.
If you want more information on this, the intro in the Wikipedia article on .com is quite informative: https://en.wikipedia.org/wiki/.com ... particularly:
> The domain was originally administered by the United States Department of Defense, but is today operated by Verisign, and remains under ultimate jurisdiction of U.S. law.[2][3][4] Additionally, as the Internet was invented in the United States, most American businesses and enterprises have used the .com domain instead of a more U.S.-specific .us.
.tv addresses are different, because ccTLDs are explicitly tied to countries (hence "ccTLD"). Generic TLDs (gTLDs) like .com are not.
.com registrations are ultimately under US jurisdiction. It's what happens when you have a name system that was originally intended for one country's project (The Internet) and said project ended up becoming internationally used. The original TLDs are grandfathered in even after we got country-specific ones.
Nobody complains that .mil is US-centric either. .mil isn't a ccTLD but of course it means US military.
.com is a very popular TLD used all over the world. It doesn't make it non-American. Just as .tv is also very popular outside of Tuvalu, it doesn't make it non-Tuvaluan.
And the context length limit prevents that relation from extending to more than a few articles, if that's your method.
I.e., your method doesn't actually produce a meaningful score that can be ranked in some linear order with the 1200 other articles.
At most it would make sense to rank a discrete score in relation to the few other articles it remembers.
Anything beyond that should be placed in 'score ranges' from 5 to 7 for example, not given a discrete score.
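To make the 'score ranges' idea concrete, here is a minimal sketch. The band width of 2 and the lowest score of 1 are my assumptions (chosen so that "5 to 7" comes out as one band), and the function names are hypothetical.

```python
import math
from collections import defaultdict


def score_band(score: float, width: float = 2.0, low: float = 1.0) -> tuple[float, float]:
    """Map a discrete score to the half-open band [lo, hi) that contains it."""
    index = math.floor((score - low) / width)
    lo = low + index * width
    return (lo, lo + width)


def group_by_band(scores: dict[str, float]) -> dict[tuple[float, float], list[str]]:
    """Group article titles by band instead of ordering them by exact score."""
    groups: dict[tuple[float, float], list[str]] = defaultdict(list)
    for title, score in scores.items():
        groups[score_band(score)].append(title)
    return dict(groups)
```

Articles scored 5.1 and 6.2 would both land in the 5 to 7 band and be treated as equally significant, rather than being ordered by a difference the model can't really justify.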
Sometimes I'm very frustrated about the news that gets to the top. When I try to debug it, it gives me a completely different score.
I considered using ranges instead of discrete scores, but dropped the idea, as it makes it too hard to pick the 1-5 articles that should make it into the newsletter (there are 71 articles in this range right now) and it's hard to clearly display that idea in the UI.
I guess my position right now is — it's not perfect, there are obvious errors (like the one you found above), and improvements are definitely possible.
But I hope that some people will find it "good enough" even with these inconsistencies. I also hope that ChatGPT or another LLM will make big progress soon that will solve this problem automatically.
I just realized that for that particular news article about regenerative medicine, it was my mistake all along. I asked ChatGPT to give unknown sources a score of 1 and completely forgot about it. I think that's what it did.
So far it has marked only 8 sources out of 1700 as unknown.