While I'm very privacy conscious, I don't really see the benefit to hiding my region in the DNS request. Because the very next step after the DNS is my browser making a request to their webserver, at which time they will have my actual complete IP anyway.
Tell HN: Archive.is inaccessible via Cloudflare DNS (1.1.1.1) >>19828317
- This particular discussion includes a comment from Cloudflare's CEO, referenced in the article: >>19828702
Why does 1.1.1.1 not resolve archive.is? >>21155056
- StackExchange question, on the subject
Does Cloudflare's 1.1.1.1 DNS Block Archive.is? (2019) >>28495204
- Discussion of this blog post
Asking the office DNS the same question, I get 51.38.69.52; asking 9.9.9.9 gives me the same IP as the office DNS. Finally, asking Google (8.8.8.8), I also get 51.38.69.52.
So I think their record is borked.
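For anyone who wants to repeat this comparison without nslookup, here is a minimal sketch using only the Python standard library. The function names (`build_query`, `parse_a_records`, `resolve_via`) are made up for illustration, it only handles plain A-record lookups over UDP, and it does no validation of the response, so treat it as a rough probe rather than a real resolver client.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Encode a minimal DNS query asking for the A record of `name`."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(msg, offset):
    """Return the offset just past an encoded (possibly compressed) name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer is always 2 bytes
            return offset + 2
        offset += 1 + length


def parse_a_records(msg):
    """Extract the IPv4 addresses from the answer section of a response."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", msg[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength
    return addresses


def resolve_via(resolver_ip, name, timeout=3.0):
    """Ask one specific recursive resolver for the A records of `name`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        response, _ = sock.recvfrom(4096)
    return parse_a_records(response)
```

Calling `resolve_via("1.1.1.1", "archive.is")` and `resolve_via("8.8.8.8", "archive.is")` side by side should show whether the two resolvers are being handed different answers. Results will vary by location and over time.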
> EDNS IP subsets can be used to better geolocate responses for services that use DNS-based load balancing. However, 1.1.1.1 is delivered across Cloudflare’s entire network that today spans 180 cities. We publish the geolocation information of the IPs that we query from. That allows any network with less density than we have to properly return DNS-targeted results.
Surfacing here for people who like to read comments before clicking through to the article.
TLDR: the site owner was returning wrong DNS responses for people using Cloudflare's 1.1.1.1 DNS service because the site owner doesn't like Cloudflare.
Nobody cares; the reality is if you use CF DNS, shit don't work.
archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion still works fine if you don't want to share your geolocation information with every DNS server in the recursive resolver chain.
Most relevant piece but the whole comment is worth a read:
> Archive.is’s authoritative DNS servers return bad results to 1.1.1.1 when we query them. I’ve proposed we just fix it on our end but our team, quite rightly, said that too would violate the integrity of DNS and the privacy and security promises we made to our users when we launched the service.
> The archive.is owner has explained that he returns bad results to us because we don’t pass along the EDNS subnet information. This information leaks information about a requester’s IP and, in turn, sacrifices the privacy of users.
Honestly it's that type of thing (the frankness, the presence on HN, willingness to participate, the principled stand on privacy) that got me into Cloudflare products. I now generate hundreds per month in revenue for them and that will likely be thousands in the next year or two. His time/effort on HN directly led to customer acquisition and revenue.
That said I do worry about the incentives Cloudflare has to their big customers. CF is a great tool for site owners, but like any tool has the potential to be a great evil (against the user) if the principles ever wane. It's already being used by a lot of sites to make life a living hell for people behind a VPN. As a site owner I absolutely get it: practically zero of my legitimate traffic comes from VPNs (our main demographic tend to skew older and much less technical than the average consumer), but all of the automated attacks against me do. Balancing freedom and rights is hard, but I deeply appreciate the thoughtfulness and principles that CF has displayed over the years.
On the other hand, it's possible this doesn't matter. The client might not encrypt the host it's trying to visit. Nation states can correlate packet timing. So if someone really wants to know, they'll probably figure it out. (This is always a risk with things like Tor. If the government is monitoring your connection and some target website's connection, and you are sending a lot of packets at the same time they're receiving a lot of packets, you can guess who is talking to who.)
I've switched to archive.org because archive.* is broken. For stuff that .org doesn't have, there's always the Tor version. The Tor address seems to be more responsive as well, so that's nice.
AFAICT this wouldn't "violate the integrity of DNS and the privacy and security promises we made to our users" and would solve a big pain point of using 1.1.1.1.
This is super misleading because it ignores the fact that archive.* goes out of their way to make CF DNS not work.
It's not just the website's DNS server that received your subnet information; it's every single location in the chain of DNS resolvers. That includes TLD servers run by data mining companies. Does Verisign need to know that 2001:2345:6789::abcd is looking for news.ycombinator.com?
With caching in place these methods of data gathering aren't all-encompassing, but if you visit some new or uncommon domains you'll be more likely to become part of the dataset.
Philosophically I think that lacks respect for the site owner and it would be wrong to deceive them and go against their wishes.
Pragmatically that sounds like a giant maintenance pain in the ass to manage, and not worth the time/money to make somebody's site work who actively doesn't want it to work.
When I use Google DNS, OpenDNS or even my ISP's own DNS, the IP I get for, say, googlevideo.com resolves to my ISP's cache. But with Cloudflare it gives me a Google IP and I get no cache benefits.
Also, anecdotal evidence: uploading and downloading large files to Google Drive and some other websites is significantly faster when I use Google DNS or OpenDNS. I'm not sure how that works, but maybe the server's returning an IP my ISP can't route to in a fast way? My ISP is notorious for having bad routes.
[1] >>36197401
[2] https://www.ietf.org/archive/id/draft-private-access-tokens-...
> There have been numerous attacks where people upload illegal content (childporn or isis propaganda) and immediately reported to the authorities near the IP of the archive. It resulted in ceased servers and downtimes. I just have no time to react. So I developed sort of CDN, with the only difference: DNS server returns not the closest IP to the request origin but the closest IP abroad, so any takedown procedure would require bureaucratic procedures so I am getting notified notified and have time to react.
> But CloudFlare DNS disrupts the scheme together with all other DNS-based CDNs Cloudflare is competing with and puts the archive existence on risk. I offered them to proxy those CloudFlare DNS's users via their CDN but they rejected. Registering my own autonomous system just to fix the issue with CloudFlare DNS is too expensive for me.
When I proposed using the DNS server's IP instead, they said:
> It did not work initially because they have global planetwide cache.
> 1. Someone resolves domain from Brazil.
> 2. Website's DNS get request from Cloudflare Brazil DC.
> 3. The result is replicated to other Cloudflare DCs
> 4. Some from Turkey resolves same domain and get the cached value
> It could be worked around by setting tiny TTL, which would slowly end up in consistent results, but... After "I’ve proposed we just fix it on our end .." all requests for 7 archive.* domains are sent from Symantec USA IP
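To make the four steps in that quote concrete, here is a toy simulation. It assumes, as the quoted comment claims, that one answer cache is shared by every point of presence. Whether Cloudflare's resolver really behaves that way is not verified here, and the class, function and domain names are invented for the example.

```python
class GlobalCacheResolver:
    """Toy recursive resolver: many locations, one shared answer cache."""

    def __init__(self, authoritative):
        self.authoritative = authoritative
        self.cache = {}  # name -> (answer, expiry time in seconds)

    def resolve(self, name, client_country, now):
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]  # cached answer, whoever originally asked
        answer, ttl = self.authoritative(name, client_country)
        self.cache[name] = (answer, now + ttl)
        return answer


def geo_authoritative(name, querier_country, ttl=300):
    """Hypothetical authoritative server that picks an answer per country."""
    return f"answer-chosen-for-{querier_country}", ttl


resolver = GlobalCacheResolver(geo_authoritative)
print(resolver.resolve("archive.example", "BR", now=0))    # fresh lookup
print(resolver.resolve("archive.example", "TR", now=10))   # served from cache
print(resolver.resolve("archive.example", "TR", now=301))  # TTL has expired
```

The second call is the case the quote complains about: the user in Turkey gets the answer that was selected for Brazil until the TTL runs out. A tiny TTL shrinks that window, at the cost of more queries to the authoritative server.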
With iCloud Private Relay, it sometimes works for me and sometimes doesn't depending on which CDN's system I'm using at the time.
So the main difference is that Cloudflare's servers need to be present in the IP geolocation database. Given their prevalence, they're probably in most of them already.
The other comments that only present the Cloudflare side of the situation make it sound like the archive.is owner was being unreasonable, but as we see there is more to it!
I personally tried to use 1.1.1.1 as my resolver a couple of years ago but I use archive.is a lot.
Regardless of who is “at fault”, not being able to access archive.is is a dealbreaker for me so I quickly stopped using 1.1.1.1
But Cloudflare has a lot of other things that work well for me.
[1] "Hello, Benedikt from Cloudflare and the Turnstile Team here. Thanks you so much for the report. We looked into this report and identified that there was some false positive and cleared the signal. We have investigated this report and the issue should be fixed. Please reach out to me benedikt@cloudflare.com or at our Cloudflare Turnstile Discord, if you are still encountering problems."
[2]
> Servers commonly use passive and persistent identifiers associated with clients, such as IP addresses or device identifiers, for enforcing access and usage policies. For example, a server might limit the amount of content an IP address can access over a given time period (referred to as a "metered paywall"), or a server might rate-limit access from an IP address to prevent fraud and abuse. Servers also commonly use the client's IP address as a strong indicator of the client's geographic location to limit access to services or content to a specific geographic area (referred to as "geofencing").
> However, passive and persistent client identifiers can be used by any entity that has access to it without the client's express consent. A server can use a client's IP address or its device identifier to track client activity. A client's IP address, and therefore its location, is visible to all entities on the path between the client and the server. These entities can trivially track a client, its location, and servers that the client visits.
> A client that wishes to keep its IP address private can hide its IP address using a proxy service or a VPN. However, doing so severely limits the client's ability to access services and content, since servers might not be able to enforce their policies without a stable and unique client identifier.
> This document describes an architecture for Private Access Tokens (PATs), using RSA Blind Signatures as defined in [BLINDSIG], as an explicit replacement for these passive client identifiers. These tokens are privately issued to clients upon request and then redeemed by servers in such a way that the issuance and redemption events for a given token are unlinkable.
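For readers unfamiliar with blind signatures, here is a toy sketch of the textbook RSA version. It is not the scheme from the draft (which uses proper message encoding, padding and real key sizes). The tiny key below is the classic classroom example and every function name is made up for illustration. It only shows why the signer cannot link what it signed to what is later redeemed.

```python
import math

# Toy RSA key (classroom-sized, completely insecure).
P, Q = 61, 53
N = P * Q   # public modulus
E = 17      # public exponent
D = 2753    # private exponent; E * D = 1 mod (P - 1) * (Q - 1)


def blind(message, r):
    """Client: hide the message with a random factor r before sending it."""
    assert math.gcd(r, N) == 1
    return (message * pow(r, E, N)) % N


def sign_blinded(blinded):
    """Issuer: sign whatever arrives, without learning the real message."""
    return pow(blinded, D, N)


def unblind(blind_signature, r):
    """Client: strip the blinding factor to get a normal RSA signature."""
    return (blind_signature * pow(r, -1, N)) % N


def verify(message, signature):
    """Anyone with the public key can check the unblinded signature."""
    return pow(signature, E, N) == message % N


token = 65
signature = unblind(sign_blinded(blind(token, r=7)), r=7)
print(verify(token, signature))
```

The issuer only ever sees `blind(token, r)`, which changes with every choice of `r`, yet the unblinded result is an ordinary signature on `token`.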
I guess that still has the privacy implications.. but at least it would work!
I've tried raising this issue on their forum, where I've failed to get the attention of the engineering teams, and while posting the ray ID should be sufficient, all you'd really get is clueless, unpaid volunteers asking you questions in circles like "what website do you see this on" (everywhere), "are you using adblock" (no, and Adblock has never blocked their Turnstile scripts) and "what's your user agent?" (the default Chromium one).
If I had to hazard a guess, it's their bot management script seeing "Linux" in the user agent and detecting missing video codecs (which is par for the course for standard Chromium builds), and thinking it's a headless browser. Between the fact that differences between the JS runtime of Chromium and Chromium headless are very small these days, and the fact that ClientHello permutation has destroyed bot management vendors' ability to distinguish different browser builds, they decided blocking all Linux users using Chromium was fair enough.
I get that it's a frustrating situation but you're viewing CF in the worst possible light ("trying to lock out Linux users" assumes an intent not on display) and I think it's counterproductive to success.
Truly, just imagine the user story:
"I can't access this website."
"No worries! That is by design, because the protocol response returned to CF is illegal, and the server simply propagates the error back to you. It would be impure for CF to modify the response in-flight to fix this for you."
"?? I do not care. I am talking to CF, why can't CF just fix the issue?"
server=/archive.today/8.8.8.8
server=/archive.today/8.8.4.4
server=/archive.ph/8.8.8.8
server=/archive.ph/8.8.4.4
server=/archive.is/8.8.8.8
server=/archive.is/8.8.4.4
server=/archive.li/8.8.8.8
server=/archive.li/8.8.4.4
server=/archive.vn/8.8.8.8
server=/archive.vn/8.8.4.4
server=/archive.fo/8.8.8.8
server=/archive.fo/8.8.4.4
server=/archive.md/8.8.8.8
server=/archive.md/8.8.4.4
server=/archive.to/8.8.8.8
server=/archive.to/8.8.4.4
This way you use 1.1.1.1 for everything except the domains listed above, which go to Google DNS instead.
I am served IP addresses with relatively short TTLs (1-5 minutes) that are in Moscow and North Holland.
It would be great if CF offered a choice, like Quad9.
[1] >>36972051
[2] >>36971869
Or you can try to not imagine everyone as hating everything and read the other comment in here posting the archive.* side of the story.
> There have been numerous attacks where people upload illegal content (childporn or isis propaganda) and immediately reported to the authorities near the IP of the archive. It resulted in ceased servers and downtimes. I just have no time to react. So I developed sort of CDN, with the only difference: DNS server returns not the closest IP to the request origin but the closest IP abroad, so any takedown procedure would require bureaucratic procedures so I am getting notified notified and have time to react.
> But CloudFlare DNS disrupts the scheme together with all other DNS-based CDNs Cloudflare is competing with and puts the archive existence on risk. I offered them to proxy those CloudFlare DNS's users via their CDN but they rejected. Registering my own autonomous system just to fix the issue with CloudFlare DNS is too expensive for me.
No, they go out of their way to make a system that can handle people trying to abuse it. Cloudflare doesn't like that system and refuses to help them.
> There have been numerous attacks where people upload illegal content (childporn or isis propaganda) and immediately reported to the authorities near the IP of the archive. It resulted in ceased servers and downtimes. I just have no time to react. So I developed sort of CDN, with the only difference: DNS server returns not the closest IP to the request origin but the closest IP abroad, so any takedown procedure would require bureaucratic procedures so I am getting notified notified and have time to react.
> But CloudFlare DNS disrupts the scheme together with all other DNS-based CDNs Cloudflare is competing with and puts the archive existence on risk. I offered them to proxy those CloudFlare DNS's users via their CDN but they rejected. Registering my own autonomous system just to fix the issue with CloudFlare DNS is too expensive for me.
(I have no doubt you are seeing the problem on your PC, but generalizing a single data point to all Linux users just screams "technical incompetence" and makes me want to ignore the post.)
If that were true, there's a lot of really stupid people throwing away their money by paying CF to hack them.
That's a bit unfair, don't you think?
From what I remember of the saga, the original reason for Archive.is's block is that they run their own CDN, and by not knowing the location of the user, they can't determine the closest server to respond with.
edit: found source https://twitter.com/archiveis/status/1018691421182791680
So the alternative viewpoint is, that Cloudflare is being anti-competitive by technically preventing other CDN providers from working.
Disclosure: I'm a happy Cloudflare user, but all in all I think Archive.is service is far more fundamental for the internet (especially as it's 100% free!). So I would really appreciate if you could figure out a way of working together. Until then, 8.8.8.8 it is!
Google works but I want to keep a little distance from them. Also no big fan of Quad9. So what else to use and which is distributed World Wide?
You can run dnsmasq on anything that can run arbitrary services e.g. on your local workstation, or on your router if it is open enough, or even on something like Termux probably.
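If you go the dnsmasq route, the per-domain `server=/domain/resolver` lines posted elsewhere in this thread are tedious to maintain by hand. Here is a small sketch that generates them. The helper name is made up, and the domain list is just the set of archive.* names mentioned in this thread, so adjust both the domains and the upstream resolvers to taste.

```python
ARCHIVE_DOMAINS = [
    "archive.today", "archive.ph", "archive.is", "archive.li",
    "archive.vn", "archive.fo", "archive.md", "archive.to",
]
UPSTREAMS = ["8.8.8.8", "8.8.4.4"]  # resolvers to use for these domains only


def dnsmasq_rules(domains, upstreams):
    """Build one `server=/domain/upstream` line per domain/upstream pair."""
    return [
        f"server=/{domain}/{upstream}"
        for domain in domains
        for upstream in upstreams
    ]


if __name__ == "__main__":
    print("\n".join(dnsmasq_rules(ARCHIVE_DOMAINS, UPSTREAMS)))
```

Redirect the output into a file that your dnsmasq configuration includes. Everything not listed keeps going to whatever default upstream you already use.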
I feel like the more reasonable answer here is to just let the user take the latency hit. Surely requests being somewhat slower is preferable to requests being outright bitbucketed, right?
From what I understand, it's the opposite: Cloudflare doesn't use a relatively new feature of DNS (EDNS Client Subnet), and that site doesn't like the lack of that feature.
CF is basically saying "we can know your IP but not the site you are trying to resolve" (that will know your IP anyway once you navigate there).
And any malicious client that tries to leak data via DNS can just ask for DNS record like my-ip-is-7.8.9.0.example.com and completely go around that privacy "enhancement".
Sorry but the "privacy" here looks like smokescreen to stifle competition.
┌
└─(12:32:59)──> nslookup archive.is 1.1.1.1
Server: 1.1.1.1
Address: 1.1.1.1#53
Non-authoritative answer:
Name: archive.is
Address: 89.253.237.217
┌
└─(12:38:12)──> nslookup archive.is 1.0.0.1
Server: 1.0.0.1
Address: 1.0.0.1#53
Non-authoritative answer:
Name: archive.is
Address: 89.253.237.217
Wonder why that is happening...
[1] >>36971650
You ask for a record, you get an answer. You ask for the IP address of archive.today, you get that IP.
Then you connect to that IP.
If your DNS doesn't leak the client IP, the browser connecting to the server IP will leak it.
It's an entirely irrelevant protection that does nothing but make competing on CDN harder.
This isn't true, because the request leaks the hostname in the handshake via SNI:
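A quick way to see this without a packet capture is to have Python's `ssl` module produce the first handshake message in memory and look for the hostname in it. This is only a sketch. It assumes a client that does not use Encrypted Client Hello (Python's `ssl` module does not), and the helper name is made up.

```python
import ssl


def client_hello_bytes(hostname):
    """Return the raw bytes a TLS client would send first for `hostname`."""
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()  # data the client wants to put on the wire
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a server reply that never arrives.
        pass
    return outgoing.read()


hello = client_hello_bytes("archive.is")
print(b"archive.is" in hello)
```

Nothing is sent over the network here. The bytes in `hello` are what would travel unencrypted at the start of the connection, and the server name is readable in them.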
┌─(~/Projects/malware/triangulation)(ruby-2.5.0)────────────────────────────────────────────────────────────────────(c@c:s001)─┐
└─(12:32:59)──> nslookup archive.is 1.1.1.1 ──(Wed,Aug02)─┘
Server: 1.1.1.1
Address: 1.1.1.1#53
Non-authoritative answer:
Name: archive.is
Address: 89.253.237.217
┌─(~/Projects/malware/triangulation)(ruby-2.5.0)────────────────────────────────────────────────────────────────────(c@c:s001)─┐
└─(12:38:12)──> nslookup archive.is 1.0.0.1 ──(Wed,Aug02)─┘
Server: 1.0.0.1
Address: 1.0.0.1#53
Non-authoritative answer:
Name: archive.is
Address: 89.253.237.217
┌─(~/Projects/malware/triangulation)(ruby-2.5.0)────────────────────────────────────────────────────────────────────(c@c:s001)─┐
└─(12:38:14)──> whois 89.253.237.217 ──(Wed,Aug02)─┘
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
refer: whois.ripe.net
inetnum: 89.0.0.0 - 89.255.255.255
organisation: RIPE NCC
status: ALLOCATED
whois: whois.ripe.net
changed: 2005-06
source: IANA
# whois.ripe.net
inetnum: 89.253.232.0 - 89.253.239.255
org: ORG-RL31-RIPE
netname: RU-RUSONYX-NET6
descr: Network for Rusonyx infrastructure
country: RU
mnt-lower: MNT-RUSONYX
mnt-routes: MNT-RUSONYX
admin-c: VZ1716-RIPE
admin-c: VZ1717-RIPE
tech-c: VZ1716-RIPE
status: ASSIGNED PA
mnt-by: MNT-RUSONYX
created: 2018-10-10T09:53:33Z
last-modified: 2018-10-16T12:37:40Z
source: RIPE # Filtered
organisation: ORG-RL31-RIPE
org-name: Rusonyx, Ltd.
country: RU
org-type: LIR
address: 5th st. Yamskogo Polya, 9, office 19
address: 125040
address: Moscow
address: RUSSIAN FEDERATION
phone: +74951370701
fax-no: +74951370701
mnt-ref: RIPE-NCC-HM-MNT
mnt-ref: MNT-RUSONYX
mnt-by: RIPE-NCC-HM-MNT
mnt-by: MNT-RUSONYX
abuse-c: AD11015-RIPE
created: 2006-08-18T09:59:51Z
last-modified: 2022-10-06T11:18:08Z
source: RIPE # Filtered
person: Viktor Zverkov
address: P.O. Box 19
address: 127137, Moscow, Russia
address: Rusonyx ltd.
phone: +7 495 5089959
nic-hdl: VZ1716-RIPE
mnt-by: MNT-RUSONYX
created: 2017-09-20T11:29:16Z
last-modified: 2022-07-05T14:00:10Z
source: RIPE
person: Viktor Zaytsev
address: P.O. Box 19 , Russia
address: 127137, Moscow
address: Rusonyx ltd.
phone: +7 495 5089959
nic-hdl: VZ1717-RIPE
mnt-by: MNT-RUSONYX
mnt-by: AM65535-MNT
created: 2017-09-20T11:54:54Z
last-modified: 2018-08-02T17:21:31Z
source: RIPE
% Information related to '89.253.232.0/21AS41535'
route: 89.253.232.0/21
descr: RUSONYX-RU
origin: AS41535
mnt-by: MNT-RUSONYX
created: 2017-11-24T09:34:37Z
last-modified: 2017-11-24T09:34:37Z
source: RIPE
% This query was served by the RIPE Database Query Service version 1.107 (DEXTER)
Not sure why this would be happening (looks like at least one other person in the thread is seeing the same result).
[1] >>36971650
Verisign wouldn't know that even with ECS, because ECS only needs to include a subnet prefix of whatever length the client (Cloudflare's recursive resolver in this case) is willing to give out. There is no benefit and only harm to the client if it gives out the whole IP, and indeed it is called out as a bad idea in the ECS RFC.
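To illustrate what "only the subnet prefix" means on the wire, here is a rough sketch of the EDNS Client Subnet option as I understand the layout from RFC 7871: option code 8, then address family, source prefix length, scope prefix length, and only as many address octets as the prefix needs. The helper names are invented and the addresses are examples, so double-check against the RFC before relying on it.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS Client Subnet (RFC 7871)


def visible_subnet(client_ip, prefix_len):
    """The network that upstream servers see once the address is truncated."""
    return str(ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False))


def ecs_option(client_ip, prefix_len):
    """Encode the ECS option a resolver could attach for this client."""
    address = ipaddress.ip_address(client_ip)
    family = 1 if address.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    octets_needed = (prefix_len + 7) // 8
    truncated = network.network_address.packed[:octets_needed]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries), ADDRESS
    payload = struct.pack("!HBB", family, prefix_len, 0) + truncated
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


print(visible_subnet("203.0.113.77", 24))
print(visible_subnet("2001:2345:6789::abcd", 56))
print(ecs_option("203.0.113.77", 24).hex())
```

With a /24 the last octet of an IPv4 address never leaves the resolver, and with a /56 the lower 72 bits of an IPv6 address are dropped. Upstream servers still learn roughly where the client is.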
In other words, you want the data, but prevent others from seeing your advantage?
This is what archive.is is doing, and you stomp your collective feet at.
> because we believe 1) privacy is a fundamental human right; and 2) the original sin of the Internet is that IP addresses are too closely tied to the identities of individuals and services.
If you cared about that, you wouldn't either block Tor or send us through captcha-hell just to pull a single webpage.
> Truncating EDNS is trying to honor #1 and overcome #2. So is our work on protocols like Oblivious DNS. This work, frankly, upsets some of our customers or potential customers (like Archive.is). But it’s the right thing to do for the long term health of the Internet.
'Upsets'? Wow. Talk about a "Rules for thee but not for me."
EDIT: This is outlined in [0], although it doesn't go into the depth I wish it did.
> By providing local Internet egress and by configuring internal DNS servers to provide local name resolution for Microsoft 365 endpoints, network traffic destined for Microsoft 365 can connect to Microsoft 365 front end servers as close as possible to the user.
[0] https://learn.microsoft.com/microsoft-365/enterprise/microso...
For an issue that pointed to cloudflare, but ultimately was our hoster having an issue with completing the TLS handshake...
After infra update ofc.
Tldr: had the opposite experience, for a technical issue :)
Or, thank you for wasting your customers time attempting to figure out why one or more sites aren't responding appropriately on your network while they work on other networks.
Not everyone is clued into EDNS or why archive.is doesn't function with CF.
CF is wasting everyone's time.
Archive.is' explanation is quoted in a comment below:
It still may not be the right decision, but it's important to frame the trade-off correctly.
I don't know. Since their whole reason for being is to act as a (temporary?) archive of websites, I'd think that would make them more vulnerable to these attacks than someone like eBay?
Works for me on Linux in Firefox and Chromium.
If another company did what Cloudflare does and homogenized tons of requests behind them, you can bet Cloudflare's CAPTCHA systems would block them in a second.
I have zero respect for Cloudflare's inability to answer criticisms about what they do, about their constant deflections from simple, straightforward questions, and the fact that they do to others what they would never accept anyone else doing to them. It's hypocrisy in the service of trying to become a monopoly by re-centralizing the Internet.
Don't believe me? Go ahead and look for examples of Matthew Prince addressing concerns that much of the non-western world can't access Cloudflare fronted sites because of Cloudflare's "reasons". When you don't find any that have more than just vague platitudes and handwaving, imagine how you'd feel if you were one of those multiple billion people.
I've always found the funding of archive.is to be an unknown. Who is behind it, and why do they want this info? Why and how they can provide this for free is a big unknown to me.
I'm giving CF the benefit of the doubt against archive. At least I know Cloudflare, and this would be the first "doubt-moment"...
It's weird that others don't have this issue that much. I would have thought that CDNs would have been screaming about this from everywhere for years already, if archive.is's statement is "complete".
Edit: cloudflare does not seem to block what's needed though.
> EDNS IP subsets can be used to better geolocate responses for services that use DNS-based load balancing. However, 1.1.1.1 is delivered across Cloudflare’s entire network that today spans 180 cities. We publish the geolocation information of the IPs that we query from. That allows any network with less density than we have to properly return DNS-targeted results.
In other words, Cloudflare expects us to think they're so special that they should get to do what they explicitly don't want others doing.
It's bullshit, particularly for all the people who are victims of Cloudflare's manipulations such as the default use of Cloudflare DNS servers for DNS-over-https on Firefox, which users were never asked about before it was enabled for them (at least in the US).
The issue isn't leaking your IP to archive.today. It's leaking your IP to any other dns servers along the way
Do you know how recursive DNS works?
Cloudflare sends the closest city from where the request was made, which ought to be sufficient to optimize for a CDN, I guess? It appears that archive.is doesn't have locus standi in this debate...
> EDNS IP subsets can be used to better geolocate responses for services that use DNS-based load balancing. However, 1.1.1.1 is delivered across Cloudflare’s entire network that today spans 180 cities. We publish the geolocation information of the IPs that we query from. That allows any network with less density than we have to properly return DNS-targeted results.
0. https://medium.com/nextdns/how-we-made-dns-both-fast-and-pri...
"Yea.. thanks dude. Just.. drive the car, will ya?"
For example in unbound the defaults, when EDNS0 is enabled (disabled by default), are:
max-client-subnet-ipv6: 56
max-client-subnet-ipv4: 24
Forwarding can also be conditionally enabled for specific clients, upstream servers, specific zones, etc.
ref: https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound...
Respect is optional too. But it is important.
I wonder if one party or the other actually made a change in response to this hitting the front page again?
EDIT: another comment, though somewhat hearsay, suggests that Cloudflare's caching could make this difficult to implement: >>36971650
I had that issue with cloudflare bot captcha when trying to access a web novel website. It would infinitely loop into the "please click the checkmark to confirm you're human" thingamagic.
Initially I thought it was because I was a linux user, but I tried to browse the same website on Google Chrome and the issue went away. They were not discriminating against linux, but against Firefox, which is just as bad, if not worse.
I tried everything on FF: deleting all cookies/storage/history, disabling addons etc. It would still do this on a pristine FF. Ultimately, I admit, not being able to access websites did manage to encourage me to uninstall FF, despite it not being FF's fault, I'm tired of dealing with this kind of cr*p.
While I'm here, I'd also like to layer Zero Trust and Warp+ so I can toggle my internal network while staying on Warp+.
Also, the separation in Zero Trust and tunnels between routed DNS names and private IPs is very confusing. Why do I need both?
Custom DNS entries for Zero Trust DNS would be nice, so I could point internal domains to the external routing without having to have public DNS, or even have the domains match.
I don't think archive.is blocks CF based on their IPs, so they must have some heuristic in place to detect bogus EDNS. Perhaps sometimes that heuristic fails?
>>all requests for 7 archive.* domains are sent from Symantec USA IP
It might be that the archive.is only lies to that IP, which would explain why many users in this thread say that archive.is does resolve correctly for them with 1.1.1.1
Whereas not loading at all looks like a Cloudflare issue but is ultimately caused by archive.is.
> CF is basically saying "we can know your IP but not the site you are trying to resolve" (that will know your IP anyway once you navigate there).
Not necessarily. For example, the DNS query could go straight to CF while the eventual request to archive.is goes through a proxy or VPN.
And given it is the subnet number being sent, NOT the IP address that people here claim, the privacy concern is fairly low (CF knows your IP address in order to deliver the DNS answer back to you and archive.is knows your IP address when you request resources).
I'll take the performance improvement that EDNS client subnet can provide.
It is quite expensive for an indie project. Not to mention legal support for compliance in every country of presence. To block 0.x% of visitors coming from CloudFlare is much cheaper for a small project than to go this road.
As I understand it, the main reasons people use archive.is over archive.org are because archive.is is more of an immediate proxy/cache/cdn, rather than a long-term archival system that requires a bot to crawl based on schedule parameters. That, and also it includes features to help bypass paywalls by sanitizing some (all?) JavaScript.
On the other hand, Archive.org doesn’t remove or alter scripts or anything like that. And as far as I know you can’t just request them to crawl a site and then browse it there immediately, but you can on Archive.is
It's actually really funny: archive.is works from time to time on 1.1.1.1, which I'm assuming is when archive.is hasn't updated their IP list / detection logic. I wonder how much time they spend maintaining that. If they blocked everyone without EDNS it would be easy, but since it's just Cloudflare...
Yes you can. After you put in the URL, you get a button to do so. I just did it for your comment: https://web.archive.org/web/20230802205505/https://news.ycom...
> It is quite expensive for an indie project. Not to mention legal support for compliance in every country of presence. To block 0.x% of visitors coming from CloudFlare is much cheaper for a small project than to go this road.
I don't buy this. I'm running my own AS and anycast services for £10pm (my ISP are sponsoring my allocations from RIPE).
Also, it feels like Cloudflare's DNS service is more than just 0.x% of the internet....?
I saw this post and tried it with and without Private Relay and sure enough, turning it on is the issue. Good to know....
Edit: I updated Private Relay to "Maintain general location" for IP Address Location Settings and archive.is loads fine.
Second Edit: Maybe not. It all works now, and I think it is either session or cache. I've got to play around with it.
Archive.is believes that Cloudflare can simply provide the full EDNS data, and they're technically right. But Cloudflare won't budge because they believe this is hostile to user privacy. I haven't heard a counterargument that Cloudflare is wrong about this.
Cloudflare believes that Archive.is can simply live without the EDNS data, and they're technically right. But Archive.is won't budge because they believe it prevents their abuse prevention techniques. They mention that owning their own AS would solve the problem but that's too expensive.[1]
Blame is in the eye of the beholder, but it seems to me that Archive.is should find alternative abuse prevention techniques like other websites do. Cloudflare has an argument based on privacy. Archive.is has an argument based on the proper solution being too expensive. The expense of running an AS is disputed in this HN thread.[2]
[1] >>36971650
[2] >>36977654
You don't need to make any such assumption; the above point stands even in the case of simply hitting the "wrong" (i.e. geographically suboptimal) CDN endpoint.
Google and Facebook were examples of "the proper solutions".
The former is currently inaccessible from China, the latter from Russia.
Their "abuse prevention techniques" have failed.
Sacrificing only Cloudflare DNS users is a much lesser evil compared to outcome of "the proper solutions".
But for a site that essentially tries to serve you static content as quickly as possible and mostly all at once, that would probably introduce more overhead than it's worth.
IIRC WARP was only able to forward your origin IP to websites using Cloudflare. Then, as of Aug 2022, their FAQ[1] says your origin IP is hidden regardless of which website. Their IPs do reveal your geolocation though.
There was a bug[2] that revealed your IP to select websites; that seems to have been fixed by Nov 2022.
Disclaimer: I’m not knowledgeable enough to test every possible IP leak mechanism (like WebRTC), so I didn’t do that. I’m basically taking their word for it.
[1] https://developers.cloudflare.com/warp-client/known-issues-a...
[2] https://community.cloudflare.com/t/beware-cloudflare-warp-do...
I get that they don't want to "take the blame" but it seems like both parties are performing reasonable actions that butt heads but that one party resolves that by just not performing the service. To me that feels like a worse outcome than slow service, as it just looks like the site is down.
The next naive question I have is about the response of truncation. I understand Cloudflare is preserving privacy. Archive says that privacy is preserved because they truncate the PII. Is this truncation verifiable in the request from Cloudflare? If not, then this seems like an unreasonable expectation ("just trust me bro"). Again, personally I'd rather have the latency hit and I'm not sure I'm seeing a good argument against this.
archive.today - FAQ : https://archive.md/faq
archive.today - wiki : https://wiki.archiveteam.org/index.php/Archive.today
archiveteam wiki : https://wiki.archiveteam.org/
Tumblr : https://archive-is.tumblr.com/
Twitter : https://twitter.com/archiveis
archive.today
archive.ph
archive.is
archive.li
archive.vn
archive.fo
archive.md
archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion
Launched May 16, 2012; 11 years ago
Every element on the network between the user and the website will know it, too.
Your beliefs certainly aren't reflected in how you treat users. It's been a good while since I've been able to visit any cloudflare protected site using Tor. Your broken systems keep presenting me with infinite checkboxes that do absolutely nothing.
If you want to block people who truly believe that privacy is a fundamental human right, at least have the decency to be honest. Tell Tor users that they are permanently blocked so that they don't waste their time clicking on pointless checkboxes.
You're right, if you've got a legacy internet requirement then that adds another grand a year to your costs. But I disagree that it's "quite expensive for an indie project", especially one that's so popular it needs to run its own CDN.
Your DNS server probably doesn't have the exact record for you at the ready, but it does know another DNS server that gets you closer to an answer. That's how recursive DNS works and it might happen a few times before you actually get to a result. With ECS now every DNS server in this chain knows 12.45.56.x wanted to visit hacker news.
True, but it's still the difference between being able to load all embedded resources from a server close to the user or potentially having to haul all of that across an ocean, considering TCP congestion window scaling (which is sensitive to round trip times) etc.
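A back-of-the-envelope sketch of why round trip time matters so much here. It assumes an idealized TCP slow start: an initial window of 10 segments that doubles every round trip, a 1460-byte MSS, no packet loss, and no receive-window limit. It also ignores TLS and the HTTP request itself. The numbers are illustrative, not measurements.

```python
import math


def slow_start_round_trips(payload_bytes, mss=1460, initial_cwnd=10):
    """Round trips needed to deliver the payload under ideal slow start."""
    segments = math.ceil(payload_bytes / mss)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd   # one full window goes out per round trip
        cwnd *= 2      # window doubles while nothing is lost
        rounds += 1
    return rounds


def transfer_time_ms(payload_bytes, rtt_ms, **tcp_params):
    """Rough transfer time: one RTT for the handshake plus the data rounds."""
    return (1 + slow_start_round_trips(payload_bytes, **tcp_params)) * rtt_ms


for label, rtt in (("nearby server", 20), ("across an ocean", 150)):
    print(label, transfer_time_ms(1_000_000, rtt), "ms for 1 MB")
```

Under these assumptions the same 1 MB page needs the same number of round trips either way, so the total time scales directly with the RTT of whichever endpoint DNS handed you.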
All that said, based on a purported comment by the maintainer of archive.is, the aim of their CDN is actually not improving responsivity, but delaying legal/law enforcement responses: >>36971650
> Archive says that privacy is preserved because they truncate the PII.
Personally, I don't have a lot of sympathy for either party here:
I think, especially given the comment linked above, Archive's latency/efficiency concerns are just pretext for quite different concerns of their own (having to deal with law enforcement).
And on the other hand, while Cloudflare's EDNS subnet truncation might help user privacy in a few edge cases (as many have said here, the visited site will get the user's IP as soon as they connect to their servers!), it also makes it that much harder for CDNs other than Cloudflare to efficiently serve content using DNS-based routing and forces them to also use Anycast, which is much harder to do.