Most relevant piece but the whole comment is worth a read:
> Archive.is’s authoritative DNS servers return bad results to 1.1.1.1 when we query them. I’ve proposed we just fix it on our end but our team, quite rightly, said that too would violate the integrity of DNS and the privacy and security promises we made to our users when we launched the service.
> The archive.is owner has explained that he returns bad results to us because we don’t pass along the EDNS subnet information. This information leaks information about a requester’s IP and, in turn, sacrifices the privacy of users.
Honestly it's that type of thing (the frankness, the presence on HN, willingness to participate, the principled stand on privacy) that got me into Cloudflare products. I now generate hundreds per month in revenue for them and that will likely be thousands in the next year or two. His time/effort on HN directly led to customer acquisition and revenue.
That said I do worry about the incentives Cloudflare has to their big customers. CF is a great tool for site owners, but like any tool has the potential to be a great evil (against the user) if the principles ever wane. It's already being used by a lot of sites to make life a living hell for people behind a VPN. As a site owner I absolutely get it: practically zero of my legitimate traffic comes from VPNs (our main demographic tend to skew older and much less technical than the average consumer), but all of the automated attacks against me do. Balancing freedom and rights is hard, but I deeply appreciate the thoughtfulness and principles that CF has displayed over the years.
[1] >>36197401
[2] https://www.ietf.org/archive/id/draft-private-access-tokens-...
[1] "Hello, Benedikt from Cloudflare and the Turnstile Team here. Thanks you so much for the report. We looked into this report and identified that there was some false positive and cleared the signal. We have investigated this report and the issue should be fixed. Please reach out to me benedikt@cloudflare.com or at our Cloudflare Turnstile Discord, if you are still encountering problems."
[2]
> Servers commonly use passive and persistent identifiers associated with clients, such as IP addresses or device identifiers, for enforcing access and usage policies. For example, a server might limit the amount of content an IP address can access over a given time period (referred to as a "metered paywall"), or a server might rate-limit access from an IP address to prevent fraud and abuse. Servers also commonly use the client's IP address as a strong indicator of the client's geographic location to limit access to services or content to a specific geographic area (referred to as "geofencing").
> However, passive and persistent client identifiers can be used by any entity that has access to it without the client's express consent. A server can use a client's IP address or its device identifier to track client activity. A client's IP address, and therefore its location, is visible to all entities on the path between the client and the server. These entities can trivially track a client, its location, and servers that the client visits.
> A client that wishes to keep its IP address private can hide its IP address using a proxy service or a VPN. However, doing so severely limits the client's ability to access services and content, since servers might not be able to enforce their policies without a stable and unique client identifier.
> This document describes an architecture for Private Access Tokens (PATs), using RSA Blind Signatures as defined in [BLINDSIG], as an explicit replacement for these passive client identifiers. These tokens are privately issued to clients upon request and then redeemed by servers in such a way that the issuance and redemption events for a given token are unlinkable.
I've tried raising this issue on their forum, where I've failed to get the attention of the engineering teams, and while posting the ray ID should be sufficient, all you'd really get is clueless, unpaid volunteers asking you questions in circles like "what website do you see this on" (everywhere), "are you using adblock" (no, and Adblock has never blocked their Turnstile scripts) and "what's your user agent?" (the default Chromium one).
If I had to hazard a guess, it's their bot management script seeing "Linux" in the user agent and detecting missing video codecs (which is par for the course for standard Chromium builds), and thinking it's a headless browser. Between the the fact that differences between the JS runtime of Chromium and Chromium headless are very small these days, and the ClientHello permutation has destroyed bot management vendors' ability to distinguish different browser builds, they decided blocking all Linux users using Chromium was fair enough.
I get that it's a frustrating situation but you're viewing CF in the worst possible light ("trying to lock out Linux users" assumes an intent not on display) and I think it's counterproductive to success.
Truly, just imagine the user story:
"I can't access this website."
"No worries! That is by design, because the protocol response returned to CF is illegal, and the server simply propagates the error back to you. It would be impure for CF to modify the response in-flight to fix this for you."
"?? I do not care. I am talking to CF, why can't CF just fix the issue?"
It would be great if CF offered a choice, like Quad9.
[1] >>36972051
[2] >>36971869
(I have no doubt ypu are seeing the problem on your PC; but generalizing a single point to all Linux users just screams "technical incompetence" and makes me want to ignorw the post)
That's a bit unfair, don't you think?
From what I remember of the saga, the original reason for Archive.is's block is that they run their own CDN, and by not knowing the location of the user, they can't determine the closest server to respond with.
edit: found source https://twitter.com/archiveis/status/1018691421182791680
So the alternative viewpoint is, that Cloudflare is being anti-competitive by technically preventing other CDN providers from working.
Disclosure: I'm a happy Cloudflare user, but all in all I think Archive.is service is far more fundamental for the internet (especially as it's 100% free!). So I would really appreciate if you could figure out a way of working together. Until then, 8.8.8.8 it is!
I feel like the more reasonable answer here is to just let the user take the latency hit. Surely requests being somewhat slower is preferable to requests being outright bitbucketed, right?
CF is bascically saying "we can know your IP but not the site you are trying to resolve" (that will know your IP anyway once you navigate there).
And any malicious client that tries to leak data via DNS can just ask for DNS record like my-ip-is-7.8.9.0.example.com and completely go around that privacy "enhancement".
Sorry but the "privacy" here looks like smokescreen to stifle competition.
[1] >>36971650
In other words, you want the data, but prevent others from seeing your advantage?
This is what archive.is is doing, and you stomp your collective feet at.
> because we believe 1) privacy is a fundamental human right; and 2) the original sin of the Internet is that IP addresses are too closely tied to the identities of individuals and services.
If you cared about that, you wouldn't either block Tor or send us through captcha-hell just to pull a single webpage.
> Truncating EDNS is trying to honor #1 and overcome #2. So is our work on protocols like Oblivious DNS. This work, frankly, upsets some of our customers or potential customers (like Archive.is). But it’s the right thing to do for the long term health of the Internet.
'Upsets'? Wow. Talk about a "Rules for thee but not for me."
EDIT: This is outlined in [0], although it doesn't go into the depth I wish it did.
> By providing local Internet egress and by configuring internal DNS servers to provide local name resolution for Microsoft 365 endpoints, network traffic destined for Microsoft 365 can connect to Microsoft 365 front end servers as close as possible to the user.
[0] https://learn.microsoft.com/microsoft-365/enterprise/microso...
For an issue that pointed to cloudflare, but ultimately was our hoster having an issue with completing the TLS handshake...
After infra update ofc.
Tldr: had the opposite experience, for a technical issue :)
Or, thank you for wasting your customers time attempting to figure out why one or more sites aren't responding appropriately on your network while they work on other networks.
Not everyone is clued into EDNS or why archive.is doesn't function with CF.
CF is wasting everyone's time.
Archive.is' explanation is quoted in a comment below:
It still may not be the right decision, but it's important to frame the trade-off correctly.
Works for me on Linux in Firefox and Chromium.
If another company did what Cloudflare does and homogenized tons of requests behind them, you can bet Cloudflare's CAPTCHA systems would block them in a second.
I have zero respect for Cloudflare's inability to answer criticisms about what they do, about their constant deflections from simple, straightforward questions, and the fact that they do to others what they would never accept anyone else doing to them. It's hypocrisy in the service of trying to become a monopoly by re-centralizing the Internet.
Don't believe me? Go ahead and look for examples of Matthew Prince addressing concerns that much of the non-western world can't access Cloudflare fronted sites because of Cloudflare's "reasons". When you don't find any that have more than just vague platitudes and handwaving, imagine how you'd feel if you were one of those multiple billion people.
In other words, Cloudflare expects us to think they're so special that they should get to do what they explicitly don't want others doing.
It's bullshit, particularly for all the people who are victims of Cloudflare's manipulations such as the default use of Cloudflare DNS servers for DNS-over-https on Firefox, which users were never asked about before it was enabled for them (at least in the US).
Respect is optional too. But it is important.
EDIT: another comment, though somewhat hearsay, suggests that Cloudflare's caching could make this difficult to implement: >>36971650
I had that issue with cloudflare bot captcha when trying to access a web novel website. It would infinitely loop into the "please click the checkmark to confirm you're human" thingamagic.
Initially I thought it was because I was a linux user, but I tried to browse the same website on Google Chrome and the issue went away. They were not discriminating against linux, but against Firefox, which is just as bad, if not worse.
I tried everything on FF: deleting all cookies/storage/history, disabling addons etc. It would still do this on a pristine FF. Ultimately, I admit, not being able to access websites did manage to encourage me to uninstall FF, despite it not being FF's fault, I'm tired of dealing with this kind of cr*p.
While I'm here, I'd also like to layer Zero Trust and Warp+ so I can toggle my internal network while staying on Warp+.
Also, the separation in Zero Trust and tunnels between routed DNS names and private IPs is very confusing. Why do I need both?
Custom DNS entries for Zero Trust DNS would be nice, so I could point internal domains to the external routing without having to have public DNS, or even have the domains match.
Whereas not loading at all looks like archive.is issue but is ultimately caused by archive.is.
> CF is bascically saying "we can know your IP but not the site you are trying to resolve" (that will know your IP anyway once you navigate there).
Not necessarily. For example, the DNS query could go straight to CF while the eventual request to archive.is goes through a proxy or VPN.
And given it is the subnet number being sent, NOT the IP address that people here claim, the privacy concern is fairly low (CF knows your IP address in order to deliver the DNS answer back to you and archive.is knows your IP address when you request resources).
I'll take the performance improvement that EDNS client subnet can provide.
It is quite expensive for an indie project. Not to mention legal support for compliance in every country of presence. To block 0.x% of visitors coming from CloudFlare is much cheaper for a small project than to go this road.
It's actually really funny archive.is works from time to time on 1.1.1.1 which I'm assuming is when archive.is hasn't update their IP list / detection logic. I wonder how much time they spend maintaining that if they blocked everyone without EDNS it would be easy but since it's just Cloudflare....
> It is quite expensive for an indie project. Not to mention legal support for compliance in every country of presence. To block 0.x% of visitors coming from CloudFlare is much cheaper for a small project than to go this road.
I don't buy this. I'm running my own AS and anycast services for £10pm (my ISP are sponsoring my allocations from RIPE).
Also, it feels like Cloudflare's DNS service is more than just 0.x% of the internet....?
You don't need to make any such assumption; the above point stands even in the case of simply hitting the "wrong" (i.e. geographically suboptimal) CDN endpoint.
But for a site that essentially tries to serve you static content as quickly as possible and mostly all at once, that would probably introduce more overhead than it's worth.
IIRC WARP was only able to forward your origin IP to websites using Cloudflare. Then, as of Aug 2022, their FAQ[1] says your origin IP is hidden regardless of which website. Their IPs do reveal your geolocation though.
There was a bug[2] that revealed your IP to select websites; that seems to have been fixed by Nov 2022.
Disclaimer: I’m not knowledgeable enough to test every possible IP leak mechanism (like WebRTC), so I didn’t do that. I’m basically taking their word for it.
[1] https://developers.cloudflare.com/warp-client/known-issues-a...
[2] https://community.cloudflare.com/t/beware-cloudflare-warp-do...
I get that they don't want to "take the blame" but it seems like both parties are performing reasonable actions that butt heads but that one party resolves that by just not performing the service. To me that feels like a worse outcome than slow service, as it just looks like the site is down.
The next naive question I have is about the response of truncation. I understand Cloudflare is preserving privacy. Archive says that privacy is preserved because they truncate the PII. Is this truncation verifiable in the request from Cloudflare? If not, then this seems like an unreasonable expectation ("just trust me bro"). Again, personally I'd rather have the latency hit and I'm not sure I'm seeing a good argument against this.
Every element on the network between the user and the website will know it, too.
Your beliefs certainly aren't reflected in how you treat users. It's been a good while since I've been able to visit any cloudflare protected site using Tor. Your broken systems keep presenting me with infinite checkboxes that do absolutely nothing.
If you want to block people who truly believe that privacy is a fundamental human right, at least have the decency to be honest. Tell Tor users that they are permanently blocked so that they don't waste their time clicking on pointless checkboxes.
You're right, if you've got a legacy internet requirement then that adds another grand a year to your costs. But I disagree that it's "quite expensive for an indie project", especially one that's so popular it needs to run it's own CDN.
True, but it's still the difference between being able to load all embedded resources from a server close to the user or potentially having to haul all of that across an ocean, considering TCP congestion window scaling (which is sensitive to round trip times) etc.
All that said, based on a purported comment by the maintainer of archive.is, the aim of their CDN is actually not improving responsivity, but delaying legal/law enforcement responses: >>36971650
> Archive says that privacy is preserved because they truncate the PII.
Personally, I don't have a lot of sympathy for either party here:
I think, especially given the comment linked above, Archive's latency/efficiency concerns are just pretext for quite different concerns of their own (having to deal with law enforcement).
And on the other hand, while Cloudflare's EDNS subnet truncation might help user privacy in a few edge cases (as many have said here, the visited site will get the user's IP as soon as they connect to their servers!), it also makes it that much harder for CDNs other than Cloudflare to efficiently serve content using DNS-based routing and forces them to also use Anycast, which is much harder to do.