I feel like the more reasonable answer here is to just let the user take the latency hit. Surely requests being somewhat slower is preferable to requests being outright bitbucketed, right?
CF is basically saying "we can know your IP but not the site you are trying to resolve" (that will know your IP anyway once you navigate there).
Whereas not loading at all looks like an archive.is issue but is ultimately caused by archive.is.
> CF is basically saying "we can know your IP but not the site you are trying to resolve" (that will know your IP anyway once you navigate there).
Not necessarily. For example, the DNS query could go straight to CF while the eventual request to archive.is goes through a proxy or VPN.
You don't need to make any such assumption; the above point stands even in the case of simply hitting the "wrong" (i.e. geographically suboptimal) CDN endpoint.
But for a site that essentially tries to serve you static content as quickly as possible and mostly all at once, that would probably introduce more overhead than it's worth.
I get that they don't want to "take the blame", but it seems like both parties are taking reasonable actions that butt heads, and one party resolves the conflict by just not performing the service. To me that feels like a worse outcome than slow service, as it just looks like the site is down.
The next naive question I have is about truncation as a response. I understand Cloudflare is preserving privacy. Archive says that privacy is preserved because they truncate the PII. Is this truncation verifiable in the request from Cloudflare? If not, then this seems like an unreasonable expectation ("just trust me bro"). Again, personally I'd rather have the latency hit and I'm not sure I'm seeing a good argument against this.
True, but it's still the difference between being able to load all embedded resources from a server close to the user or potentially having to haul all of that across an ocean, considering TCP congestion window scaling (which is sensitive to round trip times) etc.
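To put rough numbers on the round-trip sensitivity, here is a back-of-the-envelope sketch. It assumes an idealized TCP slow start (window doubles every round trip, no loss, no bandwidth cap), an initial window of 10 segments and a 1460-byte MSS; the function names and the 1 MB page size are made up for illustration, and real stacks will behave differently.

```python
import math

def slow_start_round_trips(total_bytes, mss=1460, initial_cwnd=10):
    """Round trips needed to deliver total_bytes if the congestion
    window doubles every RTT (idealized slow start, no loss)."""
    segments = math.ceil(total_bytes / mss)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd   # one window's worth of segments per round trip
        cwnd *= 2      # window doubles after each fully acked round
        rounds += 1
    return rounds

def transfer_time_ms(total_bytes, rtt_ms):
    """Very rough transfer time: one RTT for the TCP handshake plus the
    slow-start rounds. Ignores TLS, DNS, loss and bandwidth limits."""
    return (1 + slow_start_round_trips(total_bytes)) * rtt_ms

page = 1_000_000  # hypothetical 1 MB of page resources
print(transfer_time_ms(page, rtt_ms=20))   # nearby endpoint
print(transfer_time_ms(page, rtt_ms=150))  # endpoint across an ocean
```

Under these assumptions the number of round trips is the same either way, so the total time scales directly with the RTT to whichever endpoint DNS handed you.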
All that said, based on a purported comment by the maintainer of archive.is, the aim of their CDN is actually not improving responsiveness, but delaying legal/law enforcement responses: >>36971650
> Archive says that privacy is preserved because they truncate the PII.
Personally, I don't have a lot of sympathy for either party here:
I think, especially given the comment linked above, Archive's latency/efficiency concerns are just pretext for quite different concerns of their own (having to deal with law enforcement).
And on the other hand, while Cloudflare's EDNS subnet truncation might help user privacy in a few edge cases (as many have said here, the visited site will get the user's IP as soon as they connect to their servers!), it also makes it that much harder for CDNs other than Cloudflare to efficiently serve content using DNS-based routing and forces them to also use Anycast, which is much harder to do.
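For anyone unsure what "subnet truncation" means concretely, here is a minimal sketch of the kind of prefix truncation RFC 7871 (EDNS Client Subnet) recommends for resolvers: at most 24 bits for IPv4 and 56 bits for IPv6. The function name is hypothetical, and this is not a claim about what Cloudflare or archive.is actually run; it only shows what an authoritative server would get to see if a resolver forwarded a truncated prefix.

```python
import ipaddress

def truncate_client_subnet(addr, v4_prefix=24, v6_prefix=56):
    """Return the network a resolver would forward instead of the full
    client address. Prefix lengths follow the RFC 7871 recommendation;
    a resolver may send less, or nothing at all."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False zeroes the host bits instead of raising ValueError
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

# Documentation-range addresses, used purely as examples
print(truncate_client_subnet("203.0.113.77"))           # 203.0.113.0/24
print(truncate_client_subnet("2001:db8:1234:5678::1"))  # 2001:db8:1234:5600::/56
```

The prefix is enough to pick a nearby CDN endpoint, while the bits that single out an individual host never reach the authoritative server.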