Google’s nightmare “Web Integrity API” wants a DRM gatekeeper for the web

>>jakobd+(OP)
I've been thinking about this for a few days but just realized that this is a complete end run around all web scraping in general.

All 'adversarial compatibility' from projects like Nitter, Teddit, Invidious, and youtube-dl go out the window. Any archive site (archive.org, archive.ph, etc.) can be blocked by sites requiring attestation.

And just like the book industry was terrified of piracy and was 'rescued' by Kindle, so too will journalism outlets that can't find a business model flock to Google to save them.

This is going to be rough.

>>rpdill+sl
> Any archive site (archive.org, archive.ph, etc.) can be blocked by sites requiring attestation.

If that actually happens, the underground market for "trusted device" farms will grow. That's not unlike what's already happening, but possibly at a far larger scale. Of course, that means the financially motivated scraping services keep going while the honest individuals who want user-agent freedom get screwed, just like with many other forms of DRM...

>>userbi+dt
This has been happening already. The market is trying really hard to price out web scraping through scraper-detection technologies, and it's kinda working: scraping is becoming non-existent in user-space apps. It's also extremely discriminatory. Try running a single scrape from a developing country's IP on Linux and you'll be blocked at the TLS step lol
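The comment doesn't say how the blocking at the TLS step works. One widely documented technique is TLS client fingerprinting such as JA3, which hashes fields of the ClientHello so a server can tell which TLS stack is connecting before any HTTP is exchanged. The sketch below assumes JA3-style hashing (GREASE values from RFC 8701 are dropped, as JA3 does); `allow_handshake` and the browser-fingerprint set are hypothetical, and the thread doesn't establish which sites, if any, use exactly this method.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa; JA3 ignores them."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # Decimal values separated by '-', with GREASE entries removed.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the JA3 string: Version,Ciphers,Extensions,Curves,PointFormats."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             curves: Iterable[int], point_formats: Iterable[int]) -> str:
    """MD5 hex digest of the JA3 string; this is what is typically matched against lists."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()


# Hypothetical server-side policy: only let known browser fingerprints through.
def allow_handshake(fingerprint: str, known_browser_fingerprints: set[str]) -> bool:
    return fingerprint in known_browser_fingerprints
```

For example, with invented ClientHello values, `ja3_string(771, [0x0A0A, 4865, 4866, 4867], [0, 23, 65281, 10, 11], [0x1A1A, 29, 23, 24], [0])` gives `"771,4865-4866-4867,0-23-65281-10-11,29-23-24,0"`. Because an HTTP library built on a different TLS stack typically offers different ciphers and extensions in a different order than a mainstream browser, its hash differs even when the User-Agent header is spoofed.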

>>wrapti+tL
> The market is trying really hard to price out web scraping... scraping is becoming non-existent in user-space apps

Uhh... Those two matters are pretty much unrelated to each other. Scraping is becoming non-existent because the era of static web pages has ended. No need to "scrape" when you have a nice, performant JSON REST API provided for you.

>>altfre+Gl1
SSG vs SSR really has nothing to do with whether an API exists to provide the data you would otherwise need to scrape.

When was the last time you saw a site with a JSON API providing metadata, like the JSON-LD for a product on an e-commerce site? Or an API just for the Open Graph data? How would you even discover these APIs for sites that you don't own?
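The metadata mentioned here normally has no separate endpoint at all: JSON-LD is embedded in `<script type="application/ld+json">` tags and Open Graph data in `<meta property="og:...">` tags inside the page's HTML, so a third party gets it by parsing the page. A minimal sketch using Python's standard-library `html.parser`; `MetadataParser` and `extract_metadata` are hypothetical names, and a real scraper would need to handle more cases (e.g. `@graph` arrays, or sites that put `name=` instead of `property=` on their meta tags).

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects JSON-LD <script> blocks and Open Graph <meta property="og:..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld: list = []
        self.open_graph: dict[str, str] = {}
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and (attributes.get("type") or "").lower() == "application/ld+json":
            # Start buffering the script body; it is parsed as JSON at </script>.
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta":
            prop = attributes.get("property") or ""
            content = attributes.get("content")
            if prop.startswith("og:") and content is not None:
                self.open_graph[prop] = content

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip a malformed block instead of failing the whole page.


def extract_metadata(html: str) -> tuple[list, dict[str, str]]:
    """Return (list of JSON-LD objects, dict of og:* properties) found in the HTML."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.open_graph
```

This only works when the site serves the metadata in its initial HTML; pages that inject JSON-LD with client-side JavaScript would need a client that actually executes scripts.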

It's also worth noting that very, very few JSON APIs today are actually REST. They rarely include all the context needed, and in general JSON is much less useful than XML when you're talking to other APIs that you don't own, since JSON can't easily describe the shape and datatypes of the content.
