zlacker

1. userbi+(OP)[view] [source] 2023-07-25 00:02:10
Any archive site (archive.org, archive.ph, etc.) can be blocked by sites requiring attestation.

If this actually happens, the underground market for "trusted device" farms will grow. That's not too different from what's already happening, but possibly at a far larger scale. Of course, that means financially motivated scraping services keep going while honest individuals who want user-agent freedom get screwed, just like with many other forms of DRM...

replies(2): >>wrapti+gi >>jaflo+yi
2. wrapti+gi[view] [source] 2023-07-25 02:17:47
>>userbi+(OP)
This has been happening already. The market is trying really hard to price out web scraping through scraper detection technologies and it's kinda working - scraping is becoming non-existent in user-space apps. It's also extremely discriminatory. Try running a single scrape from a developing country's IP on Linux and you'll be blocked at the TLS step lol
replies(2): >>CalRob+tA >>altfre+tS
3. jaflo+yi[view] [source] 2023-07-25 02:20:33
>>userbi+(OP)
Basically the captcha solving industry.
◧◩
4. CalRob+tA[view] [source] [discussion] 2023-07-25 05:12:24
>>wrapti+gi
But of course search engines are fine
replies(1): >>wrapti+iQ
◧◩◪
5. wrapti+iQ[view] [source] [discussion] 2023-07-25 07:36:25
>>CalRob+tA
Having your cake and eating it too is a natural goal of every business, and honestly it was just a matter of time until websites figured out they can have the benefits of public data and avoid the costs. Web scraping and botting are basically a solved problem too: just put the data behind a login gate, which lets you litigate against scrapers and bots. Done. However, nobody wants to lose the benefits of public data, so here we are.
replies(1): >>CalRob+yg1
◧◩
6. altfre+tS[view] [source] [discussion] 2023-07-25 07:54:59
>>wrapti+gi
> The market is trying really hard to price out web scraping... scraping is becoming non-existent in user-space apps

Uhh... Those two matters are pretty much unrelated to each other. Scraping is becoming non-existent because the era of static web pages has ended. No need to "scrape" when you have a nice, performant JSON REST API provided for you.

replies(2): >>flagra+xl1 >>wrapti+fd4
◧◩◪◨
7. CalRob+yg1[view] [source] [discussion] 2023-07-25 11:34:28
>>wrapti+iQ
I used to care about respecting robots.txt until it was clear that established search engines are fine but any newcomers can go right to hell.
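
For illustration, here is a minimal sketch of the kind of robots.txt policy being described, checked with Python's standard `urllib.robotparser`. The robots.txt content, the `NewcomerBot` user agent, and the example.com URL are all made up for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one established crawler is welcome, everyone else is not.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/products"  # placeholder URL
    print("Googlebot:", allowed("Googlebot", url))
    print("NewcomerBot:", allowed("NewcomerBot", url))  # made-up crawler name
```

Under a policy like this, a crawler that honors robots.txt and isn't on the allow list simply gets nothing.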
◧◩◪
8. flagra+xl1[view] [source] [discussion] 2023-07-25 12:10:24
>>altfre+tS
SSG vs SSR really has nothing to do with whether an API exists to provide the data you would otherwise need to scrape.

When was the last time you saw a site with a JSON API providing metadata, like the JSON-LD for a product on an e-commerce site? Or an API just for the Open Graph data? How would you even discover these APIs for sites that you don't own?
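
In practice this metadata usually has to be pulled out of the HTML itself. A rough sketch of what that looks like, using only Python's standard library; the sample page, the class name, and the helper name are invented for the example, and real pages are messier than this:

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects JSON-LD blocks and Open Graph <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.json_ld = []       # parsed JSON-LD objects
        self.open_graph = {}    # e.g. {"og:title": "..."}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content", "")

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def extract_metadata(html):
    """Return (json_ld_objects, open_graph_dict) found in the given HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.open_graph


# Invented sample page, standing in for a product page on some e-commerce site.
SAMPLE_HTML = """<html><head>
<meta property="og:title" content="Example Widget">
<script type="application/ld+json">{"@type": "Product", "name": "Example Widget"}</script>
</head><body></body></html>"""

if __name__ == "__main__":
    json_ld, open_graph = extract_metadata(SAMPLE_HTML)
    print(json_ld)
    print(open_graph)
```

Note that this is still scraping: it depends on fetching and parsing the page markup, not on any API the site publishes.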

It's also worth noting that very, very few JSON APIs today are actually REST. They rarely include all the context needed, and in general JSON is much less useful than XML when you're talking to other APIs that you don't own since JSON can't easily describe the shape and datatypes of the content.

◧◩◪
9. wrapti+fd4[view] [source] [discussion] 2023-07-26 01:38:20
>>altfre+tS
> No need to "scrape" when you have a nice, performant JSON REST API provided for you.

There are no performant JSON REST APIs provided these days though. The days of public APIs are long gone.

replies(1): >>altfre+4v6
◧◩◪◨
10. altfre+4v6[view] [source] [discussion] 2023-07-26 16:57:38
>>wrapti+fd4
HTML "APIs" weren't meant for the public either.

In practice, if there is a mobile app, there is an API. Whether its creators object to your usage is mostly their own problem.
