zlacker

1. helsin (OP) 2023-07-08 06:16:15
> I notice they don't actually give a good reason that robots.txt isn't suitable

It's kind of implied: specifying sitemaps, permissions, and copyright terms for different use cases (search, scraping, republishing, training, etc.), and perhaps standardizing some of the non-standard extensions: Crawl-delay, a default Host, and even Sitemap, which isn't part of the robots.txt standard.
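As a side note, Python's stdlib parser already recognizes two of those non-standard extensions (Crawl-delay and Sitemap), which shows how widely they're supported in practice despite being absent from the standard. A minimal sketch, using a hypothetical robots.txt:

```python
# Sketch: urllib.robotparser handles Crawl-delay and Sitemap,
# even though neither is part of the robots.txt standard (RFC 9309).
# The robots.txt content below is hypothetical.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.crawl_delay("*"))                                    # 10
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

Per-use-case permissions (search vs. training vs. republishing) have no equivalent here, which is the gap the proposal is about.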

> We believe it’s time for the web and AI communities to explore additional machine-readable means for web publisher choice and control for emerging AI and research use cases.
