
And therein lies the value of indexing huge amounts of data, which Alphabet (Google, YouTube, etc.), Microsoft (Bing, etc.), and similar companies have been doing for years now.

If it is legal to simply index a website, then why shouldn't it be legal to train a model on the very same data?

Of course, websites should have some option for declining data mining for ML/AI purposes, in the same way they can decline scraping/indexing in the robots.txt file.
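
To make the robots.txt comparison concrete, here is a rough sketch of how a well-behaved crawler could honor such an opt-out, using Python's standard `urllib.robotparser`. The user agent names `ExampleAIBot` and `SomeSearchBot`, the sample rules, and the helper `may_fetch` are all made up for illustration; whether a given crawler actually respects rules like these is up to its operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI training crawler,
# allow everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"
    for agent in ("ExampleAIBot", "SomeSearchBot"):
        print(agent, may_fetch(SAMPLE_ROBOTS_TXT, agent, url))
```

Note that this is purely advisory: nothing in robots.txt stops a crawler that chooses to ignore it.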

But that ship has kind of sailed, unless the courts decide otherwise.