zlacker

robots.txt isn't about copyright, it's about keeping bots out. It's effectively a EULA. Copyright law only comes into play when you distribute the content you scrape. If you scraped the New York Times for your own LLM that you used internally and didn't distribute the results, there would be no copyright infringement.

replies(2): >>sam_lo+m5 >>oldgra+xX

>>adrr+(OP)
Er... This is what all these lawsuits against LLMs are hoping to disprove.

replies(1): >>jeremy+Yq

>>sam_lo+m5
Which lawsuits concern LLMs used only privately by the organization that developed them?

>>adrr+(OP)
> If you scraped the New York Times for your own LLM that you used internally and didn't distribute the results, there would be no copyright infringement.

Why?

As far as I understand, the copyright owner has control over all copying, regardless of whether it is done internally or externally. Distributing it externally would be a more serious violation, though.