Does this imply that distributing open-weights models such as Llama is copyright infringement, since users can trivially run the model without output filtering to extract the memorized text?
[1]: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
The goal of copyright is to make sure people can get fair compensation for the amount of work they put in. LLMs automate plagiarism on a previously unfathomable scale.
If humans spend a trillion hours writing books, articles, blog posts and code, then somebody (a small group of people) comes and spends a million hours building a machine that ingests all the previous work and produces output based on it, who should get the reward for the work put in?
The original authors together spent a million times more effort (normalized for skill) and should therefore should get a million times bigger reward than those who build the machine.
In other words, if the small group sells access to the product of the combined effort, they only deserve a millionth of the income.
---
If "AI" is as transformative as they claim, they will have no trouble making so much money they they can fairly compensate the original authors while still earning a decent profit. But if it's not, then it's just an overpriced plagiarism automator and their reluctance to acknowledge they are making money on top of everyone else's work is indicative.