The word probabilities are a transformative use, a form of fair use, and aren't an issue.
The specific output at each point in time is what would be judged to be fair use or copyright infringing.
I'd argue the user would be responsible for ensuring they're not infringing by using the output in a copyright-infringing manner, i.e. for profit, as they've fed certain inputs into the model which led to the output. In the same way, you can't sue Microsoft because someone typed up copyrighted works in Microsoft Word and then distributed them for profit.
De minimis is still helpful here; not all infringements are noteworthy.
> Collateralised Copyright Liability
Is this a real legal / finance term or did you make it up? Also, I do not follow your leap to compare LLMs to CDOs (collateralised debt obligations). And do you specifically mean CDOs, or any kind of mortgage / commercial loan structured finance deal?
The process involved in obtaining that end work is completely irrelevant to any copyright case. A claim can be against the model's weights (not possible, as that's fair use) or against the specific one-off output end work (less clear), but the two can't be looked at as a whole.
https://www.nytimes.com/2023/12/27/business/media/new-york-t... https://www.reuters.com/legal/us-newspapers-sue-openai-copyr... https://www.washingtonpost.com/technology/2024/04/09/openai-...
Some decided to make deals instead
https://www.federalregister.gov/documents/2023/03/16/2023-05...
So I think the law, at least as currently interpreted, does care about the process.
Though maybe you meant whether a new work infringes an existing copyright? This guidance is clearly about whether a new work can itself be copyrighted.
What is the difference that lets OpenAI get away with it, but not our hypothetical Mr. Smartass using the same process to try to get around an NDA?
Who created the work? It's the user who instructed the AI (it's a tool); you can't attribute it to the AI. It would be the equivalent of Photoshop being credited as co-author on your work.
The user is "inputting variables into their probability algorithm that's resulting in the copyright work".
It's not merely a compressed version of a song intended to be used in the same way as the original copyrighted work; that would be copyright infringement.
They tend to try to argue for conspiracy to commit copyright infringement, which is a tenuous case to make unless they can prove that was actually the intention. I think in most cases it's ISP/hosting terms and conditions and legal costs that lead to their demise.
Your example of the model asking specifically "what copyrighted content would you like to download" kinda implies conspiracy to commit copyright infringement would be a valid charge.
> nobody could see what was inside CDOs
Absolutely not true. Where did you get that idea? When pricing the bonds from a CDO you get to see the initial collateral. As a bond owner, you receive monthly updates about any portfolio changes. Weirdly, CDOs frequently have more collateral transparency than commercial or residential mortgage deals.