I personally think that giving copyright holders control over who is legally allowed to view a work that has been made publicly available is a huge step in the wrong direction. One reason is open source, but that argument applies just as well to making sure that smaller companies have a chance of competing.
I think it makes much more sense to go after infringing uses of models than to put up another barrier that will further advantage the big players in this space.
I don’t know the solution, but I don’t like the idea that anything I post online that is openly viewable is automatically opted into ML/AI training data, and I imagine that opinion would be amplified if my writing were a product directly threatened by the very same models.
You can get basically-but-not-quite-exactly the copyrighted material that a model was trained on.
We saw this a lot with some earlier image models, where you could type in an artist's name and get their work back.
The fact that AI companies are having to put guardrails on their models to prevent that sort of use is a good sign that the models weren't trained ethically, and that they should be paying a ton of licensing fees to the people whose content they used without permission.
You can do exactly the same with a human author or artist if you prompt them to do so. And if you decide to publish that material, you're the one liable for copyright infringement, not the person you instructed to create it.
Certainly not in the US. From the article you linked: "In the United States, in the absence of a TDM exception, AI companies contend that inclusion of copyrighted materials in training sets constitutes fair use (i.e., not copyright infringement), which position remains to be evaluated by the courts."
Fair use is a defense against copyright infringement, but the whole question in the first place is whether training generative AI on copyrighted works falls under fair use, and this case looks to be the biggest test of that so far (among other suits filed relatively recently).