I personally think that giving copyright holders control over who is legally allowed to view a work that has been made publicly available is a huge step in the wrong direction. One reason is open source, but that argument applies just as well to making sure that smaller companies have a chance of competing.
I think it makes much more sense to go after infringing uses of models than to put up another barrier that will further advantage the big players in this space.
I don’t know the solution, but I don’t like the idea that anything I post online that is openly viewable is automatically opted into ML/AI training data, and I imagine that opinion would be amplified if my writing were a product directly threatened by the very same models.
You can get basically-but-not-quite-exactly the copyrighted material that a model was trained on.
We saw this a lot with some earlier image models, where you could type in an artist's name and get their work back.
The fact that AI companies are having to put guardrails on their models to prevent that sort of use is a good sign that the models weren't trained ethically, and that they should be paying a ton of licensing fees to the people whose content they used without permission.
You can do exactly the same with a human author or artist if you prompt them to do so. And if you decide to publish that material, you're the one liable for copyright infringement, not the person you instructed to create it.
Certainly not in the US. From the article you linked: "In the United States, in the absence of a TDM exception, AI companies contend that inclusion of copyrighted materials in training sets constitutes fair use (i.e., not copyright infringement), which position remains to be evaluated by the courts."
Fair use is a defense against copyright infringement, but the whole question in the first place is whether training generative AI on copyrighted works falls under fair use, and this case looks to be the biggest test of that so far (among other suits filed relatively recently).