It matters what is legal and what makes sense.
I personally think that giving copyright holders control over who is legally allowed to view a work that has been made publicly available is a huge step in the wrong direction. One of those reasons is open source, but really that argument applies just as well to making sure that smaller companies have a chance of competing.
I think it makes much more sense to go after the infringing uses of models rather than putting in another barrier that will further advantage the big players in this space.
I don’t know the solution, but I don’t like the idea that anything I post online that is openly viewable is automatically opted into being part of ML/AI training data, and I imagine that opinion would be amplified if my writing was a product which was being directly threatened by the very same models.
> To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries
If copyright is starting to impede rather than promote progress, then it needs to change to remain constitutional.
You can get basically-but-not-quite-exactly the copyrighted material that it was trained on.
Saw this a lot with some earlier image models where you could type in an artists name and get their work back.
The fact that AI models are having to put up guardrails to prevent that sort of use is a good sign that they weren't trained ethically and they should be paying a ton of licensing fees to the people whose content they used without permission.
I think publications should be protected enough to keep them in business, so I don't really know what to make of this situation.
NYT just happens to be an entity that can afford to fight Microsoft in court.
The end game when large content producers like The New York Times are squeezed due to copyright not being enforced is that they will become more draconian in their DRM measures. If you don't like paywalls now, watch out for what happens if a free-for-all is allowed for model training on copyrighted works without monetary compensation.
I had a similar conversation with my brother-in-law who's an economist by training, but now works in data science. Initially he was in the side of OpenAI, said that model training data is fair game. After probing him, he came to the same conclusion I describe: not enforcing copyright for model training data will just result in a tightening of free access to data.
We're already seeing it from the likes of Twitter/X and Reddit. That trend is likely to spread to more content-rich companies and get even more draconian as time goes on.
Look at SpaceX. They paid a collective $0 to the individuals who discovered all the physics and engineering knowledge. Without that knowledge they're nothing. But still, aren't we all glad that SpaceX exists?
In exchange for all the knowledge that SpaceX is privatizing, we get to tax them. "You took from us, so we get to take it back with tax."
I think the more important consideration isn't fairness it's prosperity. I don't want to ruin the gravy train with IP and copyright law. Let them take everything, then tax the end output in order to correct the balance and make things right.
You can do exactly the same with a human author or artist if you prompt them to. And if you decide to publish this material, you're the one liable for breach of copyright, not the person you instructed to create the material.
But it falls apart because kids aren't business units trained to maximize shareholder returns (maybe in the farming age they were). OpenAI isn't open, it's making revolutionary tools that are absolutely going to be monetized by the highest bidder. A quick way to test this is NYT offers to drop their case if "open" AI "open"-ly releases all its code and training data, they're just learning right? what's the harm?
Saying what's legal is irrelevant is an odd take.
I like living in a place with a rule of law.
Certainly not in the US. From the article you linked "In the United States, in the absence of a TDM exception, AI companies contend that inclusion of copyrighted materials in training sets constitute fair use eg not copyright infringement, which position remains to be evaluated by the courts."
Fair use is a defense against copyright infringement, but the whole question in the first place is whether generative AI training falls under fair use, and this case looks to be the biggest test of that (among others filed relatively recently).
In the case of slavery - we changed the law.
In the case of copyright - it's older than the Atlantic Slave Trade and still alive and kicking.
It's almost as if one of them is not like the other.
Use this newfound insight to take my comment in good faith, as per HN guidelines, and recognize that I am making a generalized analogy about the gap between law and ethics, and not making a direct comparison between copyright and slavery.
Can we get back on topic?