It doesn't matter what's good for open source ML.

It matters what is legal and what makes sense.

>>onlyre+(OP)
Clearly if a law is bad then we should change that law. The law is supposed to serve humanity and when it fails to do so it needs to change.

>>onlyre+(OP)
It matters what ends up being best for humanity, and I think there are cases to be made both ways on this.

>>onlyre+(OP)
Slavery was legal...

>>z7+i1
Still is in many countries with excellent diplomatic relations with the Western World:

https://www.cfr.org/backgrounder/what-kafala-system

>>onlyre+(OP)
The law on this does not currently exist. It is in the process of being created by the courts and legislatures.

I personally think that giving copyright holders control over who is legally allowed to view a work that has been made publicly available is a huge step in the wrong direction. One of those reasons is open source, but really that argument applies just as well to making sure that smaller companies have a chance of competing.

I think it makes much more sense to go after the infringing uses of models rather than putting in another barrier that will further advantage the big players in this space.

>>onlyre+(OP)
It doesn't matter what is legal. It matters what is right. Society is about balancing the needs of the individual vs the collective. I have a hard time equating individual rights with the NYT, and I know where I stand on scraping public data and who I was rooting for in the LinkedIn case.

>>shkkmo+D2
Copyright holders already have control over who is legally allowed to view a work that has been made publicly available. It's the right to distribution. You don't waive that right when you make your content free to view on a trial basis to visitors to your site, with the intent of getting subscriptions - however easy your terms are to skirt. NYT has the right to remove any of their content at any time, and to bar others from hosting and profiting on the content.

>>shkkmo+D2
It’s disingenuous to frame using data to train a model as a “view” of that data. The simple cases are the easy ones: if ChatGPT completely rips a NYT article, that’s obviously infringement. However, there’s an argument to be made that every part of the LLM's training dataset is, in part, used in every output of that LLM.

I don’t know the solution, but I don’t like the idea that anything I post online that is openly viewable is automatically opted into being part of ML/AI training data, and I imagine that opinion would be amplified if my writing was a product which was being directly threatened by the very same models.

>>bbkane+Z
People often get buried in the weeds about the purpose of copyright. Let us not forget that the only reason copyright laws exist is

> To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries

If copyright is starting to impede rather than promote progress, then it needs to change to remain constitutional.

>>rndmwl+d9
All I can ever think about with how ML models work is that they sound an awful lot like Data Laundering schemes.

You can get basically-but-not-quite-exactly the copyrighted material that it was trained on.

Saw this a lot with some earlier image models where you could type in an artists name and get their work back.

The fact that AI models are having to put up guardrails to prevent that sort of use is a good sign that they weren't trained ethically and they should be paying a ton of licensing fees to the people whose content they used without permission.

>>onlyre+(OP)
Setting legality as a cornerstone of ethics is a very slippery slope :)

>>shkkmo+D2
It does exist, and you'd be glad to know that it's going in the pro-AI/training direction: https://www.reedsmith.com/en/perspectives/ai-in-entertainmen...

>>joquar+za
Do other countries all use the same reasoning?

>>tbrown+Fg
I don't think this was your point, but no they don't. Specifically China. What will happen if China has unbridled training for a decade while the United States quibbles about copyright?

I think publications should be protected enough to keep them in business, so I don't really know what to make of this situation.

>>soulof+E3
I have an even harder time equating individual rights with the spending of $xx billion in Azure compute time and payment of a collective $0 to millions of individuals who involuntarily contribute training material to create a closed source, commercial service allowing a single company to compete with all the individuals currently employed to create similar work.

NYT just happens to be an entity that can afford to fight Microsoft in court.

>>joquar+za
The reason copyright promotes progress is that it incentivizes individuals and organizations to release works publicly, knowing their works are protected against unlawful copying.

The end game when large content producers like The New York Times are squeezed due to copyright not being enforced is that they will become more draconian in their DRM measures. If you don't like paywalls now, watch out for what happens if a free-for-all is allowed for model training on copyrighted works without monetary compensation.

I had a similar conversation with my brother-in-law, who's an economist by training but now works in data science. Initially he was on the side of OpenAI, saying that model training data is fair game. After I probed him, he came to the same conclusion I describe: not enforcing copyright for model training data will just result in a tightening of free access to data.

We're already seeing it from the likes of Twitter/X and Reddit. That trend is likely to spread to more content-rich companies and get even more draconian as time goes on.

>>notaha+sj
I don't see a problem as long as there's taxation.

Look at SpaceX. They paid a collective $0 to the individuals who discovered all the physics and engineering knowledge. Without that knowledge they're nothing. But still, aren't we all glad that SpaceX exists?

In exchange for all the knowledge that SpaceX is privatizing, we get to tax them. "You took from us, so we get to take it back with tax."

I think the more important consideration isn't fairness; it's prosperity. I don't want to ruin the gravy train with IP and copyright law. Let them take everything, then tax the end output in order to correct the balance and make things right.

>>joquar+za
Copyright isn't what got in the way here. AI could have negotiated a license agreement with the rights holder. But they chose not to.

>>bluefi+5e
>You can get basically-but-not-quite-exactly the copyrighted material that it was trained on.

You can do exactly the same with a human author or artist if you prompt them to. And if you decide to publish this material, you're the one liable for breach of copyright, not the person you instructed to create the material.

>>gosub1+dv
From their perspective they're training a giant mechanical brain. A human brain doesn't need any special license agreement to read and learn from a publicly available book or web page, so why should a silicon one? They probably didn't even consider the possibility that people would claim that merely having an LLM read copyrighted data was a copyright violation.

>>logicc+Ez
I was thinking about this argument too: is it a "license violation" to gift a young adult a NYT subscription to help them learn to read? Or someone learning English as a second language? That seems to be a strong argument.

But it falls apart because kids aren't business units trained to maximize shareholder returns (maybe in the farming age they were). OpenAI isn't open; it's making revolutionary tools that are absolutely going to be monetized by the highest bidder. A quick way to test this: NYT offers to drop its case if "open" AI "open"-ly releases all its code and training data. They're just learning, right? What's the harm?

>>soulof+E3
When we're discussing litigation, it certainly matters what is legal.

>>logicc+Vy
Not if that person is a trillion-dollar corporation. If they're a business that's regularly stealing content and rewriting it for their customers, that business is gonna go down. Sure, a customer or two may go down with them, but the business that sells counterfeit works to spec is not gonna last long.

>>jeremy+eI
And also - if what is legal isn't right, we live in a democracy and should change that.

Saying what's legal is irrelevant is an odd take.

I like living in a place with a rule of law.

>>nsagen+zo
I doubt there’s much that technical controls can do to limit the spread of NYT content, their only real recourse is to try suing unauthorized distributors. You only need to copy something once for it to be free.

>>Captai+zg
> It does exist, and you'd be glad to know that it's going in the pro-AI/training direction

Certainly not in the US. From the article you linked "In the United States, in the absence of a TDM exception, AI companies contend that inclusion of copyrighted materials in training sets constitute fair use eg not copyright infringement, which position remains to be evaluated by the courts."

Fair use is a defense against copyright infringement, but the whole question in the first place is whether generative AI training falls under fair use, and this case looks to be the biggest test of that (among others filed relatively recently).

>>onlyre+NR1
Should Harriet Tubman have petitioned her local city council and waited for a referendum before freeing slaves?

>>soulof+Hj4
Time will tell if comparing slavery to copyright is ridiculous or not.

In the case of slavery - we changed the law.

In the case of copyright - it's older than the Atlantic Slave Trade and still alive and kicking.

It's almost as if one of them is not like the other.

>>onlyre+MM4
> It's almost as if one of them is not like the other.

Use this newfound insight to take my comment in good faith, as per HN guidelines, and recognize that I am making a generalized analogy about the gap between law and ethics, and not making a direct comparison between copyright and slavery.

Can we get back on topic?