zlacker

31 comments
1. llm_ne+(OP)[view] [source] 2023-12-27 14:43:57
I don't think they're looking to prevent the inevitable, but rather that they see a target with a fat wallet from which a lot of money can be extracted. I'm not saying this in a negative way, but much of the "this is outrageous!" reaction to AI hasn't been about the building of models, but rather the realization that a few players are arguably getting very rich on those models, so other people want their piece of the action.
replies(2): >>dissid+J1 >>haluka+c5
2. dissid+J1[view] [source] 2023-12-27 14:53:29
>>llm_ne+(OP)
If the NYT wins this, then there is going to be a massive push for payouts from basically everyone ever… I don't see that wallet staying fat for long.
replies(3): >>noitpm+k2 >>throw_+V2 >>alexey+l4
◧◩
3. noitpm+k2[view] [source] [discussion] 2023-12-27 14:56:40
>>dissid+J1
If they are determined to have broken the law, then they should absolutely be made to pay damages to aggrieved parties. (Now, determining whether they did, and who those parties are, is an entirely unknown can of worms.)
◧◩
4. throw_+V2[view] [source] [discussion] 2023-12-27 14:59:39
>>dissid+J1
The data will have to become more curated. Exclusivity deals will probably become a thing too. Good data will be worth the money and hassle; garbage (or meh) data won't.
◧◩
5. alexey+l4[view] [source] [discussion] 2023-12-27 15:07:16
>>dissid+J1
If LLMs actually create added value and don't just burn VC money, then they should be able to pay a fair price for the work of the people they're relying upon.

If your business is profitable only when you get your raw materials for free, it's not a very good business.

replies(4): >>logicc+L4 >>hhjink+x5 >>bnralt+j9 >>015a+Xp
◧◩◪
6. logicc+L4[view] [source] [discussion] 2023-12-27 15:09:23
>>alexey+l4
By that logic, you should have to pay the copyright holder of every library book you ever read, because you could later produce some content you memorised verbatim.
replies(6): >>passwo+46 >>nullin+i6 >>macNch+f7 >>Wiggly+S8 >>alexey+Ko >>015a+zr
7. haluka+c5[view] [source] 2023-12-27 15:11:18
>>llm_ne+(OP)
If this is inevitable (and I'm not saying it's not), who will produce high quality news content?
replies(1): >>hcurti+s7
◧◩◪
8. hhjink+x5[view] [source] [discussion] 2023-12-27 15:12:43
>>alexey+l4
What is a fair price? The entire NYT library would be a fraction of a fraction of the training set (presumably).
replies(1): >>morkal+B7
◧◩◪◨
9. passwo+46[view] [source] [discussion] 2023-12-27 15:15:33
>>logicc+L4
> the copyright holder of every library book

gets paid

◧◩◪◨
10. nullin+i6[view] [source] [discussion] 2023-12-27 15:17:18
>>logicc+L4
Copyright holders do get paid for library copies, in the US.
replies(1): >>exitb+zb
◧◩◪◨
11. macNch+f7[view] [source] [discussion] 2023-12-27 15:22:47
>>logicc+L4
The rules we have now were made in the context of human brains doing the learning from copyrighted material, not machine learning models. The limitations on what most humans can memorize and reproduce verbatim are extraordinarily different from an LLM. I think it only makes sense to re-explore these topics from a legal point of view given we’ve introduced something totally new.
replies(1): >>whichf+BA
◧◩
12. hcurti+s7[view] [source] [discussion] 2023-12-27 15:23:56
>>haluka+c5
AI. And, I fear, it will be good.
replies(1): >>epc+6a
◧◩◪◨
13. morkal+B7[view] [source] [discussion] 2023-12-27 15:25:29
>>hhjink+x5
What if, even though it's a small portion of the training data, their content has an outsized influence on the output being generated? A random NYT article about Donald Trump and a random Wikipedia article about some obscure nematode might make up around the same share of the training data, but if 10,000x more users are asking about DJT than about the nematode, what is fair? Obviously they'll need to pay royalties on the usage! /s
◧◩◪◨
14. Wiggly+S8[view] [source] [discussion] 2023-12-27 15:32:37
>>logicc+L4
The difference here is scale. For someone to reproduce a book verbatim from memory it would take years of studying that book. For an LLM this would take seconds.

The LLM could reproduce the whole library quicker than a person could reproduce a single book.

◧◩◪
15. bnralt+j9[view] [source] [discussion] 2023-12-27 15:34:40
>>alexey+l4
Imagine if tomorrow it was decided that every programmer had to pay out money for every single thing they went online to learn beyond the official documentation: every Stack Overflow question they looked at, every question they took to a search engine. The amount owed would be decided by a non-technical official in charge of figuring out how much of the money they earned was owed to the places they learned from. And people responded, "Well, if you can't pay for your raw materials, then this just isn't a good business for you."
replies(1): >>frakt0+ma
◧◩◪
16. epc+6a[view] [source] [discussion] 2023-12-27 15:38:20
>>hcurti+s7
Curious how AI gets the raw information if there are no reporters or newspapers. Does AI go to meetings or interview politicians?
replies(1): >>hcurti+3c
◧◩◪◨
17. frakt0+ma[view] [source] [discussion] 2023-12-27 15:39:16
>>bnralt+j9
Except that every Stack Overflow post is explicitly Creative Commons: https://stackoverflow.com/help/licensing
replies(1): >>bnralt+jd
◧◩◪◨⬒
18. exitb+zb[view] [source] [discussion] 2023-12-27 15:46:48
>>nullin+i6
You make it seem as if the copyright holder makes more money on a library book than on one sold at retail, which does not appear to be the case in the US.
replies(1): >>willse+rj
◧◩◪◨
19. hcurti+3c[view] [source] [discussion] 2023-12-27 15:49:44
>>epc+6a
I can certainly imagine email correspondence. Even audio interviews. You're right that, at least presently, AI seems less likely to earn confidences. But I don't know how far off the movie "Her" actually is.
◧◩◪◨⬒
20. bnralt+jd[view] [source] [discussion] 2023-12-27 15:58:43
>>frakt0+ma
So I suppose it would be like saying that if you used Stack Overflow to find answers, all of the work you created using information from it would have to be explicitly under the Creative Commons license. You wouldn't even be able to work for companies that don't use that license if some of your knowledge comes from what you learned on Stack Overflow. Used Stack Overflow to learn anything about programming? You're going to have to turn down that FAANG offer.

And if you learned anything from videos/books/newsletters with commercial licenses, you would have to pay some sort of fee for using that information.

replies(1): >>jakein+Kk
◧◩◪◨⬒⬓
21. willse+rj[view] [source] [discussion] 2023-12-27 16:32:31
>>exitb+zb
The library pays for the books, and the copyright holder gets paid. This is no different from buying a book at retail, which you can read and share with family and friends, or sell, so it can be read again and sold again. The book is the product, not a license for one person to access the book.
◧◩◪◨⬒⬓
22. jakein+Kk[view] [source] [discussion] 2023-12-27 16:39:55
>>bnralt+jd
If your code contains verbatim copy-paste of entire blocks of non-trivial code lifted from those videos/books/newsletters with commercial licenses, then yes you would be liable for some licensing fees, at minimum.
◧◩◪◨
23. alexey+Ko[view] [source] [discussion] 2023-12-27 17:00:33
>>logicc+L4
That is the case. It's just that the fair price is fairly low and is often covered by the government in the name of the greater good.

When for-profit companies seek access to library material, they pay a much, much higher price.

◧◩◪
24. 015a+Xp[view] [source] [discussion] 2023-12-27 17:07:13
>>alexey+l4
Yup, and I think that'll quickly uncover the reality that LLMs do not generate enough value relative to their true cost. ChatGPT Plus already costs $20/month; M365 Copilot costs $30/user/month. They're already among the most expensive B2B-ish software subscriptions out there, and there's very little room in the market to add more cost to cover payments to rightsholders.
◧◩◪◨
25. 015a+zr[view] [source] [discussion] 2023-12-27 17:17:32
>>logicc+L4
What do you actually believe, with that statement? Do you believe libraries are operating illegally? That they aren't paying rightsholders?

Also: GPT is not a legal entity in the United States. Humans have different rights than computer software. You are legally allowed to borrow books from the library. You are legally allowed to recite the content you read. You're not allowed to sell verbatim recitations of what you read. This is obvious, I think? But it's exactly what LLMs are doing right now.

replies(1): >>stale2+7K
◧◩◪◨⬒
26. whichf+BA[view] [source] [discussion] 2023-12-27 18:06:28
>>macNch+f7
Human brains are still the main legal agents in play. LLMs are just computer programs used by humans.

Suppose I do research for a book that I'm writing - it doesn't matter whether I type it on a Mac, PC, or typewriter. It doesn't matter if I use the internet or the library. It doesn't matter if I use an AI-powered voice-to-text keyboard or an AI assistant.

If I release a book that has a chapter which was blatantly copied from another book, I might be sued under copyright law. That doesn't mean that we should lock me out of the library, or prevent my tools from working there.

replies(2): >>macNch+VG >>015a+l22
◧◩◪◨⬒⬓
27. macNch+VG[view] [source] [discussion] 2023-12-27 18:41:32
>>whichf+BA
I see two separate issues. The one you describe is maybe slightly more clear-cut: if a person uses an AI trained on copyrighted works as a tool to create and publish their own works, they are responsible if those resulting works infringe.

The other question, which I think is more topical to this lawsuit, is whether the company that trains and publishes the model itself is infringing, given they're making available something that is able to reproduce near-verbatim copyrighted works, even if they themselves have not directly asked the model to reproduce them.

I certainly don't have the answers, but I also don't think that simplistic arguments that the cat is already out of the bag or that AIs are analogous to humans learning from books are especially helpful, so I think it's valid and useful for these kinds of questions to be given careful legal consideration.

◧◩◪◨⬒
28. stale2+7K[view] [source] [discussion] 2023-12-27 18:57:35
>>015a+zr
> Humans have different rights than computer software

Fortunately, the computer isn't the one being sued.

Instead, it is the humans who use the computer. And those humans retain their existing rights, even if they use a computer.

replies(1): >>015a+Y12
◧◩◪◨⬒⬓
29. 015a+Y12[view] [source] [discussion] 2023-12-28 04:55:15
>>stale2+7K
Maybe (though there exist plenty of examples to the contrary). However, the NYT isn't suing you, ChatGPT user; they're suing OpenAI.
replies(1): >>stale2+pa2
◧◩◪◨⬒⬓
30. 015a+l22[view] [source] [discussion] 2023-12-28 04:58:46
>>whichf+BA
> Human brains are still the main legal agents in play.

No, they're not. This is The New York Times (a corporation) vs OpenAI and Microsoft (two more corporations).

replies(1): >>rajama+3f2
◧◩◪◨⬒⬓⬔
31. stale2+pa2[view] [source] [discussion] 2023-12-28 06:38:37
>>015a+Y12
Gotcha.

OpenAI is run by humans as well, though.

So the same argument applies.

Those humans have fair use rights as well.

◧◩◪◨⬒⬓⬔
32. rajama+3f2[view] [source] [discussion] 2023-12-28 07:36:26
>>015a+l22
Aren't corporations considered 'persons' in the US?