zlacker

[return to "The New York Times is suing OpenAI and Microsoft for copyright infringement"]
1. solard+Aj 2023-12-27 15:53:06
>>ssgodd+(OP)
I hope this results in Fair Use being expanded to cover AI training. This is way more important to humanity's future than any single media outlet. If the NYT goes under, a dozen similar outlets can replace them overnight. If we lose AI to stupid IP battles in its infancy, we end up handicapping probably the single most important development in human history just to protect some ancient newspaper. And then another country will do it anyway, and the NYT will still get eaten.
2. aantix+1l 2023-12-27 16:01:23
>>solard+Aj
Why can't AI at least cite its source? This feels like a broader problem, nothing specific to the NYTimes.

Long term, if no one is given credit for their research, either the creators will start to wall off their content or not create at all. Both options would be sad.

A humane attribution comment from the AI could go a long way - "I think I read something about this <topic X> in the NYTimes <link> on January 3rd, 2021."

It appears that without attribution, long term, nothing moves forward.

AI loses access to the latest findings from humanity. And so does the public.

3. apante+6m 2023-12-27 16:07:49
>>aantix+1l
A neural net is not a database where the original source is sitting somewhere in an obvious place with a reference. A neural net is a black box of functions that have been automatically fit to the training data. There is no way to know what sources have been memorized vs which have made their mark by affecting other types of functions in the neural net.
4. dlandi+At 2023-12-27 16:47:17
>>apante+6m
> There is no way to know what sources have been memorized vs which have made their mark by affecting other types of functions in the neural net.

But if it's possible for the neural net to memorize passages of text then surely it could also memorize where it got those passages of text from. Perhaps not with today's exact models and technology, but if it was a requirement then someone would figure out a way to do it.
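To make the suggestion concrete: a purely hypothetical sketch (all names, records, and the URL below are invented for illustration; no real training pipeline works this way today) of what provenance riding along with training examples could look like at the data level:

```python
from dataclasses import dataclass

# Hypothetical: every training passage carries provenance metadata.
@dataclass(frozen=True)
class TrainingExample:
    text: str
    source: str   # publication name
    url: str      # placeholder, not a real article
    date: str

corpus = [
    TrainingExample(
        text="Some passage of reporting about topic X...",
        source="NYTimes",
        url="https://example.com/article",
        date="2021-01-03",
    ),
]

# If provenance were kept alongside the data, a memorized span could at
# least be mapped back to the record it came from.
by_text = {ex.text: ex for ex in corpus}
hit = by_text["Some passage of reporting about topic X..."]
print(f"I think I read this in the {hit.source} ({hit.url}) on {hit.date}")
```

Teaching the model itself to emit such attributions is the hard, unsolved part; the sketch only shows that the bookkeeping side is cheap to state.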

5. Tao330+cB 2023-12-27 17:30:36
>>dlandi+At
Neural nets don't memorize passages of text. They train on vectorized tokens. You get a model of how language statistically works, not understanding and memory.
6. tsimio+AN 2023-12-27 18:38:07
>>Tao330+cB
The model weights clearly encode certain full passages of text, otherwise it would be virtually impossible for the network to produce verbatim copies. The format is something very vaguely like "the most likely token after 'call' is 'me'; the most likely token after 'call me' is 'Ishmael'." It's ultimately a kind of lossy statistical compression scheme at some level.
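A toy illustration of the point (a deliberately crude stand-in, nothing like a real transformer): even a bigram counter fit to a passage will regurgitate it verbatim, despite never storing the passage as a string:

```python
from collections import Counter, defaultdict

# A tiny "corpus" of one passage.
passage = "call me ishmael some years ago never mind how long precisely".split()

# "Training": count which token follows each token. The counts play the
# role of model weights; no passage is stored as a string anywhere.
counts = defaultdict(Counter)
for prev, nxt in zip(passage, passage[1:]):
    counts[prev][nxt] += 1

# "Generation": greedily emit the most likely next token.
out = ["call"]
while out[-1] in counts:
    out.append(counts[out[-1]].most_common(1)[0][0])

print(" ".join(out))  # reproduces the original passage verbatim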
7. photon+lQ 2023-12-27 18:53:01
>>tsimio+AN
> It's ultimately a kind of lossy statistical compression scheme at some level.

And on this subject, it seems worthwhile to note that compression has never freed anyone from copyright/piracy considerations before. If I record a movie with a cell phone at worse quality, that doesn't change things. If a book is copied and stored in some gzipped format where I can only read a page at a time, or only a random page at a time, I don't think that's suddenly fair use.
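The gzip case is easy to demonstrate: the representation changes completely, but the work is fully recoverable, which is exactly why nobody treats it as a new work. (An LLM's compression is lossy and partial, so the analogy is inexact, as the next paragraph concedes; this sketch only shows that compactness of storage by itself changes nothing.)

```python
import gzip

# A stand-in for a copyrighted passage.
text = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness.").encode()

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

# The bytes on disk look nothing like the original, yet the
# round-trip is exact: the work itself is fully intact.
assert restored == text
print(f"{len(text)} bytes -> {len(compressed)} bytes, round-trip exact")
```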

Not saying these things are exactly the same as what LLMs do, but it's worth some thought, because how are we going to make consistent rules that apply in one case but not the other?

8. seanmc+SQ 2023-12-27 18:55:24
>>photon+lQ
If you watch a bunch of movies and then go on to make your own movie influenced by them, you are protected even though you have mentally compressed them into your own movie. At some point you are learning from, being influenced by, and being inspired by copyrighted material (not copyright infringement), and at some point you are just making a poor copy of the material (definitely copyright infringement). LLMs are probably still closer to the latter case than the former, but eventually AI will reach the former.
9. photon+t11 2023-12-27 19:55:12
>>seanmc+SQ
There's no obvious need to hold people and AI to the same standards here, yet, even if compression in mental models is exactly analogous to compression in machine models. I guess we already decided that corporations are legally "like" persons, but the jury is still out on AIs. Perhaps people should be allowed more leeway to make possibly-questionable derivative works, because they have lives to live, genuine if misguided creative urges, bills to pay, etc. Obviously it's quite difficult to pin down the exact point at which synthesis and summary cross a line to become "original content". But it seems to me that, if anything, machines should be held to a higher standard than people.

Even if LLMs can't cite their influences with current technology, that can't be a free pass to continue things this way. Of course data brokers resist data-lineage efforts for themselves while wanting to require lineage from others. Besides copyright, it's common for datasets to carry all kinds of other legal encumbrances, like "after paying for this dataset, you can do anything you want with it, excepting JOINs with this other dataset". Lineage is expensive and difficult, but not impossible. Statements like "we're not doing data-lineage and wish we didn't have to" are always more about business operations and desired profit margins than technical feasibility.

10. seanmc+Ok1 2023-12-27 21:39:00
>>photon+t11
> But it seems to me that, if anything, machines should be held to higher standard than people.

If machines achieve sentience, does this still hold? Like, do we have to license material for our sentient AI to learn from? They couldn't just watch a movie or read a book the way a normal human can, even though humans routinely let that material influence new derived works (see Eragon, which is shamelessly Star Wars/Harry Potter/LOTR with dragons).

It will be fun to trip through these questions over the next 20 years.
