zlacker

[parent] [thread] 16 comments
1. hhsect+(OP)[view] [source] 2023-12-27 20:42:39
I'm not for or against anything at this point until someone gets their balls out and clearly defines what copyright infringement means in this context.

If I give a kid a bunch of books, all by the same author, pay that kid to write a book in a similar style, and then go on to sell that book... have I somehow infringed copyright?

The kid's book, at best, is likely to be a very convincing facsimile of the original author's work... but not the author's work.

It seems to me that the only solution for artists is to charge for access to their work in a secure environment and then lobotomise people on the way out.

The endgame seems to be "you can view and enjoy our work, but if you want to learn from or be inspired by it, that's not on".

replies(7): >>flexth+V3 >>graphe+r4 >>twoodf+za >>soerxp+Nk >>incang+Dq >>OOPMan+Lu >>sulric+nV
2. flexth+V3[view] [source] 2023-12-27 21:05:33
>>hhsect+(OP)
I think you’re skipping over the problem.

In your example you owned the work you gave to the person to create derivatives of.

In a more accurate example you would be stealing those books and then giving them to someone else to create derivatives.

replies(1): >>slyall+26
3. graphe+r4[view] [source] 2023-12-27 21:09:00
>>hhsect+(OP)
Ironically, these artists can't claim to be wholly original, as they were certainly inspired by others. Artists who play live already "lobotomize" people on the way out, since the experience isn't easy to recreate and a video of a good show isn't the same thing.

Artists who make easily reproducible art will see their work circulate, as it always has, alongside AI output in a sea of other JPEGs.

4. slyall+26[view] [source] [discussion] 2023-12-27 21:18:40
>>flexth+V3
How about if I borrowed them from the library and gave them to the kid to read?

How about if I got the kid to read the books on a public website where the author made the books available for free?

5. twoodf+za[view] [source] 2023-12-27 21:45:30
>>hhsect+(OP)
There are two problems with the “kid” analogy:

a) In many closely comparable scenarios, yes, it's copyright infringement. When Francis Ford Coppola made The Godfather film, he couldn't just be "inspired" by Puzo's book. If the story, characters, or dialog were similar enough, he had to pay Puzo, even if the work he created was quite different and not a literal "copy".

b) Training an LLM isn’t like giving someone a book. Among other things, it involves making a derivative copy into GPU memory. This copy is not a transitory copy in service of a fair use, nor likely a fair use in itself, nor licensed by the rights-holder.

replies(5): >>andy99+Fe >>PaulDa+lx >>random+dO >>EarthM+tQ >>fennec+5v2
6. andy99+Fe[view] [source] [discussion] 2023-12-27 22:07:59
>>twoodf+za
> This copy is not a transitory copy in service of a fair use

Training is almost certainly fair use, so it's exactly a transitory copy in service of fair use. Training, apart from the brief "transitory copy" you mention, is not copying; it's making a minuscule algorithmic adjustment based on fleeting exposure to the data.

replies(2): >>twoodf+Aw >>edwint+Vo1
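A minimal sketch of that "minuscule algorithmic adjustment" framing, with purely illustrative assumptions (a toy bigram score table standing in for real model weights, nothing like an actual GPT training pipeline): the text nudges some stored numbers and is then discarded, so the article itself is not retained in the model.

    # Illustrative toy only, not how large language models are actually built:
    # the "model" here is a bigram score table standing in for real weights.
    weights = {}

    def train_step(text, lr=0.01):
        """Nudge weights toward the bigrams seen in `text`, then drop the text."""
        tokens = text.lower().split()
        for pair in zip(tokens, tokens[1:]):
            # Each exposure moves a stored number a tiny bit; the source text
            # itself is not kept anywhere in the model.
            weights[pair] = weights.get(pair, 0.0) + lr

    train_step("the court will decide whether the copy is transitory")
    print(f"{len(weights)} weights adjusted; the text itself is gone:")
    print(weights)

Whether that adjustment is legally "copying" is, of course, exactly what the thread is arguing about.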
7. soerxp+Nk[view] [source] 2023-12-27 22:45:27
>>hhsect+(OP)
I don't have a comment on your hypothetical, but this case seems to go far beyond that. If you read the actual filing at the bottom of the linked page, NYT provides examples where ChatGPT recited exact multi-paragraph sections of their articles and tried to pass them off as its own words. Plainly reproducing a work is pretty much the only situation where "is this copyright violation?" isn't really in flux. It's not dissimilar to selling PDFs of copyrighted books.

If NYT were relying entirely on the argument that training a model in wordcraft using their materials is always copyright violation, or only had short quotes to point to, the philosophical debate you're trying to have would be more relevant.

8. incang+Dq[view] [source] 2023-12-27 23:27:10
>>hhsect+(OP)
Importantly, the kid (an individual human) got some wealth somewhat proportional to their effort. There's non-trivial effort in recruiting the kid. We can't clone the kid's brain a million times and run it for pennies.

There are ethical, political, and other differences between an AI doing something and a human doing the exact same thing. Those differences may need reflecting in new laws.

IANAL and don't have any positive suggestions for good laws; I'm just pointing out that the analogy doesn't quite hold. I think we're in new territory where analogies to previous human activities aren't always productive.

9. OOPMan+Lu[view] [source] 2023-12-28 00:01:09
>>hhsect+(OP)
I think your kid analogy is flawed because it ignores the fact that you couldn't reasonably use said "kid" to rapidly produce thousands of works in the same style and then use them to flood the market and drown out the original author's presence.

Try this with a real "kid" and you'll run into all kinds of real-world constraints, whereas flooding the world with derivative drivel using LLMs is actually possible.

So yeah, stop using weak analogies; it's not helpful or intelligent.

10. twoodf+Aw[view] [source] [discussion] 2023-12-28 00:20:22
>>andy99+Fe
Why is training “almost certainly” fair use?

Congress took the circuit holding in MAI Systems seriously enough to carve out a new fair use exception for copying software—entirely within the memory system of a licensed user—in service of debugging it.

If it took an act of Congress to make “unlicensed” debugging a fair use copy…

11. PaulDa+lx[view] [source] [discussion] 2023-12-28 00:29:20
>>twoodf+za
Regarding (b) ... while a specific method of training that involved persistent copying may indeed be a violation, it is far from clear that the general notion of "send a server request for a URL, digest the response in software that is not a browser" is automatically a violation. If there is deemed to be a difference (i.e. all you are allowed to do without a license is have a human read it in a browser), then one can see training mechanisms changing to accommodate that.
replies(1): >>twoodf+xx
12. twoodf+xx[view] [source] [discussion] 2023-12-28 00:31:28
>>PaulDa+lx
It’s all about the purpose the transitory copy serves. The mechanism doesn’t really matter, so you can’t make categorical claims about (say) non-browser requests.
13. random+dO[view] [source] [discussion] 2023-12-28 03:17:06
>>twoodf+za
>This copy is not a transitory copy in service of a fair use, nor likely a fair use in itself,

Seems extremely transitory, and since the output cannot be copyrighted, it does no harm to any work it "trained" on.

14. EarthM+tQ[view] [source] [discussion] 2023-12-28 03:40:20
>>twoodf+za
> If the story or characters or dialog are similar enough, he has to pay Puzo, even if the work he created was quite different and not a literal “copy”.

I don't think you can copyright a plot or story in any country, can you?

If he had rewritten the story with different characters and different lines, he wouldn't have had to pay Puzo. I'm sure it would have been frowned upon if it were too close, but it would have been legally OK.

15. sulric+nV[view] [source] 2023-12-28 04:28:48
>>hhsect+(OP)
You might be well served by reading the actual complaint.
16. edwint+Vo1[view] [source] [discussion] 2023-12-28 10:04:40
>>andy99+Fe
If you overtrain, the model may include verbatim copies of your training material and may be able to produce verbatim copies of the original in its output.

If Microsoft truly believes that the trained output doesn't violate copyright, then it should be forced to prove that by training a model on all of its internal source code, including Windows.

17. fennec+5v2[view] [source] [discussion] 2023-12-28 17:46:39
>>twoodf+za
How is it a copy at all? If it were, the model weights would have to be at least as large as the corpus of training data, which is not the case at all.

If it disgorges parts of NYT articles, how do we know it's not a common phrase, or that the article isn't reproduced verbatim on another, non-paywalled site?

I agree that if it uses the whole content of their articles for training, then NYT should get paid, but I'm not sure that they specifically trained on "paid NYT articles" as a topic, though I'm happy to be corrected.

I also think that companies and authors extremely overvalue the tiny fragments of their work in the huge pool of training data; there's a bit of a "main character" vibe going on.
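On the size point above, a back-of-the-envelope comparison with purely illustrative, assumed numbers (not figures from the case, the NYT, or OpenAI) suggests why the weights cannot be a byte-for-byte copy of the corpus, even though memorization of individual passages can still occur:

    # Assumed, illustrative numbers only.
    params = 175e9            # parameters in a large model (assumption)
    bytes_per_param = 2       # 16-bit weights
    weight_bytes = params * bytes_per_param

    corpus_tokens = 1e12      # training tokens (assumption)
    bytes_per_token = 4       # rough average for English text
    corpus_bytes = corpus_tokens * bytes_per_token

    print(f"weights: {weight_bytes / 1e12:.2f} TB")   # ~0.35 TB
    print(f"corpus:  {corpus_bytes / 1e12:.2f} TB")   # ~4.00 TB

Under these assumptions the weights are roughly an order of magnitude smaller than the text they were trained on, which is the commenter's point about the weights not being a literal copy.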
