Now, as for training "AI" models, who knows. You can argue it's the same thing a human does, or you could argue it's something new and qualitatively different that should fall under different rules. Regardless, the current copyright laws were written before "AI" models were in widespread use, so whatever is allowed or not is more of a historical accident.
So the discussion needs to be about the intention of copyright laws and what they SHOULD be.
Copying a work itself can be copyright infringement if it's so close to the original that people may think they're the same work.
We might be able to argue that a computer program taking art as input and automatically generating art as output is doing exactly what an artist does, but only once general intelligence is reached. Until then, it's still a machine transformation and should be treated as such.
AI shouldn't be a legal avenue for copyright laundering.
And practically speaking, putting aside whether a government should even be able to legislate such things, enforcing such a law would be near impossible without wild privacy violations.
1) The artist is not literally copying the copyrighted pixel data into their "system" for training.
2) An individual artist is not a multi-billion-dollar company with a computer system that rapidly spits out art using copyrighted pixel data. A categorical difference.
> automatically generating art as output
The user is navigating the latent space to obtain said output. I don't know if that's transformative or not, but it is an important distinction.
If the program were wholly automated, as in it had a random number/word generator added to it and no navigation of the latent space by users happened, then yeah, I would agree. But that's not the case, at least as far as ML models like Midjourney or Stable Diffusion are concerned.
On 1, human artists are copying copyrighted pixel data into their system for training. That system is the brain. It's organic RAM.
On 2, money shouldn't make a difference. Jim Carrey should still be allowed to paint even though he's rich.
If Jim uses Photoshop instead of brushes, he can spit out the style ideas he's copied and transformed in his brain more rapidly - but he should still be allowed to do it.
(That's as opposed to a large language model, which does memorize text.)
Also, you can train it to imitate an artist's style just by showing it textual descriptions of the style. It doesn't have to see any images.
No, it would just legislate which images may and may not be in the training data to be parsed. Artists want a copyright that makes their images unusable for machine-learning derivative works.
The trick here is that eventually the algorithms will get good enough that it won't be necessary for those images to be in the training data in the first place. But we can imagine that artists would be OK with that.
They shouldn't be OK with that and they probably aren't. That's a much worse problem for them!
Their complaining about copyright is most likely cope: this is what they're actually concerned about.
You can, however, disallow Google from indexing your content using robots.txt, a meta tag in the HTML, or an HTTP header.
Or you can ask Google to remove it from their indexes.
Your content will disappear from then on.
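For reference, the standard opt-out mechanisms look like this (all three are documented crawler controls; which one fits depends on your setup):

```
# robots.txt at the site root: stop Googlebot from crawling
User-agent: Googlebot
Disallow: /

# per-page HTML meta tag: crawlable, but kept out of the index
<meta name="robots" content="noindex">

# or the equivalent HTTP response header:
X-Robots-Tag: noindex
```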
You can't un-train what's already been trained.
You can't disallow scraping for training.
The damage is already done and it's irreversible.
It's like trying to unbomb Hiroshima.
Going from painting > raw photo (derivative work), raw photo > jpg (derivative work), jpg > model (derivative work), model > image (derivative work). At best you can make a fair use argument at that last step, but that falls apart if the resulting images harm the market for the original work.
you have rights.
AIs don't.
Because they don't have will.
It's like arresting a gun for killing people.
So, as a human, the individual(s) training the AI or using the AI to reproduce copyrighted material are responsible for the copyright infringement, unless explicitly authorized by the author(s).
It's quite possible to apply the same kind of protections to generative models. (I hope this does not happen, but it is fully possible.)
A tool that catalogues attributed links can't really be evaluated the same way as a pastiche machine.
You'd be much closer using the example of Google's first-page answer snippets, which are pulled out of a site's content with minimal attribution.
That might be a good way to go about it
They probably aren't doing that. Studying the production methods and WIPs is more useful for a human. (ML models basically guess how to make images until they produce one that "looks like" something you show it.)
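A toy sketch of that guess-and-compare loop, to make the parenthetical concrete. This is plain gradient descent on pixel error, not how any real image model is actually built; all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

target = rng.random((8, 8))   # the image we "show" the model
weights = np.zeros((8, 8))    # the model's current guess

for step in range(1000):
    error = weights - target  # how far the guess is from "looking like" the target
    weights -= 0.1 * error    # nudge the guess toward the target

print(np.abs(weights - target).max())  # ~0: the guess now matches the target
```

Real training does this across millions of images at once, so what gets learned is shared statistics rather than any single target.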
If you have views on whether they'll win, the prediction market is currently at 49%: https://manifold.markets/JeffKaufman/will-the-github-copilot...
Automated transformation is not guaranteed to remove the original copyright, and for simple transformations it won't, but it's an open question (no legal precedent, different lawyers interpreting the law differently) whether what these models are doing is so transformative that their output (when used normally, not trying to reproduce a specific input image) passes the fair use criteria.
Mind you, this is not talking about the usage rights of images generated from such a model, that's a completely different story and a legal one.
hear hear...
> Passively training a model on an artwork does not change the art in the slightest
Copyright holders, and I mean individual authors, the people who actually produced the content being used, disagree.
They say that to them, AI is like a bulldozer destroying the park.
Which is technically true: it's a machine that someone (some interested party, maybe?) is trying to disguise as a human doing human stuff.
But it's not.
> passive, non-destructive
Passive, non-destructive, in this context means
- passive: people send the images to you; you don't go looking for them
- non-destructive: people authorized you; otherwise it's destructive of their rights.
Can probably do all that well enough (it probably doesn't need to be perfect) by leaning on FAANG, with or without legislation.
But: opt-in by default, or opt-out by default?
But currently: first, there is a reasonable argument that the model weights may not be copyrightable at all. They don't really fit the criteria of what copyright law protects (no creativity was used in making them, etc.), in which case they can't be a derivative work and are effectively outside the scope of copyright law. Second, there is a reasonable argument that the model is a collection of facts about copyrighted works, equivalent to the early (pre-computer) statistical n-gram language models of copyrighted books used in e.g. lexicography; a minimal sketch of such a model follows below. For those we have solid old legal precedent that such models are not derivative works (again, a collection of facts isn't copyrightable) and thus can be made against the wishes of the authors.
Fair use criteria come into play as conditions under which it is permissible to violate the exclusive rights of the authors. However, if the model is not legally considered a derivative work according to copyright law criteria, then the fair use conditions don't matter, because in that case copyright law does not assert that making it is restricted in any way.
Note that in this case the resulting image might still be considered a derivative work of an original image, even if the "tool-in-the-middle" is not a derivative work.
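For concreteness, that kind of "collection of facts" can be as simple as a table of word-pair frequencies. A minimal Python sketch, assuming word bigrams (the function name is mine):

```python
from collections import Counter

def bigram_counts(text: str) -> Counter:
    """Build a word-bigram frequency table: pure statistics
    about the text; no passage of the original is stored."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

# The "model" is just counts of adjacent word pairs.
counts = bigram_counts("the cat sat on the mat and the cat slept")
print(counts[("the", "cat")])  # 2
```

Whether the far richer statistics encoded in modern model weights still count as "just facts" is exactly the unsettled question.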
Say it with me: Computer algorithms are NOT people. They should NOT have the same rights as people.
And the weights. The weights it has learned come originally from the images.
Also, a jpg seemingly fits your definition, since "no creativity was used in making them, etc," but it clearly embodies the original work's creativity. Similarly, a model can't be trained on random data; it needs to extract information from its training data to be useful.
The specific choice of algorithm used to extract the information doesn't change whether something is derivative.
No they won't. If AI art were just as good as it is today but hadn't used copyrighted images in the training set, people would absolutely still be finding some other thing to complain about.
Artists just don't want the tech to exist entirely.