This is what you don't understand: the concept of fair use.
https://en.wikipedia.org/wiki/Fair_use
If the courts hold this type of thing to be fair use (which I'm about 90% sure they will), "consent" won't enter into it. At all.
Human artists already do this, extensively. We handle it by making their output the part of the process that holds the relevant copyright protections. I can sell Picasso-inspired pieces all day long as long as I don't sell them as "Picasso."
If I faithfully reproduced "The Old Guitarist" [1] and attempted to sell it as the original, or even as a version from which to copy and sell prints, I'd be open to legal claims and action. Rightfully so.
I personally haven't heard a convincing argument for why ML training should be handled as if it were the output of the process, rather than the input that it is. I'm open to being swayed and adjusting my worldview, so I keep looking for counterpoints.
I think we have given this discussion plenty of time. Everything happening around training on copyrighted works, whether images, songs, or deepfakes, from the lawsuits to the licensing deals, is converging on paying for the data; hence OpenAI and many others doing exactly that, given the risk of such lawsuits.
> AI outputs should be regulated, of course. Obviously impersonation and copyright law already applies to AI systems. But a discussion on training inputs is entirely novel to man and our laws, and it's a very nuanced and important topic. And as AI advances, it becomes increasingly difficult because of the diminishing distinction between "organic" learning and "artificial" learning.
Copyright law does not care about, and the underlying problem is not about, using such a generative AI system for non-commercial purposes like education or private use. The line gets drawn as soon as it is commercialized and the fair-use excuses fall apart. And even as the AI advances, so do the traceability methods and the questions about the dataset being used. [0]
It costs musicians close to nothing to target and file lawsuits against commercial voice cloners. Even training on copyrighted songs was not an option for tools like Dance Diffusion [1] because of that same risk, which is why training on public-domain audio was the safer alternative to courting lawsuits and questions about the training set from scores of musicians.
[0] https://c2pa.org
[1] https://techcrunch.com/2023/09/13/stability-ai-gunning-for-a...
No, AI art would exist without Disney or HBO just like human art would.
It really does come back to the idea that AI is doing more or less the same thing as an art student: it learns styles and structures and concepts. And if that training infringes, then training an art student infringes too, because it is just as dependent on the work of artists who came before.
And sure, if you ask a skilled 2D artist whether they can draw something in the style of '80s anime, or of specific artists, they can do it. Some artists specialize in exactly this, in fact! You can't have retro anime porn commissions if they aren't riffing on retro anime images. Yes, Twitter, I see what you do with that account when you're not complaining about AI.
The problem is that AI lowers the cost of doing this to zero, and thus lays bare the inherent contradictions of IP law and "intellectual ownership" in a society where everyone is diffusing and mashing up each other's ideas and works on a continuous basis. It is one of those "everyone does it" crimes that mostly survives because it's utterly unenforced at scale, apart from a few noxious litigants like Disney.
It is the old Luddite problem: the common idea that Luddites just hated technology is inaccurate. They were textile workers who were literally seeing their livelihoods displaced by automation mass-producing what they saw as inferior goods. https://en.wikipedia.org/wiki/Luddite
In general, though, this is a problem set up by capitalism itself. Ideas can't and shouldn't be owned; it is an absurd premise, and you shouldn't be surprised when it produces absurd results. Making sure people can eat is not the job of capitalism; it's the job of safety nets and governments. Ideas have no cost of replication, and artificially creating one is distorting and destructive.
Would a neural net put a tax on neurons firing? No, that’s stupid and counterproductive.
Let people write their slash fiction in peace.
(HN probably has a good understanding of this, but in general people don't appreciate how much the model is not just aping images it has seen but learning the style of, and relationships between, pixels and objects. To wit: the only thing NVIDIA kept from DLSS 1.0 was the model, and DLSS 2.0 has nothing to do with DLSS 1.0 in terms of technical approach. Yet the model encodes all the contextual understanding of how pixels are supposed to look in human-made images, even though it isn't doing the original transform anymore! And LLMs can indeed generalize reasonably accurately about things they haven't seen, as long as they know the precepts. They aren't "just guessing what word comes next"; they predict the word that comes next given a conceptual understanding of the underlying ideas. That makes it hard to draw a line between a human and a large AI model: college students will also "riff on the things they know" if you ask them to "generalize" about a topic they haven't studied.)
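To put a rough number on "not just aping images it has seen," here's a back-of-the-envelope capacity check in Python. The figures are approximate, publicly reported ones I'm assuming for the sketch (a Stable Diffusion fp16 checkpoint is on the order of 2 GB; LAION-2B holds roughly 2.3 billion image-text pairs), not anything from this thread:

    # If the model "stored" its training images, how many bytes of
    # weights would each image get? (Figures are rough, publicly
    # reported approximations, assumed here for illustration.)
    weights_bytes = 2 * 1024**3        # ~2 GB fp16 Stable Diffusion checkpoint
    training_images = 2_300_000_000    # ~2.3B image-text pairs (LAION-2B)

    print(f"{weights_bytes / training_images:.2f} bytes of weights per image")
    # -> ~0.93 bytes per image: nowhere near enough to store copies,
    #    so the weights must mostly encode shared structure and style.

(That's a capacity argument, not a proof that nothing is ever memorized; researchers have extracted near-copies of heavily duplicated training images from diffusion models, which is exactly the edge case the lawsuits fixate on.)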
What rhetoric? I am telling the hard truth of it.
> Copyright law works fine with AI outputs. As does trademark law. I don't see an AI making a fanart Simpsons drawing being any more novel a legal problem than the myriad of humans that do it on YouTube already. Or people who sell handmade Pokemon plushies on Etsy without Nintendo's permission.
How does running the risk of a lawsuit from the copyright holder make it OK to keep selling the works? Again, if the parodies and fan art stay in a non-commercial setting, there isn't a problem. The problems start in the commercial setting, and Nintendo in particular is known to be extremely litigious over mere similarity, AI or not. [0] [1] [2] Then the question becomes 'How long until I get caught if I commercialize this?' for both the model's inputs AND its outputs.
That question was answered in Getty's case: Getty didn't need to request Stability's training set, since it is publicly available. Nintendo and other companies could simply demand the original training data of closed models if they wanted to.
> But the question is on inputs and how the carve-outs of "transformative" and "educational use" can be interpreted — model training may very well be considered education or research.
As with the above, this is why C2PA and other traceability efforts are in the works for those same reasons [3]: to determine, from a generative work's output, which sources it was derived from.
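To sketch what that traceability looks like in practice, here is a minimal example with the assumptions flagged: it assumes the open-source c2patool CLI from the C2PA project is installed and that running it on a file prints the embedded manifest store as JSON (its default behavior), and it looks for the IPTC "trainedAlgorithmicMedia" digital source type that some C2PA-signing generators use to label AI output. Since the exact manifest layout is an assumption on my part, the check just searches the serialized JSON:

    import json
    import subprocess

    def c2pa_manifest(path: str) -> dict | None:
        # Read a file's embedded C2PA manifest store via the c2patool
        # CLI; assumes `c2patool <file>` prints it as JSON and exits
        # nonzero when no manifest is present.
        try:
            result = subprocess.run(["c2patool", path],
                                    capture_output=True, text=True)
        except FileNotFoundError:
            return None  # c2patool not installed
        if result.returncode != 0:
            return None
        return json.loads(result.stdout)

    def looks_ai_generated(manifest_store: dict) -> bool:
        # IPTC digital source type some generators embed to label AI
        # output; searching the serialized JSON avoids relying on an
        # exact manifest layout (which is an assumption here anyway).
        return "trainedAlgorithmicMedia" in json.dumps(manifest_store)

    store = c2pa_manifest("image.jpg")  # hypothetical input file
    if store is None:
        print("no C2PA provenance found")
    elif looks_ai_generated(store):
        print("manifest labels this file as AI-generated")
    else:
        print("provenance present; no AI-generation label")

Provenance like this is opt-in and strippable, so it's one traceability signal rather than a silver bullet, but it's the direction these efforts are heading.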
> I think it's been made rather clear that nobody has a real answer to this, copyright law didn't particularly desire to address if an artist is "stealing" when they borrow influence from other artists and use similar styles or themes (without consent) for their own career.
So that explains these AI companies scrambling to avoid addressing these issues or being transparent about their datasets and training data (except for Stability). Since that is where this is going.
[0] https://www.vice.com/en/article/ae3bbp/the-pokmon-company-su...
[1] https://kotaku.com/pokemon-nintendo-china-tencent-netease-si...
[2] https://www.gameshub.com/news/news/pokemon-nft-game-pokeworl...
[3] https://variety.com/2023/digital/news/scarlett-johansson-leg...
Meanwhile, the lawsuit against Midjourney for training on copyrighted work is going... not that great[0]. The judge is paring down a lot of the arguments in the lawsuit.
The actual idea behind using copyright to stop AI is that if we give the copyright owners of trained-on works the ability to veto that training, then we can just "stop AI". The problem is that most artists don't actually get to own their work. Publishers own it; the rights are the first thing you have to bargain away in order to work with a publisher. So publishers are going to look at their vast dragon's hoard of work, most of which isn't particularly profitable to them, and license it out to OpenAI, Stability, Midjourney, or whoever at pennies on the dollar, because at their scale that adds up to a pretty big deal.
To the publishers salivating over generative AI, this cost is not a big deal, because they already spend shittons on writers. So if your goal is to stop worker replacement, merely adding a cost to that replacement isn't a good idea. Actually making it illegal to replace workers with AI is the way to go.
[0] https://www.reuters.com/legal/litigation/judge-pares-down-ar...