Thus these arguments could backfire on those making them real quick. At this point though they have no choice but to make them as they’ve all clearly used copyrighted works to build their products.
The other challenge here is there needs to be some protection and compensation for folks producing original works in the first place. If we just end up with ML models training on the generated output of other ML models, this is all going to go downhill real quick.
Some of these rhyme with the fair use and similar arguments put forth by the free software and anti-corporate-owned culture folks for the last couple decades. More honest (if cynical) is A16z’s take, of “the rich already put in a bunch of money, so now you can’t stop it.”
Stop making excuses. AI training on copyrighted works is straight wrong no matter how much you don't want it to be.
All of my internet comments are copyrighted btw, but I do offer a license of $1b usd/year for using them in a model if you'd like.
AlphaZero learned to play chess via self-play, not by reading books about chess.
Why couldn't the same happen for art for example?
For coding, won't a sufficiently advanced neural net be able to figure out how to use a programming language when given just the documentation?
And when most of our interactions are with AI, it will learn from our conversations with it. Asking some AI system why feature X was removed from programming language Y in version Z teaches it something. The next person who asks it which feature was removed from Y in version Z might be told "X", without the AI system ever having to read about it. The interaction with AI could become a self-learning loop in and of itself.
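For the curious, here is a toy sketch of what "learning with no human data" means in the AlphaZero sense: a tabular agent teaches itself the game of Nim purely by self-play, with the rules of the game providing the reward, so no human-authored games or texts are ever ingested. (The game, the constants, and the update rule are illustrative simplifications, not AlphaZero's actual algorithm.)

```python
import random
from collections import defaultdict

# Toy self-play learner for Nim: players alternate taking 1 or 2 stones,
# and whoever takes the last stone wins. The reward comes from the rules
# alone; no human games or copyrighted works are involved.

Q = defaultdict(float)        # learned value of (stones_left, move)
EPSILON, ALPHA = 0.1, 0.5     # exploration rate, learning rate

def choose(stones):
    moves = [m for m in (1, 2) if m <= stones]
    if random.random() < EPSILON:
        return random.choice(moves)                   # explore
    return max(moves, key=lambda m: Q[(stones, m)])   # exploit

def self_play_episode(start=7):
    stones, player, history = start, 0, []
    while stones > 0:
        move = choose(stones)
        history.append((player, stones, move))
        stones -= move
        player ^= 1
    winner = history[-1][0]   # whoever took the last stone
    for player, state, move in history:
        reward = 1.0 if player == winner else -1.0
        Q[(state, move)] += ALPHA * (reward - Q[(state, move)])

for _ in range(20_000):
    self_play_episode()

# The agent rediscovers the winning strategy (leave your opponent a
# multiple of 3) without ever seeing a human-played game.
print({s: max((1, 2), key=lambda m: Q[(s, m)]) for s in range(2, 8)})
```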
A person can only service a few projects a year, and it takes days or weeks to ingest new knowledge. A network only needs to be trained once, and faster than a person; it can then be replicated, and is limited only by computing power.
That skews the market and the value of the books. I would not write a book if it was then ingested by an AI and never sold to a person.
I'd like you to give away 100% of your salary, OK?
Are you greedy if you say no?
> Why couldn't the same happen for art for example?
> For coding, won't a sufficiently advanced neural net be able to figure out how to use a programming language when given just the documentation?
Some domains are too complex and large to be cracked that way.
There are no principles involved when companies advocate for or against things. Companies will always amorally argue for whatever makes them more money. They are entirely capable of arguing two opposing viewpoints if in one context viewpoint A makes them money and in another context opposite-viewpoint B makes them money. Being consistent, either logically, morally, ethically, or in principle, is not necessary.
"Copyright is good and necessary when it makes us money, and copyright is bad and wrong when it doesn't make us money" is a mundane and totally expected opinion coming from a corporation.
If I do it with 100 books you will still get laughed out of the courtroom.
AI is only different because it can do the same to a million books.
It is no wonder that OpenAI had to pay Shutterstock for training on their data, that Getty is suing Stability AI for training on their watermarked images and using them commercially without permission, and that actors and actresses are filing lawsuits against commercial voice cloners, which costs them close to nothing; those companies either take down the cloned voice offering or shut down.
These weak arguments from these AI folks sound like excuses justifying a newfound grift.
If someone posts something to StackOverflow, they're intending to help the original asker, and anyone who comes along later with the same question, solve their coding problem, and that's the extent of it.
An artist making a painting or song has not consented to training algorithms on their copyrighted work. In fact, neither has the StackOverflow person.
Boggles my mind this concept is so absent from the minds of SV folk.
AI outputs should be regulated, of course. Obviously impersonation and copyright law already applies to AI systems. But a discussion on training inputs is entirely novel to man and our laws, and it's a very nuanced and important topic. And as AI advances, it becomes increasingly difficult because of the diminishing distinction between "organic" learning and "artificial" learning. It also gets harder when stopping AI from, for example, learning from research papers would mean we miss out on life-saving medication. Where do property rights conflict with human rights?
They're important conversations to have, but you've destroyed the opportunity to have them from the starting gun.
The case against "tribute bands" is much stronger than the case against large language models built with some copyrighted content. Those are a blatant attempt to capitalize on the specific works of specific named people.
Now, the percentage of those jobs lost because some of the content happened to be copyrighted may be small, but it does account for some percentage of that job loss. So it isn't actually a non sequitur, in my opinion.
For example, if a college professor reads a book, and then uses the knowledge gained to teach, that reduces the value of the book (assuming the professor doesn't then use the book as the textbook for a course).
To avoid IP law causing more damage than it already has with evergreening of medical patents, I think it strictly has to be the generation of substantially similar media that counts as infringement, as the comment you're replying to suggests - not just "this tumor detector was pretrained on a large number of web images before task-specific fine-tuning, so it's illegal because they didn't pay Getty beforehand" if training were to be infringement.
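For what it's worth, here is a minimal sketch of the pattern described above: pretraining on a large corpus of web images followed by task-specific fine-tuning. torchvision's ImageNet weights stand in for the "large number of web images", and the tumor data loader is a hypothetical placeholder; only the pretrain-then-fine-tune shape matters here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on a large corpus of web images (ImageNet),
# then swap its 1000-class head for a binary tumor / no-tumor head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def fine_tune(model, tumor_loader, epochs=5):
    # tumor_loader is a hypothetical DataLoader of labeled medical scans.
    model.train()
    for _ in range(epochs):
        for images, labels in tumor_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```

If pretraining itself were infringement, this detector would be illegal no matter how dissimilar its outputs are to any Getty photo, which is exactly the outcome to avoid.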
I'm glad I'm not a lawyer or politician trying to sort this out. If AI gets commercially crippled, I really don't want to live in a world of black market training data.
It never has before. Why now?
Someone (more commonly, some group) invents Impressionism, or Art Deco, or heavy metal, or gangsta rap, or acid-washed jeans, or buzz cuts and pretty soon there are dozens or hundreds of other people creating works in that style, none of whom are paying the originators a cent.
This is what you don't understand: the concept of fair use.
https://en.wikipedia.org/wiki/Fair_use
If the courts hold this type of thing to be fair use (which I'm about 90% sure they will), "consent" won't enter into it. At all.
Authorship in many fields is well defined. This comment slips into nonsense territory, whatever the view or jurisdiction regarding copyright law.
It's hard to find a foothold. Human output doesn't have this restriction. Further, it feels like regulating solar power so coal miners can keep their jobs.
Just banning it or regulating output may seem like a solution to some, but all that means is that we'll cripple ourselves so other, more technologically progressive economies can sprint past us in the affected markets. That saves no jobs in the end, and ultimately hurts more people than the markets we tried to save.
But we do desperately need to sort out how this is going to devastate entire markets of labor before it risks major economic upheaval with no safety nets in place.
For starters, because "art" does not have an objective scoring function.
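Concretely: a chess engine can compute its own reward from the rules, while any "art scorer" has to be learned from human judgments, which drags the training-data question right back in. A small illustrative sketch (the Game type is a hypothetical stand-in for any chess library):

```python
from dataclasses import dataclass

@dataclass
class Game:
    # Hypothetical stand-in for the end state of any chess library.
    is_checkmate: bool
    winner: str  # "self" or "opponent"

def chess_reward(game: Game) -> float:
    # Objective and computable from the rules alone, with no human data,
    # which is what makes AlphaZero-style self-play possible.
    if game.is_checkmate:
        return 1.0 if game.winner == "self" else -1.0
    return 0.0  # draw or unfinished

def art_reward(image) -> float:
    # No rule-defined equivalent exists. Any scorer must be learned from
    # human judgments, i.e., from existing human-made works.
    raise NotImplementedError("art has no objective scoring function")
```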
You mean like tractors, electric motors, powered looms, machine tools, excavators, and such?
Yeah, and? In the limit, those things are why our population isn't 90% unfree agricultural laborers (serfs or slaves), 9+% soldiers to keep the serfs in line, and < 1% "nobles" and "priests", who get to consume all the goodies.
This same basic argument about "putting artists out of work" was made when photography was invented. It didn't work then, and it's not going to work now.
Of course, those ownership rules are garbage, but the tech industry stopped caring about copyright reform back in like 2004.
Human artists already do this, extensively. We handle it by making their output the part of the process which holds relevant copyright protections. I can sell Picasso-inspired pieces all day long as long as I don’t sell them as “Picasso.”
If I faithfully reproduced “The Old Guitarist”[1] and attempted to sell it as the original, or even as a version to copy and sell prints, I’d be open to legal claims and action. Rightfully so.
I personally haven’t heard a convincing argument as to why ML training should be handled as if it’s the output of the process, rather than the input that it is. I’m open to being swayed and to adjusting my worldview, so I keep looking for counterpoints.
I, personally, think that AI is a tremendous opportunity that we should be investing in and pushing forward. And my existing dislike of property right laws does feed into my views on the training data discussion; prioritizing a revolution in productivity over preservation of jobs for the sake of maintaining the status quo. But I'm not stupid enough to think there will be no consequences for being unprepared for the future.
Rather unfortunately, I'm not quite clever enough to see what being prepared would actually look like either.
Ultimately information has to come from somewhere. If something has no information about what a "car" is, it cannot paint a car more successfully than a random guess. When you draw a car or write an algorithm to do so, you'll be slightly affected by the existing car designs you've seen. It's not a limitation specific to AI - it's just more obscured for humans since there's no explicit searchable database of all the cars you've glanced at.
Whether it was affected by (and dependent on, in aggregate) prior work is not the standard for copyright infringement, and I'd claim it would implicate essentially all action as infringement. Instead, it should be judged by whether there's substantial similarity, and if there is substantial similarity, then by the factors of fair use.
That being said, there are still occasional times when he gets screwed over by the licensing machine anyway, either because the label forgot to ask the artist (Amish Paradise) or because the artist forgot to ask the label (You're Pitiful).
The hyperbole about being forced to work for free isn't entirely wrong, because tech companies love tricking people into doing free labor for them. They also aren't arguing for AI being a copyright-free zone. They're arguing for reallocation of ownership from authors to themselves, in the same way that record labels and publishers already did in decades prior.
[0] At least until the Luddite Solidarity Union Robot Uprising of 2063
I think we have given this discussion plenty of time. The events and actions around training on copyrighted works, from images and songs to deepfakes, have already produced the lawsuits and licensing deals, and it is all converging on paying for the data; hence OpenAI and many others doing so due to the risk of such lawsuits.
> AI outputs should be regulated, of course. Obviously impersonation and copyright law already applies to AI systems. But a discussion on training inputs is entirely novel to man and our laws, and it's a very nuanced and important topic. And as AI advances, it becomes increasingly difficult because of the diminishing distinction between "organic" learning and "artificial" learning.
Copyright law does not care, and the underlying problem is not about using such a generative AI system for non-commercial purposes such as education or private use-cases. The line is drawn as soon as it is commercialized, and the fair use excuses fall apart. Even as the AI advances, so do the traceability methods and questions about the dataset being used. [0]
It costs musicians close to nothing to target and file lawsuits against commercial voice cloners. Training on copyrighted songs was not even an option for tools like Dance Diffusion [1] due to that same risk, which is why training on public domain audio was the safer alternative, rather than running the risk of lawsuits and questions about the training set from tons of musicians.
[0] https://c2pa.org
[1] https://techcrunch.com/2023/09/13/stability-ai-gunning-for-a...
To be clear, I would argue the regulations in question should fall on the human or legal entity responsible for the creation or dissemination. Censoring the output on the AI itself seems significantly less productive.
I don't think anyone's saying that the output of the model should be subject to more lenient copyright standards than human creations. If you're selling a remix with substantial similarity to an existing song and it fails the fair use factors, then it'd already be infringement regardless of whether you made it with AI or by hand.
The question is: what about the songs that you've listened to, and potentially influenced you, but don't have substantial similarity to the remix? Do you also owe royalties on all of those? For humans the answer is no, but the law doesn't necessarily have to treat AI the same way.
[0] https://www.theverge.com/2023/1/17/23558516/ai-art-copyright...
[1] https://variety.com/2023/digital/news/scarlett-johansson-leg...
I don't see how this justifies needlessly divisive rhetoric.
No matter how long the disagreement lasts, you aren't my enemy because you have a different opinion on how we should handle this conundrum. I know you mean the best and are trying to help.
> Copyright law does not care
Copyright law works fine with AI outputs. As does trademark law. I don't see an AI making a fanart Simpsons drawing being any more novel a legal problem than the myriad of humans that do it on YouTube already. Or people who sell handmade Pokemon plushies on Etsy without Nintendo's permission.
But the question is on inputs and how the carve-outs of "transformative" and "educational use" can be interpreted — model training may very well be considered education or research. I think it's been made rather clear that nobody has a real answer to this; copyright law never particularly desired to address whether an artist is "stealing" when they borrow influence from other artists and use similar styles or themes (without consent) for their own career.
I don't envy the judges or legislators involved in making these future-defining decisions.
We might also see people start to break down barriers to server costs, for example by lobbying for legal rights to serve content from home with no ISP restrictions related to servers on home internet service. A big company like stack overflow can simply spare the cost of a dedicated business line but thousands of home users might really want to serve content from home.
My point is that when you really think it through, you realize that people will find ways to share the information they want. What’s also cool is that for things like the fediverse there generally are no ads. That’s something big central services fail at.
And then there’s sites like Wikipedia. I guess I don’t know their license but they simply ask people for what amounts to over a hundred million dollars a year in donations and they get it. So centralized models can work on pure donations if they are appreciated by a large number of users.
Where does this come from? If I visit your house and you have a Kindle with pirated books, am I liable? Or just you for doing the actual pirating and downloading them?
Are AI companies exempt from the legal restrictions on accessing copyrighted material that apply to everyone else?
Serious question.
I remember an article recently about someone suing an AI company claiming that they must have illegally accessed material, but I can't find it now to know how it turned out.
No, AI art would exist without Disney or HBO just like human art would.
It literally does come back to this: either AI is doing more or less the same thing as an art student, learning styles and structures and concepts, or it is doing something fundamentally different. If it’s the former, then calling AI training infringement makes training an art student infringement too, because that is completely dependent on the work of artists who came before.
And sure, if you ask a skilled 2d artist if they can draw something in the style of 80s anime, or specific artists, they can do it. There are some artists who specialize in this in fact! Can’t have retro anime porn commissions if it’s not riffing on retro anime images. Yes twitter, I see what you do with that account when you’re not complaining about AI.
The problem is that AI lowers the cost of doing this to zero, and thus lays bare the inherent contradictions of IP law and “intellectual ownership” in a society where everyone is diffusing and mashing up each other’s ideas and works on a continuous basis. It is one of those “everyone does it” crimes that mostly survives because it’s utterly unenforced at scale, apart from a few noxious litigants like Disney.
It is the old Luddite problem - the common idea that luddites just hated technology is inaccurate. They were textile workers who were literally seeing their livelihoods displaced by automation mass-producing what they saw as inferior goods. https://en.wikipedia.org/wiki/Luddite
In general this is a problem that's set up by capitalism itself though. Ideas can’t and shouldn’t be owned, it is an absurd premise and you shouldn’t be surprised that you get absurd results. Making sure people can eat is not the job of capitalism, it’s the job of safety nets and governments. Ideas have no cost of replication and artificially creating one is distorting and destructive.
Would a neural net put a tax on neurons firing? No, that’s stupid and counterproductive.
Let people write their slash fiction in peace.
(HN probably has a good understanding of it, but in general people don't appreciate just how much it is not just aping images it's seen, but learning the style and relationships of pixels and objects, etc. To wit, the only thing NVIDIA saved from DLSS 1.0 was the model... and DLSS 2.0 has nothing to do with DLSS 1.0 in terms of technical approach. But the model encodes all the contextual understanding of how pixels are supposed to look in human images, even if it's not even doing the original transform anymore! And LLMs can indeed generalize reasonably accurately about things they haven't seen, as long as they know the precepts, etc., because they aren't "just guessing what word comes next"; it's the word that comes next given a conceptual understanding of the underlying ideas. And it's difficult to draw a line there between a human and a large AI model; college students will "riff on the things they know" if you ask them to "generalize" about a topic they haven't studied, too.)
What rhetoric? I am telling the hard truth of it.
> Copyright law works fine with AI outputs. As does trademark law. I don't see an AI making a fanart Simpsons drawing being any more novel a legal problem than the myriad of humans that do it on YouTube already. Or people who sell handmade Pokemon plushies on Etsy without Nintendo's permission.
How does running the risk of a lawsuit from the copyright holder mean that it is OK to continue selling the works? Again, if the parodies and fan art are in a non-commercial setting, then it isn't a problem. The problems start in the commercial setting, where in the case of Nintendo the company is known to be extremely litigious over even similarity, AI or not. [0] [1] [2] Then the question becomes, 'How long until it gets caught if I commercialize this?' for both the model's inputs and its outputs.
That question was answered in Getty's case: They didn't need to request Stability's training set, since it is publicly available. Nintendo and other companies can simply ask for the original training data of closed models if they wanted to.
> But the question is on inputs and how the carve-outs of "transformative" and "educational use" can be interpreted — model training may very well be considered education or research.
As with the above, this is why C2PA and traceability efforts are in the works for those same reasons [3]: to determine, from the output, the source from which generative digital works were derived.
> I think it's been made rather clear that nobody has a real answer to this; copyright law never particularly desired to address whether an artist is "stealing" when they borrow influence from other artists and use similar styles or themes (without consent) for their own career.
So that explains these AI companies scrambling to avoid addressing these issues or being transparent about their datasets and training data (except for Stability), since that is where this is going.
[0] https://www.vice.com/en/article/ae3bbp/the-pokmon-company-su...
[1] https://kotaku.com/pokemon-nintendo-china-tencent-netease-si...
[2] https://www.gameshub.com/news/news/pokemon-nft-game-pokeworl...
[3] https://variety.com/2023/digital/news/scarlett-johansson-leg...
I was wrong about it being legal, however, and there are ongoing lawsuits.
I’ll leave my gratitude a mystery. They have my thanks, and my axe.
Meanwhile, the lawsuit against Midjourney for training on copyrighted work is going... not that great[0]. The judge is paring down a lot of the arguments in the lawsuit.
The actual idea behind using copyright to stop AI is that if we give copyright owners of trained-on works the ability to veto that training, then we can just "stop AI". The problem is that most artists don't actually get to own their work. Publishers own the work, it's the first thing you have to bargain away in order to work with a publisher. So they're going to look at their vast dragon's hoard of work, most of which isn't particularly profitable to them, and license it out to OpenAI, Stability, MidJourney, or whoever at pennies on the dollar because at their scale that becomes a pretty big deal.
To the publishers salivating over generative AI, this cost is not a big deal, because they already spend shittons on writers. So if your goal is to stop worker replacement, just adding a cost to that replacement isn't a good idea. Actually making it illegal or prohibited to actually replace workers with AI is the way to go.
[0] https://www.reuters.com/legal/litigation/judge-pares-down-ar...
The deep-down reason people are concerned is because it reduces the cost of doing it to zero. And that taps into this whole other set of problems where the computer thingy says we can't eat because nobody has a job anymore, or is limited by the cost to automate with a reasonable solution, etc. Plus a whole host of others besides.
I have no idea how you reward significant creative or R&D effort in a relatively post-IP society, where the cost of defining any idea is just some prompt. Pretending like any sort of IP ownership can be enforced in this thing is crazy though. We are seeing the cost of replicating intellectual property driven down to the actual economic-minimum cost basis.
It's absolutely not capitalism's job to carry the population through whatever weird economic shit comes next, when the idea of IP law generally gets mushy and melts away. Right? There is a lot of managerial or creative work that can be completely displaced by this. Why even have a farmer watching the farm once the cropwatch 5000 is built? And physical labor, obviously, is just a matter of cost.
You can't have everyone's salary be constrained by the actual cost to replace, because that's going to get a ton lower. And that's good, it lets us all move up an abstraction layer, and also have more time for leisure etc. It's just not going to be evenly distributed, at all. But we could be talking about a post-scarcity utopia before terribly long, if we want to. Why not just let the robots make the phones and the food and we just hike mountains and do art or whatever? How does an economy work in a situation where most of the actual work is automated and most people don't actually work?
It's super time for a livable basic income that doesn't phase out. It's going to need a while to phase in (probably at least 10 if not 20-30 years), but the numbers on the cost aren't going to be any more appealing after another 15 years of watching AI displace everyone.
In general I kind of like the idea of "unregistered vs registered copyright" where you have some default rights of the work itself, and if you register it you receive more significant protections etc. If you're Intel, argue the value you added to create x86 etc and how you've supported it for 20 years, etc. The idea would be to combine and replace patents and copyright and IP in general, you have sort of a "right of creation" or sweat-of-the-brow intellectual ownership and right to exploit the work. The more effort and work, the larger the argument that some competitor ripping you off is intellectually unfair - sort of an actual-damages model.
But I'm also strongly against derivative works being illegal once the idea has been released into the public... but neither do I want to encourage trade-secrets-ism. I think that issue is probably overblown though, reverse engineering/etc can clear up a lot of trade secrets pretty quick. And I think some common-law norms of unfair exploitation of IP would develop (and could flux over time) such that we don't need to go after slash fiction because it violates your cinematic universe, but a large competitor ripping it off might be unfair.
The original creator will always have a period of exclusivity for at least the time to replicate, even in a true zero-IP-rights scenario. Making a chip takes 6-12 months anyway, for example. Recreating some breakthrough drug (hopefully in a better way) and getting it through trials takes time. And nobody is confused by knockoff works from small-time non-commercial operators etc. There are still a lot of factors in favor of actual innovation here, it's not nothing either, and I'm proposing a sweat-of-the-brow system to equalize the instances where that fails or is unduly exploited.
The situation described in your second reference is already unlawful, regardless of how the image was produced. You're not allowed to make commercial use of images of Scarlett Johansson even if you scratch them on a cave wall with a broken deer antler.