Thus these arguments could backfire on those making them real quick. At this point though they have no choice but to make them as they’ve all clearly used copyrighted works to build their products.
The other challenge here is there needs to be some protection and compensation for folks producing original works in the first place. If we just end up with ML models training on the generated output of other ML models, this is all going to go downhill real quick.
Some of these rhyme with the fair use and similar arguments put forth by the free software and anti-corporate-owned culture folks for the last couple decades. More honest (if cynical) is A16z’s take, of “the rich already put in a bunch of money, so now you can’t stop it.”
Stop making excuses. AI training on copyrighted works is straight wrong no matter how much you don't want it to be.
All of my internet comments are copyrighted btw, but I do offer a license of $1b usd/year for using them in a model if you'd like.
AlphaZero learned to play chess via self-play, not by reading books about chess.
Why couldn't the same happen for art for example?
For coding, won't a sufficiently advanced neural net be able to figure out how to use a programming language when given just the documentation?
And when most of our interactions are with AI, it will learn from our conversations with it. Asking some AI system why feature X was removed from programming language Y in version Z teaches it something. The next person who asks it which feature was removed from Y in version Z might be told "X", without the AI system ever having to read about it. The interaction with AI could become a self-learning loop in and of itself.
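For the curious, here is a toy sketch of what "learning with no human data" means in the AlphaZero sense: a tabular agent teaches itself the game of Nim purely by self-play, with the rules of the game providing the reward, so no human-authored games or texts are ever ingested. (The game, the constants, and the update rule are illustrative simplifications, not AlphaZero's actual algorithm.)

```python
import random
from collections import defaultdict

# Toy self-play learner for Nim: players alternate taking 1 or 2 stones,
# and whoever takes the last stone wins. The reward comes from the rules
# alone; no human games or copyrighted works are involved.

Q = defaultdict(float)        # learned value of (stones_left, move)
EPSILON, ALPHA = 0.1, 0.5     # exploration rate, learning rate

def choose(stones):
    moves = [m for m in (1, 2) if m <= stones]
    if random.random() < EPSILON:
        return random.choice(moves)                   # explore
    return max(moves, key=lambda m: Q[(stones, m)])   # exploit

def self_play_episode(start=7):
    stones, player, history = start, 0, []
    while stones > 0:
        move = choose(stones)
        history.append((player, stones, move))
        stones -= move
        player ^= 1
    winner = history[-1][0]   # whoever took the last stone
    for player, state, move in history:
        reward = 1.0 if player == winner else -1.0
        Q[(state, move)] += ALPHA * (reward - Q[(state, move)])

for _ in range(20_000):
    self_play_episode()

# The agent rediscovers the winning strategy (leave your opponent a
# multiple of 3) without ever seeing a human-played game.
print({s: max((1, 2), key=lambda m: Q[(s, m)]) for s in range(2, 8)})
```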
A person can only service a few projects a year, and it takes days or weeks to ingest new knowledge. A network only needs to be trained once, and faster than a person; it can then be replicated, and is limited only by computing power.
That skews the market and the value of the books. I would not write a book if it was then ingested by an AI and never sold to a person.
I'd like you to give away 100% of your salary, OK?
Are you greedy if you say no?
> Why couldn't the same happen for art for example?
> For coding, won't a sufficiently advanced neural net be able to figure out how to use a programming language when given just the documentation?
Some domains are too complex and large to be cracked that way.
There are no principles involved when companies advocate for or against things. Companies will always amorally argue for whatever makes them more money. They are entirely capable of arguing two opposing viewpoints if in one context viewpoint A makes them money and in another context opposite-viewpoint B makes them money. Being consistent, either logically, morally, ethically, or in principle, is not necessary.
"Copyright is good and necessary when it makes us money, and copyright is bad and wrong when it doesn't make us money" is a mundane and totally expected opinion coming from a corporation.
If I do it with 100 books you will still get laughed out of the courtroom.
AI is only different because it can do the same to a million books.
It is no wonder that OpenAI had to pay Shutterstock for training on their data, that Getty is suing Stability AI for training on their watermarked images and using them commercially without permission, and that actors and actresses are filing lawsuits against commercial voice cloners, which costs them close to nothing; those companies either take down the cloned voice offering or shut down.
These weak arguments from these AI folks sound like excuses justifying a newfound grift.
If someone posts something to StackOverflow, they're intending to help the original asker, and anyone who comes along later with the same question, solve their coding problem, and that's the extent of it.
An artist making a painting or song has not consented to training algorithms on their copyrighted work. In fact, neither has the StackOverflow person.
Boggles my mind this concept is so absent from the minds of SV folk.
AI outputs should be regulated, of course. Obviously impersonation and copyright law already applies to AI systems. But a discussion on training inputs is entirely novel to man and our laws, and it's a very nuanced and important topic. And as AI advances, it becomes increasingly difficult because of the diminishing distinction between "organic" learning and "artificial" learning. It also gets harder when stopping AI from, for example, learning from research papers would mean we miss out on life-saving medication. Where do property rights conflict with human rights?
They're important conversations to have, but you've destroyed the opportunity to have them from the starting gun.
The case against "tribute bands" is much stronger than the case against large language models built with some copyrighted content. Those are a blatant attempt to capitalize on the specific works of specific named people.
Now, the percentage of those jobs lost because some of the content happened to be copyrighted may be small, but it does account for some percentage of that job loss. So it isn't actually a non sequitur, in my opinion.
For example, if a college professor reads a book, and then uses the knowledge gained to teach, that reduces the value of the book (assuming the professor doesn't then use the book as the textbook for a course).
To avoid IP law causing more damage than it already has with evergreening of medical patents, I think it strictly has to be the generation of substantially similar media that counts as infringement, as the comment you're replying to suggests - not just "this tumor detector was pretrained on a large number of web images before task-specific fine-tuning, so it's illegal because they didn't pay Getty beforehand" if training were to be infringement.
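For what it's worth, here is a minimal sketch of the pattern described above: pretraining on a large corpus of web images followed by task-specific fine-tuning. torchvision's ImageNet weights stand in for the "large number of web images", and the tumor data loader is a hypothetical placeholder; only the pretrain-then-fine-tune shape matters here.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on a large corpus of web images (ImageNet),
# then swap its 1000-class head for a binary tumor / no-tumor head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def fine_tune(model, tumor_loader, epochs=5):
    # tumor_loader is a hypothetical DataLoader of labeled medical scans.
    model.train()
    for _ in range(epochs):
        for images, labels in tumor_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```

If pretraining itself were infringement, this detector would be illegal no matter how dissimilar its outputs are to any Getty photo, which is exactly the outcome to avoid.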
I'm glad I'm not a lawyer or politician trying to sort this out. If AI gets commercially crippled, I really don't want to live in a world of black market training data.
It never has before. Why now?
Someone (more commonly, some group) invents Impressionism, or Art Deco, or heavy metal, or gangsta rap, or acid-washed jeans, or buzz cuts and pretty soon there are dozens or hundreds of other people creating works in that style, none of whom are paying the originators a cent.
This is what you don't understand: the concept of fair use.
https://en.wikipedia.org/wiki/Fair_use
If the courts hold this type of thing to be fair use (which I'm about 90% sure they will), "consent" won't enter into it. At all.
Authorship in many fields is well defined. This comment slips into nonsense territory, whatever the view or jurisdiction regarding copyright law.
It's hard to find a foothold. Human output doesn't have this restriction. Further, it feels like regulating solar power so coal miners can keep their jobs.
Just banning it or regulating output may seem like a solution to some, but all that means is that we'll cripple ourselves so other, more technologically progressive economies can sprint past us in the affected markets. That saves no jobs in the end, and ultimately hurts more people than the markets we tried to save.
But we do desperately need to sort out how this is going to devastate entire markets of labor before it risks major economic upheaval with no safety nets in place.
For starters, because "art" does not have an objective scoring function.
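Concretely: a chess engine can compute its own reward from the rules, while any "art scorer" has to be learned from human judgments, which drags the training-data question right back in. A small illustrative sketch (the Game type is a hypothetical stand-in for any chess library):

```python
from dataclasses import dataclass

@dataclass
class Game:
    # Hypothetical stand-in for the end state of any chess library.
    is_checkmate: bool
    winner: str  # "self" or "opponent"

def chess_reward(game: Game) -> float:
    # Objective and computable from the rules alone, with no human data,
    # which is what makes AlphaZero-style self-play possible.
    if game.is_checkmate:
        return 1.0 if game.winner == "self" else -1.0
    return 0.0  # draw or unfinished

def art_reward(image) -> float:
    # No rule-defined equivalent exists. Any scorer must be learned from
    # human judgments, i.e., from existing human-made works.
    raise NotImplementedError("art has no objective scoring function")
```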
You mean like tractors, electric motors, powered looms, machine tools, excavators, and such?
Yeah, and? In the limit, those things are why our population isn't 90% unfree agricultural laborers (serfs or slaves), 9+% soldiers to keep the serfs in line, and < 1% "nobles" and "priests", who get to consume all the goodies.
This same basic argument about "putting artists out of work" was made when photography was invented. It didn't work then, and it's not going to work now.
Of course, those ownership rules are garbage, but the tech industry stopped caring about copyright reform back in like 2004.
Human artists already do this, extensively. We handle it by making their output the part of the process which holds relevant copyright protections. I can sell Picasso-inspired pieces all day long as long as I don’t sell them as “Picasso.”
If I faithfully reproduced “The Old Guitarist”[1] and attempted to sell it as the original, or even as a version to copy and sell prints, I’d be open to legal claims and action. Rightfully so.
I personally haven’t heard a convincing argument as to why ML training should be handled as if it’s the output of the process, rather than the input that it is. I’m open to being swayed and to adjusting my worldview, so I keep looking for counterpoints.
I, personally, think that AI is a tremendous opportunity that we should be investing in and pushing forward. And my existing dislike of property right laws does feed into my views on the training data discussion; prioritizing a revolution in productivity over preservation of jobs for the sake of maintaining the status quo. But I'm not stupid enough to think there will be no consequences for being unprepared for the future.
Rather unfortunately, I'm not quite clever enough to see what being prepared would actually look like either.
Ultimately information has to come from somewhere. If something has no information about what a "car" is, it cannot paint a car more successfully than a random guess. When you draw a car or write an algorithm to do so, you'll be slightly affected by the existing car designs you've seen. It's not a limitation specific to AI - it's just more obscured for humans since there's no explicit searchable database of all the cars you've glanced at.
Whether it was affected by (and dependent on, in aggregate) prior work is not the standard for copyright infringement, and I'd claim it would implicate essentially all action as infringement. Instead, it should be judged by whether there's substantial similarity, and if there is substantial similarity, then by the factors of fair use.
That being said, there are still occasional times when he gets screwed over by the licensing machine anyway, either because the label forgot to ask the artist (Amish Paradise) or because the artist forgot to ask the label (You're Pitiful).
The hyperbole about being forced to work for free isn't entirely wrong, because tech companies love tricking people into doing free labor for them. They also aren't arguing for AI being a copyright-free zone. They're arguing for reallocation of ownership from authors to themselves, in the same way that record labels and publishers already did in decades prior.
[0] At least until the Luddite Solidarity Union Robot Uprising of 2063
I think we have given this discussion plenty of time. The events and actions around training on copyrighted works, from images and songs to deepfakes, have already produced the lawsuits and licensing deals, and it is all converging on paying for the data; hence OpenAI and many others doing so due to the risk of such lawsuits.
> AI outputs should be regulated, of course. Obviously impersonation and copyright law already applies to AI systems. But a discussion on training inputs is entirely novel to man and our laws, and it's a very nuanced and important topic. And as AI advances, it becomes increasingly difficult because of the diminishing distinction between "organic" learning and "artificial" learning.
Copyright law does not care, and the underlying problem is not about using such a generative AI system for non-commercial purposes such as education or private use-cases. The line is drawn as soon as it is commercialized, and the fair use excuses fall apart. Even as the AI advances, so do the traceability methods and questions about the dataset being used. [0]
It costs musicians close to nothing to target and file lawsuits against commercial voice cloners. Training on copyrighted songs was not even an option for tools like Dance Diffusion [1] due to that same risk, which is why training on public domain audio was the safer alternative, rather than running the risk of lawsuits and questions about the training set from tons of musicians.
[0] https://c2pa.org
[1] https://techcrunch.com/2023/09/13/stability-ai-gunning-for-a...
To be clear, I would argue the regulations in question should fall on the human or legal entity responsible for the creation or dissemination. Censoring the output on the AI itself seems significantly less productive.
I don't think anyone's saying that the output of the model should be subject to more lenient copyright standards than human creations. If you're selling a remix with substantial similarity to an existing song and it fails the fair use factors, then it'd already be infringement regardless of whether you made it with AI or by hand.
The question is: what about the songs that you've listened to, and potentially influenced you, but don't have substantial similarity to the remix? Do you also owe royalties on all of those? For humans the answer is no, but the law doesn't necessarily have to treat AI the same way.
[0] https://www.theverge.com/2023/1/17/23558516/ai-art-copyright...
[1] https://variety.com/2023/digital/news/scarlett-johansson-leg...
I don't see how this justifies needlessly divisive rhetoric.
No matter how long the disagreement lasts, you aren't my enemy because you have a different opinion on how we should handle this conundrum. I know you mean the best and are trying to help.
> Copyright law does not care
Copyright law works fine with AI outputs. As does trademark law. I don't see an AI making a fanart Simpsons drawing being any more novel a legal problem than the myriad of humans that do it on YouTube already. Or people who sell handmade Pokemon plushies on Etsy without Nintendo's permission.
But the question is on inputs and how the carve-outs of "transformative" and "educational use" can be interpreted — model training may very well be considered education or research. I think it's been made rather clear that nobody has a real answer to this; copyright law never particularly desired to address whether an artist is "stealing" when they borrow influence from other artists and use similar styles or themes (without consent) for their own career.
I don't envy the judges or legislators involved in making these future-defining decisions.
We might also see people start to break down barriers to server costs, for example by lobbying for legal rights to serve content from home with no ISP restrictions related to servers on home internet service. A big company like stack overflow can simply spare the cost of a dedicated business line but thousands of home users might really want to serve content from home.
My point is that when you really think it through, you realize that people will find ways to share the information they want. What’s also cool is that for things like the fediverse there generally are no ads. That’s something big central services fail at.
And then there’s sites like Wikipedia. I guess I don’t know their license but they simply ask people for what amounts to over a hundred million dollars a year in donations and they get it. So centralized models can work on pure donations if they are appreciated by a large number of users.
Where does this come from? If I visit your house and you have a Kindle with pirated books, am I liable? Or just you for doing the actual pirating and downloading them?
Are AI companies exempt from the legal restrictions on accessing copyrighted material that apply to everyone else?
Serious question.
I remember an article recently about someone suing an AI company claiming that they must have illegally accessed material, but I can't find it now to know how it turned out.
No, AI art would exist without Disney or HBO just like human art would.
It literally does come back to this: either AI is doing more or less the same thing as an art student, learning styles and structures and concepts, or it is doing something fundamentally different. If it’s the former, then calling AI training infringement makes training an art student infringement too, because that is completely dependent on the work of artists who came before.
And sure, if you ask a skilled 2d artist if they can draw something in the style of 80s anime, or specific artists, they can do it. There are some artists who specialize in this in fact! Can’t have retro anime porn commissions if it’s not riffing on retro anime images. Yes twitter, I see what you do with that account when you’re not complaining about AI.
The problem is that AI lowers the cost of doing this to zero, and thus lays bare the inherent contradictions of IP law and “intellectual ownership” in a society where everyone is diffusing and mashing up each other’s ideas and works on a continuous basis. It is one of those “everyone does it” crimes that mostly survives because it’s utterly unenforced at scale, apart from a few noxious litigants like Disney.
It is the old Luddite problem - the common idea that luddites just hated technology is inaccurate. They were textile workers who were literally seeing their livelihoods displaced by automation mass-producing what they saw as inferior goods. https://en.wikipedia.org/wiki/Luddite
In general this is a problem that's set up by capitalism itself though. Ideas can’t and shouldn’t be owned, it is an absurd premise and you shouldn’t be surprised that you get absurd results. Making sure people can eat is not the job of capitalism, it’s the job of safety nets and governments. Ideas have no cost of replication and artificially creating one is distorting and destructive.
Would a neural net put a tax on neurons firing? No, that’s stupid and counterproductive.
Let people write their slash fiction in peace.
(HN probably has a good understanding of it, but in general people don't appreciate just how much it is not just aping images it's seen, but learning the style and relationships of pixels and objects, etc. To wit, the only thing NVIDIA saved from DLSS 1.0 was the model... and DLSS 2.0 has nothing to do with DLSS 1.0 in terms of technical approach. But the model encodes all the contextual understanding of how pixels are supposed to look in human images, even if it's not even doing the original transform anymore! And LLMs can indeed generalize reasonably accurately about things they haven't seen, as long as they know the precepts, etc., because they aren't "just guessing what word comes next"; it's the word that comes next given a conceptual understanding of the underlying ideas. And it's difficult to draw a line there between a human and a large AI model; college students will "riff on the things they know" if you ask them to "generalize" about a topic they haven't studied, too.)
What rhetoric? I am telling the hard truth of it.
> Copyright law works fine with AI outputs. As does trademark law. I don't see an AI making a fanart Simpsons drawing being any more novel a legal problem than the myriad of humans that do it on YouTube already. Or people who sell handmade Pokemon plushies on Etsy without Nintendo's permission.
How does running the risk of a lawsuit from the copyright holder mean that it is OK to continue selling the works? Again, if the parodies and fan art are in a non-commercial setting, then it isn't a problem. The problems start in the commercial setting, where in the case of Nintendo the company is known to be extremely litigious over even similarity, AI or not. [0] [1] [2] Then the question becomes, 'How long until it gets caught if I commercialize this?' for both the model's inputs and its outputs.
That question was answered in Getty's case: They didn't need to request Stability's training set, since it is publicly available. Nintendo and other companies can simply ask for the original training data of closed models if they wanted to.
> But the question is on inputs and how the carve-outs of "transformative" and "educational use" can be interpreted — model training may very well be considered education or research.
As with the above, this is why C2PA and traceability efforts are in the works for those same reasons [3]: to determine, from the output, the source from which generative digital works were derived.
> I think it's been made rather clear that nobody has a real answer to this; copyright law never particularly desired to address whether an artist is "stealing" when they borrow influence from other artists and use similar styles or themes (without consent) for their own career.
So that explains these AI companies scrambling to avoid addressing these issues or being transparent about their datasets and training data (except for Stability), since that is where this is going.
[0] https://www.vice.com/en/article/ae3bbp/the-pokmon-company-su...
[1] https://kotaku.com/pokemon-nintendo-china-tencent-netease-si...
[2] https://www.gameshub.com/news/news/pokemon-nft-game-pokeworl...
[3] https://variety.com/2023/digital/news/scarlett-johansson-leg...
I was wrong about it being legal, however, and there are ongoing lawsuits.
I’ll leave my gratitude a mystery. They have my thanks, and my axe.
Meanwhile, the lawsuit against Midjourney for training on copyrighted work is going... not that great[0]. The judge is paring down a lot of the arguments in the lawsuit.
The actual idea behind using copyright to stop AI is that if we give copyright owners of trained-on works the ability to veto that training, then we can just "stop AI". The problem is that most artists don't actually get to own their work. Publishers own the work, it's the first thing you have to bargain away in order to work with a publisher. So they're going to look at their vast dragon's hoard of work, most of which isn't particularly profitable to them, and license it out to OpenAI, Stability, MidJourney, or whoever at pennies on the dollar because at their scale that becomes a pretty big deal.
To the publishers salivating over generative AI, this cost is not a big deal, because they already spend shittons on writers. So if your goal is to stop worker replacement, just adding a cost to that replacement isn't a good idea. Actually making it illegal or prohibited to actually replace workers with AI is the way to go.
[0] https://www.reuters.com/legal/litigation/judge-pares-down-ar...
The deep-down reason people are concerned is because it reduces the cost of doing it to zero. And that taps into this whole other set of problems where the computer thingy says we can't eat because nobody has a job anymore, or is limited by the cost to automate with a reasonable solution, etc. Plus a whole host of others besides.
I have no idea how you reward significant creative or R&D effort in a relatively post-IP society, where the cost of defining any idea is just some prompt. Pretending like any sort of IP ownership can be enforced in this thing is crazy though. We are seeing the cost of replicating intellectual property driven down to the actual economic-minimum cost basis.
It's absolutely not capitalism's job to carry the population through whatever weird economic shit comes next, when the idea of IP law generally gets mushy and melts away. Right? There is a lot of managerial or creative work that can be completely displaced by this. Why even have a farmer watching the farm once the cropwatch 5000 is built? And physical labor, obviously, is just a matter of cost.
You can't have everyone's salary be constrained by the actual cost to replace, because that's going to get a ton lower. And that's good, it lets us all move up an abstraction layer, and also have more time for leisure etc. It's just not going to be evenly distributed, at all. But we could be talking about a post-scarcity utopia before terribly long, if we want to. Why not just let the robots make the phones and the food and we just hike mountains and do art or whatever? How does an economy work in a situation where most of the actual work is automated and most people don't actually work?
It's super time for a livable basic income that doesn't phase out. It's going to need a while to phase in (probably at least 10 if not 20-30 years), but the numbers on the cost aren't going to be any more appealing after another 15 years of watching AI displace everyone.
In general I kind of like the idea of "unregistered vs registered copyright" where you have some default rights of the work itself, and if you register it you receive more significant protections etc. If you're Intel, argue the value you added to create x86 etc and how you've supported it for 20 years, etc. The idea would be to combine and replace patents and copyright and IP in general, you have sort of a "right of creation" or sweat-of-the-brow intellectual ownership and right to exploit the work. The more effort and work, the larger the argument that some competitor ripping you off is intellectually unfair - sort of an actual-damages model.
But I'm also strongly against derivative works being illegal once the idea has been released into the public... but neither do I want to encourage trade-secrets-ism. I think that issue is probably overblown though, reverse engineering/etc can clear up a lot of trade secrets pretty quick. And I think some common-law norms of unfair exploitation of IP would develop (and could flux over time) such that we don't need to go after slash fiction because it violates your cinematic universe, but a large competitor ripping it off might be unfair.
The original creator will always have a period of exclusivity for at least the time to replicate, even in a true zero-IP-rights scenario. Making a chip takes 6-12 months anyway, for example. Recreating some breakthrough drug (hopefully in a better way) and getting it through trials takes time. And nobody is confused by knockoff works from small-time non-commercial operators etc. There are still a lot of factors in favor of actual innovation here, it's not nothing either, and I'm proposing a sweat-of-the-brow system to equalize the instances where that fails or is unduly exploited.
The situation described in your second reference is already unlawful, regardless of how the image was produced. You're not allowed to make commercial use of images of Scarlett Johansson even if you scratch them on a cave wall with a broken deer antler.