When Microsoft steals all code on their platform and sells it, they get lauded. When "Open" AI steals thousands of copyrighted images and sells them, they get lauded.
I am skeptical of imaginary property myself, but fuck this one set of rules for the poor, another set of rules for the masses.
Conservatism consists of exactly one proposition, to wit:
There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.
—Composer Frank Wilhoit[1]
[1]: https://crookedtimber.org/2018/03/21/liberals-against-progre...
That can't possibly be a valid claim, right? AFAIK copyright is "gone" after the original author dies + ~70 years. Before fairly recently it was even shorter. Something from 1640 surely can't be claimed under copyright protection. There are much more recent changes where that might not be the case, but 1640?
> When Jane Rando uses devtools to check a website source code she gets sued.
Again, that doesn't sound like a valid suit. Surely she would win? In the few cases I've heard of where suits like this are brought against someone they've easily won them.
"There is no such thing as liberalism — or progressivism, etc.
There is only conservatism. No other political philosophy actually exists; by the political analogue of Gresham’s Law, conservatism has driven every other idea out of circulation."
There is no such thing as being a Liberal or Progressive, there is only being a Conservative or anti-Conservative, and while there is much nuänce and policy to debate about that, it boils down to deciding whether you actually support or abhor the idea of "the law" (which is a much broader concept than just the legal system) existing to enforce or erase the distinction between in-groups and out-groups.
But that's just my read on it. Getting back to intellectual property, it has become a bitter joke on artists and creatives, who are held up as the beneficiaries of intellectual property laws in theory, but in practice are just as much of an out-group as everyone else.
We are bound by the law—see patent trolls, for example—but not protected by it unless we have pockets deep enough to sue Disney for not paying us.
They are absolutely completely and utterly bullshit. Nobody with half an ear for music will mistake my playing of Bach's G Minor Sonata with Arthur Grumiaux (too many out of tune notes :-D). But yet, YouTube still manages to match this to my playing, probably because they have never heard it before now (I recorded it mere minutes before).
So no, it isn't a valid claim, but this algorithm trained on certain examples of work, manages to make bad classifications with potentially devastating ramifications for the creator (I'm not a monetized YouTube artist, but if this triggered a complete lockout of my Google account(s), this likely end Very Badly).
I think it's a very relevant comparison to the GP's examples.
It's not, but good luck talking to a human at Youtube when the video gets taken down.
> Again, that doesn't sound like a valid suit. Surely she would win?
Assuming she could afford the lawyer, and that she lives through the stress and occasional mistreatment by the authority, yes, probably. Both are big ifs, though.
I'm not a lawyer, but my understanding is that while the "1640's violin composition" itself may be out of copyright, if I record myself playing it, my recording of that piece is my copyright. So if you took my file (somehow) and used it without my permission, and I could prove it, I could claim copyright infringement.
That's my understanding, and I've personally operated that way to avoid any issues since it errs on the side of safety. (Want to use old music, make sure the license of the recording explicitly says public domain or has license info)
But it can still be weaponized to prevent legitimate resubmissions of parallel works, that can potentially deplatform legitimate users, depending on the reviewer and the clarity of the rebuttal.
It has indeed happened.
https://boingboing.net/2018/09/05/mozart-bach-sorta-mach.htm...
Sony later withdrew their copyright claim.
There are two pieces to copyright when it comes to public domain:
* The work (song) itself -- can't copyright that
* The recording -- you are the copyright owner. No one, without your permission, can re-post your recording
And of course, there is derivative work. You own any portion that is derivative of the original work.
That's freedom of speech for everyone who can afford a lawyer to bring suit against a music rights-management company.
I haven't been following super closely but I don't know of any claims or examples where input images were recreated to a significant degree by stable diffusion.
Do you have any evidence for those claims, or anything resembling those examples?
Music copyright has long expired for classical music, and big shots are definitely not exempt from where it applies. Just look at how much heat Ed Sheeran, one of the biggest contemporary pop stars, got for "stealing" a phrase that was literally just chanting "Oh-I" a few times (just to be clear, I am familiar with the case and find it infuriating that this petty rent-seeking attempt went to trial at all, even if Sheeran ended up being completely cleared, but to great personal distress as he said afterwards).
And who ever got sued for using dev tools? Is there even a way to find that out?
https://en.m.wikipedia.org/wiki/Viacom_International_Inc._v.....
https://www.radioclash.com/archives/2021/05/02/youtuber-gets...
For being sued for looking at source here is the first result on Google
https://www.wired.com/story/missouri-threatens-sue-reporter-...
Among many others. Classical music may have fallen into public domain, but modern performances of it is copyrightable, and some of the big companies use copyright matching systems, including YouTube's, that often flags new performances as copies of recordings.
But surely the answer should be to fix the broken YT system and to educate politicians to abstain from baseless threats, not to make AI researchers pay for it?
Right, that's my point... I can sue anyone for anything, doesn't mean I'll win.
Trademarks require active defense to avoid genericization. Copyright may be asserted at the holder's discretion.
This becomes particularly onerous when trolls claim copyright on published recordings of environmental sounds that happen to be similar but not identical to someone else's but they do have a legitimate claim on the original recording.
Anyone with a mouth can run it and threaten a lawsuit. If fact, I threaten to sue you for misinformation right now unless you correct your post. Fat lot of good my threat will do because no judge in their right mind would entertain said lawsuit because it's baseless.
You cant sue if you dont have money, a big corp can sue even if they know they are wrong.
Presumably by "the masses" you meant "the large corporations"?
Usually, "the masses" means "the common people" ... i.e. not much different from "the poor."
If you meant corporations, I'm 100% behind this comment.
You put it as a remix, but remixes are credited and expressed as such.
I think that the argument being made by some artists is that the training process itself violates copyright just by using the training data.
That’s quite different from arguing that the output violates copyright, which is what the tweet in this case was about.
I don’t see Midjourney (et al) as remixes, myself. More like “inspired by.”
To add to that, there is provisions to lock her out of pushing new videos to the platform if the number of unresolved copyright claims passes some low number (3?).
So she loses new revenue until her claims prevail, and of course the entity which the claim is made for knows that and has no incentive to help her (don't they even get the monetization from her videos in the meantime ?)
When it’s a faceless mass of 100k employees…? Not so much.
Its all the same they just dont realize this.
If you asked every developer on earth to implement FizzBuzz, how many actually different implementations would you get? Probably not very many. Who should own the copyright for each of them? Would the outcome be different for any other product feature? If you asked every dev on earth to write a function that checked a JWT claim, how many of them would be more or less exactly the same? Would that be a copyright violation? I hope the courts answer some of these questions one day.
https://twitter.com/ebkim00/status/1579485164442648577
Not sure if this was fed the original image as an input or not.
Also seen a couple cases where people explicitly trained a network to imitate an artist's work, like the deceased Kim Jung Gi.
Thousands at least. Some of which would actually work.
Does it matter? If you examined every copyright lawsuit on earth over code, how many of them would actually be over FizzBuzz?
It’s only logical that people twist and bend the rules.
Anyway, this kind of bs will go on until people start bringing companies to court over this.
I still don’t get why don’t lawyers start offering their services for a cut of the damages… there probably is good money to be made by suing companies that put copyright-infringing ai in production.
All I can take away from this is the absurdity of intellectual property laws in general. I agree with the GP, if people are sharing stuff, it's fair game. If you don't want people using stuff you made, keep it to yourself. Pretending we can apply controls to how freely available info is used is silly anyway
I think over time we are going to see the following:
- If you take say a star wars poster, and inpaint in a trained face over luke's, and sell that to people as a service, you will probably be approached for copyright and trademark infringement.
- If you are doing the above with a satirical take, you might be able to claim fair use.
- If you are using AI as a "collage generator" to smash together a ton of prompts into a "unique" piece, you may be safe from infringement but you are taking a risk as you don't know what % of source material your new work contains. I'd like to imagine if you inpaint in say 20 details with various sub-prompts that you are getting "safer".
Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion Right: Girl with a Pearl Earring by Johannes Vermeer
This specific one is not copyright violation as it is old enough for copyright to expire. But the same may happen with other images.
from https://alexanderwales.com/the-ai-art-apocalypse/ and https://alexanderwales.com/addendum-to-the-ai-art-apocalypse...
If you “trace” another artists work the hammer comes down though. For Copilot it’s way easier to get it to obviously trace.
Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion Right: Girl with a Pearl Earring by Johannes Vermeer
This specific one is not copyright violation as it is old enough for copyright to expire. But the same may happen with other images.
from https://alexanderwales.com/the-ai-art-apocalypse/ and https://alexanderwales.com/addendum-to-the-ai-art-apocalypse...
Even if deduplication efforts are done, that painting will still be in the background of movie shots etc.
Code is only protected to the degree it is creative and not functionally driven anyway.
So the reduced band of possible expression often directly reduces the protectability-through-copyright.
Even that if done by a person as far as I understand it would not constitute a copyright infringement. It's a separate work mimicking Vermeer's original. The closest real world equivalent I can think of is probably the Obama Hope case by AP vs Shepard Fairy but that settled out of court so we don't really know what the status of that kind of reproduction is legally. On top of that though the SD image isn't just a recoloring with some additions like Fairy's was so it's not quite as close to the original as that case is.
I can imagine Mona Lisa in my head, but it doesn't really "exist" verbatim in my head. It's only an approximation.
I believe copilot works the same way (?)
So much for “generation” - it seems as if these models are just overfitting on extremely small subset of input data that it did not utterly failed to train on, almost that there could be geniuses who would be able to directly generate weight data from said images without all the gradient descent thing.
So very similar to how the music industry treats sampling then?
Everybody using CoPilot needs to get "code sample clearance" from the original copyright holder before publishing their remix or new program that uses snippets of somebody else's code...
Try explaining _that_ to your boss and legal department.
"To: <all software dev> Effective immediately, any use of Github is forbidden without prior written approval from both the CTO and General Councel."
Without that context, fizzbuzz is not that different from a matrix transpose function to me.
Sometimes the original information is there in the model, encoded/compressed/however you want to look at it, and can be reproduced.
I suppose whoever wants to pay the fees would “own” these things ?
Stable Diffusion actually has a similar problem. Certain terms that directly call up a particular famous painting by name - say, the Mona Lisa[0] - will just produce that painting, possibly tiled on top of itself, and it won't bother with any of the other keywords or phrases you throw at it.
The underlying problem is that the AI just outright forgets that it's supposed to create novel works when you give it anything resembling the training set data. If it was just that the AI could spit out training set data when you ask for it, I wouldn't be concerned[1], but this could also happen inadvertently. This would mean that anyone using Copilot to write production code would be risking copyright liability. Through the AI they have access to the entire training set, and the AI has a habit of accidentally producing output that's substantially similar to it. Those are the two prongs of a copyright infringement claim right there.
[0] For the record I was trying to get it to draw a picture of the Mona Lisa slapping Yoshikage Kira across the cheek
[1] Anyone using an AI system to "launder" creative works is still infringing copyright. AI does not carve a shiny new loophole in the GPL.
[1] https://en.wikipedia.org/wiki/SCO_Group,_Inc._v._Internation....
The scenes à faire doctrine would certainly let you paint your own picture of a pretty girl with a large earring, even a pearl one. That, however, is definitely the same person, in the same pose/composition, in the same outfit. The colors are slightly off, but the difference feels like a technical error rather than an expressive choice.
"Copying" a style is not a derivative work:
> Why isn't style protected by copyright? Well for one thing, there's some case law telling us it isn't. In Steinberg v. Columbia Pictures, the court stated that style is merely one ingredient of expression and for there to be infringement, there has to be substantial similarity between the original work and the new, purportedly infringing, work. In Dave Grossman Designs v. Bortin, the court said that:
> "The law of copyright is clear that only specific expressions of an idea may be copyrighted, that other parties may copy that idea, but that other parties may not copy that specific expression of the idea or portions thereof. For example, Picasso may be entitled to a copyright on his portrait of three women painted in his Cubist motif. Any artist, however, may paint a picture of any subject in the Cubist motif, including a portrait of three women, and not violate Picasso's copyright so long as the second artist does not substantially copy Picasso's specific expression of his idea."
https://www.thelegalartist.com/blog/you-cant-copyright-style
For literally everything but music, yes.
Even by the standards of copyright technicality, music copyright is weird. For example, if you ask a lawyer[0] what parts of copyright set it apart from other forms of property law[1], they would probably answer that it's federally preempted[2] and that it has constitutionally-mandated term limits.
Which, of course, is why music has a second "recording copyright", which was originally created by states assigning perpetual copyright to sound recordings. I wish I was making this up.
So the musical arrangement that constitutes that song from 1640? Absolutely public domain. You can tell people how to play Monteverdi all damned day. But every time you record that song being played, that creates a new copyright on that recording only. This is analogous to how making a cartoon of a public-domain fairy tale gives you ownership over that cartoon only. Except because different performers are all trying to play the same music as perfectly as possible, the recordings will sound the same and trip a Content ID match.
Oh, and because music copyright has two souls, the Sixth Circuit said there's no de minimus for sampling. That's why sample-happy rap is dead.
If you want public domain music on your YouTube video you either record it yourself or license a recording someone else did. I think there are CC recordings of PD music but I'm not sure. Either way you'll also need to repeatedly prove this to YouTube staff that would much rather not have to defend you against a music industry that's been out for blood for half a century at this point.
[0] Who, BTW, I am very much NOT
[1] Yes, yes, I know I'm dangerously close to uttering the dangerous propaganda term "intellectual property". You can go back to bed Mr. Stallman.
[2] Which means states can't make their own supra-federal copyright law and any copyright suit immediately goes to federal court.
Now a human can take inspiration from like 100 different sources and probably end up with something that no one would recognize as derivative to any of them. But it also wouldn't be obvious that the human did that.
But with an ML model, it's clearly a derivative in that the learned function is mathematically derived from its dataset and so is all the resulting outputs.
I think this brings a new question though. Because till now derivative was kind of implied that the output was recognizable as being derived.
With AI, you can tweak it so the output doesn't end up being easily recognizable as derived, but we know it's still derived.
Personally I think what really matters is more a question of what should be the legal framework around it. How do we balance the interests of AI companies and that of developers, artists, citizens who are the authors of the dataset that enabled the AI to exist. And what right should each party be given?
It's similar to saying that any digital representation of an image isn't an image just a dataset that represent it.
If what you said was any sort of defense every image copyright would never apply to any digital image, because the images can be saved in different resolutions, different file formats, or encoded down. e.g. if a jpeg 'image' was only an image at an exact set of digital bits i could save it again with a different quality setting and end up with a different set of digital bits.
But everyone still recognises when an image looks the same, and courts will uphold copyright claims regardless of the digital encoding of an image. So goodluck with that spurious argument that it's not copyright because 'its on the internet (oh its with AI etc).
The reason doesn't really matter...
But the machine learning model has studied every single one of them.
And maybe more preposterous, if its dataset had no FizzBuzz implementation would it even be able to re-invent it?
I feel this is the big distinction that probably annoys people.
That and the general fact that everyone is worried it'll devalue the worth of an experienced developer as AI will make hard thing easier, require less effort and talent to learn and thus making developers less high demand and probably lower paid.
A broadcaster of copyrighted works is not protected against infringement just because they expect viewers to only watch programming they own.
Sounds like MS has devised a massive automated code laundering racket.
This means MS really shouldn't have used copyleft code at all, and really shouldn't be selling copilot in this state, but "luckily" for them, short of a class action suit I don't really see any recourse for the programmers who's work they're reselling.
(Sorry I didn't log my experiment results at the time. None of it was related to work I'd done - I used time adjustment functions if I remember correctly)
But anyway, how I see stable diffusion being different is that it's a tool to generate all sorts of images, including copyrighted images.
It's more like a database of *how to* generate images rather than a database *of* images. Maybe there isn't that much of a difference when it comes to copyright law. If you ask an artist to draw a copyrighted image for you, who should be in trouble? I'd say the person asking most of the time, but in this case we argue it's the people behind the pencil or whatever. Why? Because it's too easy? Where does a service like fiver stand here?
So if a tool is able to generate something that looks indistinguishable from some copyrighted artwork, is it infringing on copyright? I can get on board with yes if it was trained on that copyrighted artwork, but otherwise I'm not so sure.
To make it concrete, imagine the latest Disney movie poster. You redraw it 95% close to the original, just changing the actual title. Then you sell your poster on Amazon at half the price of the actual poster. Would you get a copyright strike ?
Oh, actually I remember now -- I think the copyright complaint specifically said what recording they thought I was infringing, and it was the correct piece.
The interesting part is if AI will be considered a tooling mechanism much like the tooling used to record and manipulate a music sample into a new composition.
It is current at the SCOTUS so we should see a ruling for the USA sometime in the next year or so.
https://en.m.wikipedia.org/wiki/Andy_Warhol_Foundation_for_t...
Warhol’s estate seems likely to lose and their strongest argument is that Warhol took a documentary photo and transformed it into a commentary on celebrity culture. Here, I don’t even see that applying: it just looks like a bad copy.
https://www.scotusblog.com/2022/10/justices-debate-whether-w...
As GP says, no one really cares, but it seems hard to satisfy SA... even if you are pasting into open source, is your license compatible with CC?
Perhaps I'm over-thinking this.
The main problem I see with generating attribution is that the algorithm obviously doesn't "know" that it's generating identical code. Even in the original twitter post, the algorithm makes subtle and essentially semantically synonymous changes (like the changing the commenting style). So for all intents and purposes it can't attribute the function because it doesn't know _where_ it's coming from and copied code is indistinguishable from de novo code. Copilot will probably never be able to attribute code short of exhaustively checking the outputs using some symbolical approach against a database of copyleft/copyrighted code.
It looks like it wouldn't in the UK, probably wouldn't in the US but would in Germany. The cases seem to hinge on the level of intellectual creativity of the photograph involved. The UK said that trying to create an exact copy was not an original endeavour whereas Germany said the task of exact replication requires intellectual/technical effort of it's own merit.
https://www.theipmatters.com/post/are-photographs-of-public-...
The issue is in how it creates the output. Both Dalle and Copilot can work only by taking work of people in past, sucking up their earned know how and creations and remixing it. All that while not crediting (or paying) anyone. The software itself might be great but it only works because it was fed with loads of quality material.
It's smart copy&paste with obfuscation. If thats ok legally. You can imagine soon it could be used to rewrite whole codebases while avoiding any copyright. All the code will technically be different, but also the same.
On the subject of trademarks the issue is as far as I know even more on the end user because the protections on them is around use in commerce and consumer confusion not about just recreating them like copyright protections.
This is the problem of applying the idea of ownership to ideas and expression like art. Art in particular is a very remix and recombination driven field.
My real worry is downstream infringement risk, since fair use is non-transitive. Microsoft can legally provide you a code generator AI, but you cannot legally use regurgitated training set output[1]. GitHub Copilot is creating all sorts of opportunities to put your project in legal jeopardy and Microsoft is being kind of irresponsible with how they market it.
[0] Note that we're assuming published work. Doing the exact same thing Microsoft did, but on unpublished work (say, for irony's sake, the NT kernel source code) might actually not be fair use.
[1] This may give rise to some novel inducement claims, but the irony of anyone in the FOSS community relying on MGM v. Grokster to enforce the GPL is palpable.
Also, register your code with the copyright office.
Edit: Apparently, with the #1 post on HN right now, you could also just go here: https://githubcopilotinvestigation.com/
This specific one would not be a problem, but doing it with a still copyrighted work would be.