Nope. DALL-E generates images with the Getty watermark, so clearly there's copyrighted material in its training set: https://www.reddit.com/r/dalle2/comments/xdjinf/its_pretty_o...
What did the photograph do to the portrait artist? What did the recording do to the live musician?
Here’s some highfalutin art theory on the matter, from almost a hundred years ago: https://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_...
Conservatism consists of exactly one proposition, to wit:
There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.
—Composer Frank Wilhoit[1]
[1]: https://crookedtimber.org/2018/03/21/liberals-against-progre...
They are still their own separate works!
If a painter paints a person for commission, and then that person also commissions a photographer to take a picture of them, is the photographer infringing on the copyright of the painter? Absolutely not; the works are separate.
If a recording artist records a public domain song that another artist performs live, is the recording artist infringing on the live artist? Heavens, no; the works are separate.
On the other hand, these "AIs" are taking existing works and reusing them.
Say I write a song, and in that song, I use one stanza from the chorus of one of your songs. Verbatim. Would you have a copyright claim against me for that? Of course, you would!
That's what these AIs do; they copy portions and mix them. Sometimes they are not substantial portions. Sometimes they are, with verbatim comments (code), identical structure (also code), watermarks (images), composition (also images), lyrics (songs), or motifs (also songs).
In the reverse of your painter and photographer example, we saw US courts hand down judgment against an artist who blatantly copied a photograph. [1]
Anyway, that's the difference between the tools of photography (creates a new thing) and sound recording (creates a new thing) versus AI (mixes existing things).
And yes, sound mixing can easily stray into copyright infringement. So can other copying of various copyrightable things. I'm not saying humans don't infringe; I'm saying that AI does by construction.
[1]: https://www.reuters.com/world/us/us-supreme-court-hears-argu...
If I take a song, cut it up, and sing over it, my release is valid. If I parody your work, that's my work. If you paint a picture of a building and I go to that spot and take a photograph of that building, it is my work.
I can derive all sorts of things, things that I own, from things that others have made.
Fair use is a thing: https://www.copyright.gov/fair-use/
As for talking about the originals, would an artist credit every piece of inspiration they have ever encountered over a lifetime? Publishing a seed seems fine as a nice thing to do, but pointing at the billion pictures that went into the drawing seems silly.
[0] https://techcrunch.com/2012/03/22/microsoft-and-tivo-drop-th...
To be fair I thought it might be at least a week or two.
It has indeed happened.
https://boingboing.net/2018/09/05/mozart-bach-sorta-mach.htm...
Sony later withdrew their copyright claim.
There are two pieces to copyright when it comes to public domain:
* The work (song) itself -- can't copyright that
* The recording -- you are the copyright owner. No one, without your permission, can re-post your recording
And of course, there is derivative work. You own any portion that is derivative of the original work.
https://blog.barac.at/a-business-experiment-in-data-dignity
Yes I am quoting myself
https://en.m.wikipedia.org/wiki/Viacom_International_Inc._v.....
https://www.radioclash.com/archives/2021/05/02/youtuber-gets...
As for being sued for looking at source, here is the first result on Google:
https://www.wired.com/story/missouri-threatens-sue-reporter-...
Those are effectively cases of cryptomnesia[0]. Part and parcel of learning.
If you don't want broad access to your work, don't upload it to a public repository. It's very simple. Good on you for recognising that you don't agree with what GitHub does with data in public repos, but it's not their problem.
Among many others. Classical music may have fallen into the public domain, but modern performances of it are copyrightable, and some of the big companies use copyright-matching systems, including YouTube's, that often flag new performances as copies of existing recordings.
People upthread have reproduced and demonstrated that that's not the issue here.
EDIT: Actually, OP says "The variant it produces is not on my machine." - https://twitter.com/DocSparse/status/1581560976398114822
> Wish people who don't know at all how it works stopped acting all outraged when they're laughably wrong.
Physician, heal thyself.
Anyone with a mouth can run it and threaten a lawsuit. In fact, I threaten to sue you for misinformation right now unless you correct your post. Fat lot of good my threat will do, because no judge in their right mind would entertain such a baseless lawsuit.
Because it exposes their direct hypocrisy in this: it's fair use for OSS but not for us.
Questions here are very important, and it's no surprise GitHub avoided answering anything about Copilot's legality:
Yes, I'm sure.
> I'm not familiar with the exact data set they used for SD and whether or not Disney art was included, but my understanding is that their claim to legality comes from arguing that the use of images as training data is 'fair use'.
They could argue that. But since the American court system is currently (almost) de facto "richest wins," their argument will probably not mean much.
The way to tell if something was in the dataset would be to use the name of a famous Disney character and see what it pulls up. If it's there, then once the Disney beast finds out, I'm sure they'll take issue with it.
And by the way, I don't buy all of the arguments for machine learning as fair use. Sure, for the training itself, yes, but once the model is used by others, you now have a distribution problem.
More in my whitepaper against Copilot at [1].
They already have one open source part I know of, the new conhost[0].
That sounds like the pro-innovation bias: https://en.m.wikipedia.org/wiki/Pro-innovation_bias
Put another way, AIs are tools that give more power to already powerful entities.
Depending on your preferred telemetry settings, GitHub Copilot may also collect and retain the following, collectively referred to as “code snippets”: source code that you are editing, related files and other files open in the same IDE or editor, URLs of repositories and file paths.
https://twitter.com/ebkim00/status/1579485164442648577
Not sure if this was fed the original image as an input or not.
I've also seen a couple of cases where people explicitly trained a network to imitate an artist's work, like that of the late Kim Jung Gi.
From the FAQ https://github.com/features/copilot/
Left: “Girl with a Pearl Earring, by Johannes Vermeer” by Stable Diffusion Right: Girl with a Pearl Earring by Johannes Vermeer
This specific one is not a copyright violation, as the painting is old enough for its copyright to have expired. But the same may happen with other images.
from https://alexanderwales.com/the-ai-art-apocalypse/ and https://alexanderwales.com/addendum-to-the-ai-art-apocalypse...
The big difference is that cars were a tool that helped regular people by being a force multiplier. Stable Diffusion and DALL-E are not force multipliers in the same way. Sure, you may now produce images that you couldn't before, but there are far fewer profitable uses for images than for cars. Images don't materially affect the world, but cars can.
Here’s like the first link after a DuckDuckGo search for “copyright utilitarian”:
However, copyright law does not extend to useful items. Therefore, complications may arise when sculptural works are also “useful” items. In these instances, copyright law will protect purely artistic elements of a useful article as long as the useful item can be identified and exists independently of the utilitarian aspects of the article (this concept is sometimes called the “separability test”). 17 U.S.C. §. A “useful article” is an article that has a purpose beyond pure aesthetic value.
https://www.rtlawoffices.com/articles/can-i-copyright-my-des...
Is it safe to assume the rest of the downvotes were from people who were also incorrect?
[0] https://en.m.wikipedia.org/wiki/Quod_licet_Iovi,_non_licet_b...
https://www.nytimes.com/2022/09/30/books/early-cormac-mccart...
I suppose whoever wants to pay the fees would “own” these things ?
https://laion.ai/blog/laion-5b/
Not exactly what you asked, but hopefully useful? The model weights are about 4 GiB I believe.
But that doesn't make it any better.
An example: https://twitter.com/DaveScheidt/status/1578411434043580416
> I also know software devs who are extremely excited about AI art and GPT-3 but are outraged by Copilot.
The fear is not unwarranted, though. I can clearly see AI replacing most jobs, not just in tech but also in art, crafts, music, and even science. There will probably be no field left untouched by AI this decade, and by the next decade entire fields may be completely replaced.
We have multiple extinction events for humanity lined up: Climate Change, Nuclear Apocalypse and now AI.
We will have to work not just towards reducing harm to the planet, but also towards stopping meaningless wars and figuring out how to deal with the unemployment and economic crisis looming on the horizon. The last ones to suffer in the end would be the "elites" (or will they be the first, depending on how quickly civilization descends into anarchy?).
Can't say for sure. But definitely gloomy days ahead.
> Yes, many of us will turn into cowards when automation starts to touch our work, but that would not prove this sentiment incorrect - only that we're cowards.
>> Dude. What the hell kind of anti-life philosophy are you subscribing to that calls "being unhappy about people trying to automate an entire field of human behavior" being a "coward". Geez.
>>> Because automation is generally good, but making an exemption for specific cases of automation that personally inconvenience you is rooted is cowardice/selfishness. Similar to NIMBYism.
It's true cowardice to assume that our own profession should be immune from AI while other professions are not. Either dislike all AI, or like it. To be in between is to be a hypocrite.
For me, I definitely am on the side of full AI, even if it automates my job away, simply because I see AI as an advancing force on mankind.
Here is some reading material for those of you who disagree with reality:
https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...
https://en.wikipedia.org/wiki/Idea–expression_distinction
https://h2o.law.harvard.edu/cases/5004
https://www.loeb.com/en/insights/publications/2020/04/johann...
[1] https://en.wikipedia.org/wiki/SCO_Group,_Inc._v._Internation....
"Copying" a style is not a derivative work:
> Why isn't style protected by copyright? Well for one thing, there's some case law telling us it isn't. In Steinberg v. Columbia Pictures, the court stated that style is merely one ingredient of expression and for there to be infringement, there has to be substantial similarity between the original work and the new, purportedly infringing, work. In Dave Grossman Designs v. Bortin, the court said that:
> "The law of copyright is clear that only specific expressions of an idea may be copyrighted, that other parties may copy that idea, but that other parties may not copy that specific expression of the idea or portions thereof. For example, Picasso may be entitled to a copyright on his portrait of three women painted in his Cubist motif. Any artist, however, may paint a picture of any subject in the Cubist motif, including a portrait of three women, and not violate Picasso's copyright so long as the second artist does not substantially copy Picasso's specific expression of his idea."
https://www.thelegalartist.com/blog/you-cant-copyright-style
But that is exactly how it works. Translation companies license (or produce) huge corpuses of common sentences across multiple languages that are either used directly or fed into a model.
Third party human translators are asked to assign rights to the translation company. https://support.google.com/translate/answer/2534530
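To make "used directly or fed into a model" concrete, here is a toy sketch; the corpus contents and the fallback function are invented placeholders, not any real company's pipeline:

```
licensed_corpus = {
    # Toy entries standing in for a licensed corpus of common sentences.
    ("en", "de", "How are you?"): "Wie geht es dir?",
    ("en", "de", "Thank you."): "Danke.",
}

def translate_with_model(src, dst, sentence):
    # Stand-in for a model trained on the same licensed corpus.
    return f"[{dst} translation of] {sentence}"

def translate(src, dst, sentence):
    hit = licensed_corpus.get((src, dst, sentence))
    if hit is not None:
        return hit  # direct reuse of the licensed corpus
    return translate_with_model(src, dst, sentence)  # fall back to the model

print(translate("en", "de", "How are you?"))           # corpus hit
print(translate("en", "de", "Where is the station?"))  # model fallback
```

Either branch ultimately rests on sentences the company licensed or had assigned to it, which is the point about rights assignment above.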
I have to assume this is just people being protective of their own profession and, consequently, setting a high bar for what constitutes performance in that profession.
What you're describing is a choice. They chose which people to believe, with zero vetting.
> The point is that with ML training data, such a vast quantity is required that it's unreasonable to expect humans to be able to research and guarantee the legal provenance of it all.
I'm not sure what you're presenting here is actually true. A key part of ML training is the training part. Other domains require a pass/fail classification of the model's output (see image identification, speech recognition, etc.) so why is source code any different? The idea that "it's too much data" is absolutely a cop-out and absurd, especially for a company sitting on ~$100B in cash reserves.
Your argument kind of demonstrates the underlying point here: They took the cheapest/easiest option and it's harmed the product.
> A crawler simply believes that licenses, which are legally binding statements, are made by actual owners, rather than being fraud. It does seem reasonable to address the issue with takedowns, however.
Yes, and to reiterate, they chose this method. They were not obligated to do this, they were not forced to pick this way of doing things, and given the complete lack of transparency it's a large leap of faith to assume that their training data simply looked at LICENSE files to determine which licenses were present.
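For concreteness, the kind of "cheapest/easiest" check being described would look roughly like the sketch below. This is purely hypothetical; nothing public says the Copilot pipeline did anything like it:

```
from pathlib import Path

# Hypothetical marker strings and file names; not a description of any real pipeline.
PERMISSIVE_MARKERS = ("MIT License", "Apache License", "BSD")

def repo_seems_permissive(repo_root):
    for name in ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"):
        path = Path(repo_root) / name
        if path.is_file():
            text = path.read_text(errors="ignore")
            return any(marker in text for marker in PERMISSIVE_MARKERS)
    return False  # no license file at all: provenance unknown, so skip the repo
```

A scan like that trusts whatever license text happens to be present; it says nothing about whether the uploader actually had the right to apply that license, which is exactly the leap of faith at issue.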
For what it's worth, it doesn't seem that that's what OpenAI did when they trained the model initially in their paper[1]:
> Our training dataset was collected in May 2020 from 54 million public software repositories hosted on GitHub, containing 179 GB of unique Python files under 1 MB. We filtered out files which were likely auto-generated, had average line length greater than 100, had maximum line length greater than 1000, or contained a small percentage of alphanumeric characters. After filtering, our final dataset totaled 159 GB.
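Those filters boil down to a handful of per-file heuristics. A minimal sketch, using the thresholds quoted above (the 0.25 alphanumeric cutoff is my guess, since the paper only says "a small percentage"):

```
def keep_file(text, looks_auto_generated):
    # Per-file filters described in the quoted paragraph; the 0.25 alphanumeric
    # cutoff is an assumption, not a figure from the paper.
    if looks_auto_generated or not text:
        return False
    if len(text.encode("utf-8")) > 1_000_000:   # "Python files under 1 MB"
        return False
    lines = text.splitlines()
    avg_len = sum(len(line) for line in lines) / len(lines)
    max_len = max(len(line) for line in lines)
    alnum_ratio = sum(ch.isalnum() for ch in text) / len(text)
    return avg_len <= 100 and max_len <= 1000 and alnum_ratio >= 0.25
```

Note that every one of these filters is about code quality; none of them touches licensing or provenance.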
I have not seen anything concrete about any further training after that, largely because it isn't transparent.
It is currently at the SCOTUS, so we should see a ruling for the USA sometime in the next year or so.
https://en.m.wikipedia.org/wiki/Andy_Warhol_Foundation_for_t...
Warhol’s estate seems likely to lose and their strongest argument is that Warhol took a documentary photo and transformed it into a commentary on celebrity culture. Here, I don’t even see that applying: it just looks like a bad copy.
https://www.scotusblog.com/2022/10/justices-debate-whether-w...
You’re really not going to solve this problem with marketing (“blog posts”) or some pro-GitHub story from data scientists. You need a DMCA / removal-request feature akin to Google Image Search, and you need to work on understanding product problems from the customer's perspective.
It looks like it wouldn't in the UK, probably wouldn't in the US, but would in Germany. The cases seem to hinge on the level of intellectual creativity involved in the photograph. The UK said that trying to create an exact copy was not an original endeavour, whereas Germany said the task of exact replication requires intellectual/technical effort of its own.
https://www.theipmatters.com/post/are-photographs-of-public-...
I tried out of curiosity. Here[1] are the first 8 images that came up with the prompt "Disney mickey mouse" using the stable diffusion V1.4 model. Personally I don't really see why Disney or any other company would take issue with the image generation models, it just seems more or less like regular fan art.
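For anyone who wants to reproduce this kind of probe, a minimal sketch with the Hugging Face diffusers library looks roughly like this (the model ID and the number of images are my assumptions, not necessarily how the images above were made):

```
import torch
from diffusers import StableDiffusionPipeline

# Load the public v1.4 weights and generate a few images for the probe prompt.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

result = pipe("Disney mickey mouse", num_images_per_prompt=4)
for i, image in enumerate(result.images):
    image.save(f"mickey_{i}.png")
```

Running it a few times per prompt gives a quick, unscientific sense of how strongly a trademarked character is represented in the training data.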
See https://en.m.wikipedia.org/wiki/Peterloo_Massacre for example
I'm also not sure that Copilot is just reproducing code, but that's a separate discussion.
> If I reproduced part of a book from a source that claimed incorrectly it was released under a permissive license, I would still be liable for that misuse. Especially if I was later made aware of the mistake and didn’t correct it.
I don't believe that's correct in the first instance (at least from a criminal perspective). If someone misrepresents to you that they have the right to authorise you to publish something, and it turns out they don't have that right, you did not willingly infringe and are not liable for the infringement from a criminal perspective[1]. From a civil perspective, the copyright owner could likely still claim damages from you if you were unable to reach a settlement. A court would probably determine the damages to award based on real damages (including loss of earnings for the content creator), rather than anything punitive, if it's found that the infringement was not willful.
Further, most jurisdictions have exceptions for short extracts of a larger copyrighted work (e.g. quotes from a book), which may apply to Copilot.
This is my own code, I wrote it myself just now. Can I copyright it?
```
function isOdd (num) {
  if (num % 2 !== 0) {
    return true;
  } else {
    return false;
  }
}
```
What about the following:
```
function isOddAndNotSunday (num) {
  const date = new Date();
  if (num % 2 !== 0 && date.getDay() > 0) {
    return true;
  } else {
    return false;
  }
}
```
Where do we draw the line?
[0]: https://docs.github.com/en/site-policy/github-terms/github-t...
[1]: https://www.law.cornell.edu/uscode/text/17/506
Even if CHOLMOD is easily the best sparse symmetric solver, it is notoriously not used by scipy.linalg.solve, though, because numpy/scipy developers are anti-copyleft fundamentalists and have chosen not to use this excellent code for merely ideological reasons... but this will not last: thanks to the copilot "filtering" described here, we can now recover a version of CHOLMOD unencumbered by the license that the author originally distributed it under! O brave new world, that has such people in it!
IT Crowd Piracy Warning https://www.youtube.com/watch?v=ALZZx1xmAzg
[0] https://www.statista.com/statistics/817918/number-of-busines...
That's actually a very real problem that mega money has been spent on. The same legal problem appears on sites like YouTube around fair use and copyright. As for why fair use doesn't apply here, see:
https://softwareengineering.stackexchange.com/questions/1217...
Regardless, platforms are partially responsible for the content that their users upload to them. Most try to absolve themselves of this responsibility with their terms of service, but legally that's just not possible.
Personally I'm an advocate for fair use, but I'm also an advocate for strong copyright laws and their enforcement. In the short time the internet has been available to most people in the world, a habit has developed of stealing others' work and claiming it as your own. Quite often this is for financial gain.
Maybe not right this moment but our actions have consequences in the future.
For those who only see the next quarter, they're stoked.
For those who understand infinite growth is impossible and would simply like a livable world, they're horrified.
The code in question is not something that anyone needs to own. Rather, it's what anyone would write, faced with the same problem. It's stupid to make humans do a robot's job in the name of preserving meaningless "IP rights".
The repo: https://github.com/Shreeyak/cleargrasp
https://github.com/Shreeyak/cleargrasp/blob/master/api/depth...
It looks like the license of the repo is Apache 2.0
Please don’t straw man¹. That’s neither what I said, nor what I intended to convey, nor what I believe.
Also, register your code with the copyright office.
Edit: Apparently, with the #1 post on HN right now, you could also just go here: https://githubcopilotinvestigation.com/
This specific one would not be a problem, but doing it with a still copyrighted work would be.
Gambling - I don't do it, but I'd need more specifics to see why gambling is bad in this sense. It's a voluntary pursuit that I think is a bad idea, but that doesn't make it illegal.
Price gouging is still being useful, just at a higher price. Someone could charge me £10 for bread and if that was the cheapest bread available, I'd buy it. If it is excessive and for essential goods, it is increasingly illegal, however. 42 out of 50 states in the US have anti-gouging laws [0], which, as I say, isn't what I'm talking about. I'm talking about legal things.
Underpaying workers - this certainly isn't illegal, unless it's below minimum wage, but also "underpaying" is an arbitrary term. If there's a regulatory/legal/corrupt state environment in which it's hard to create competitors to established businesses, then that's bad because it drives wages down. Otherwise, wages are set by what both the worker and employer sides will bear. And, lest we forget, there is still money coming into the business by it being useful. Customers are paying it for something. The fact that it might make less profit by paying more doesn't undermine that fundamental fact.
As for supporting laws to undermine competitors, that is something people can do, yes. Microsoft, after their app store went nowhere, came out against Apple and Google charging 30% for apps. Probably more of a PR move than a legal one, but businesses trying to influence laws isn't bad, because they have a valid perspective on the world just as we all do, unless it's corruption. Which is (once more, with feeling) illegal, and so out of scope of my comment. And again, unless the laws are there to establish a monopoly incumbent, which is pretty rare, and definitely the fault of the government that passes the laws, the company is still only really in existence because it does something useful enough to its customers that they pay it money.