This is similar to Judge Alsup’s ruling in the Anthropic books case that the training is “exceedingly transformative”. I would expect a reinterpretation of, or disagreement with, that holding in another case to be both problematic and likely to be eventually overturned.
I don’t actually think provenance is a problem on the axis you suggest, if Alsup’s ruling holds. That said, I don’t think that’s the only copyright issue afoot: the Copyright Office’s writing on the copyrightability of machine outputs essentially requires that the output fail the Feist tests for human copyrightability.
More interesting to me is how this might further realign the notion of copyrightability of human works as time goes on, moving from every trivial derivative bit of trash potentially being copyrightable toward some stronger notion of, to follow the Feist test, independence and creativity. It also raises a fairly immediate question in an open source setting: do many individual small patch contributions even pass those tests themselves? They may well not, although the general guidance is to set the bar low. But is a typo fix independent or creative? There is so far to go down this rabbit hole.
On that note, I am not sure why creators in so many industries are sitting around while being more or less ripped off by massive corporations, when music has got it right.
— Do you want to make a cover song? Go ahead. You can even copyright it! The original composer still gets paid.
— Do you want to make a transformative derivative work (change the composition, really alter the style, edit the lyrics)? Go ahead, but you had damn well better license it first. …and you can copyright your derivative work, too. …and the original composer still gets credit in your copyright.
The current wave of LLM-induced AI hype really made the tech crowd bend itself into knots: trying to paint this as an unsolvable problem that requires IP abuse, or as not a problem at all because it’s mostly “derivative bits of trash” (at least the bits they don’t like, anyway), arguing in court that it’s transformative, etc., while the most straightforward solution keeps staring them in the face. The only problem is that this solution does not scale, and if there is anything the industry in which “Do Things That Don’t Scale” is the title of a hit essay hates, it is doing things that don’t scale.
[0] It should be clarified that if art is considered (as I consider it) fundamentally a mechanism of self-expression, then there is, of course, no trash, and the whole point is moot.
"novel" here depends on what you mean. Could an LLM produce output that is unique that both it and no one else has seen before, possibly yes. Could that output have perceived or emotional value to people, sure. Related challenge: Is a random encryption key generated by a csprng novel?
In the case of the US Copyright Office, if there wasn't sufficient human involvement in the production, then the output is not copyrightable, and how "novel" it is does not matter; that doesn't necessarily affect a prior production by a human (whether a copy or not). Novelty also only matters in a subset of the many fractured areas of copyright law affecting this space of digital replication. The Copyright Office wrote: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell....
Where I imagine this approximately ends up is some set of tests oriented around how relevant the "copy" is to the whole. That is, it may not matter whether the method of production involved "copying"; it may matter more whether the whole work in which it is included is at large a copy. And if the contested area could be replaced with something novel, and is a small enough piece of the whole, then it may fail to meet some bar of material value to the whole: there is no harmful infringement, or it similarly crosses into some notion of fair use.
I don't see much sanity in a world where small snippets become an issue. If models were regularly producing thousands of tokens of exactly duplicated content, that would probably be an issue.
I've not seen evidence of the latter outside of research that very deliberately performs an active search for high-probability cases (such as building suffix-tree indices over training sets, then searching for outputs with guidance from the index). That's very different from arbitrary work prompts doing the same, and the models have various defensive training and wrappers attempting to further minimize reproductive behavior. On the one hand you have research metrics like 3.6 bits per parameter of recoverable input; on the other hand, that represents a very small slice of the training set, and many such reproductions require strongly crafted, long prompts, meaning that for arbitrary real-world interaction the chance of large-scale overlap is small.
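As a toy illustration of the kind of search that extraction research performs (using a simple n-gram index as a stand-in for the suffix-tree indices those papers build, with made-up example text):

```python
from collections import defaultdict

def build_ngram_index(corpus_tokens, n=4):
    """Map each n-gram of the training corpus to its start positions.
    A simplified stand-in for a suffix-tree index over a training set."""
    index = defaultdict(list)
    for i in range(len(corpus_tokens) - n + 1):
        index[tuple(corpus_tokens[i:i + n])].append(i)
    return index

def longest_verbatim_overlap(output_tokens, index, corpus_tokens, n=4):
    """Length of the longest run of model output that appears verbatim
    in the corpus, seeded by an n-gram hit and extended greedily."""
    best = 0
    for i in range(len(output_tokens) - n + 1):
        for start in index.get(tuple(output_tokens[i:i + n]), []):
            length = 0
            while (i + length < len(output_tokens)
                   and start + length < len(corpus_tokens)
                   and output_tokens[i + length] == corpus_tokens[start + length]):
                length += 1
            best = max(best, length)
    return best

corpus = "the quick brown fox jumps over the lazy dog again and again".split()
output = "model says the quick brown fox jumps over the lazy cat".split()
print(longest_verbatim_overlap(output, build_ngram_index(corpus), corpus))  # 8
```

The point is that this measurement only finds reproductions when someone actively goes looking for them with full access to the training data; an ordinary user prompting the model has no such index to steer with.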
I swear the ML community is able to rapidly change its mind as to whether "training" an AI is comparable to human cognition based on whichever position is beneficial at any given instant.
It's not art. It's parasitism of art.
We don't need all this (seemingly pretty good) analysis. We already know what everyone thinks: no relevant AI company has had its codebase or other IP scraped by AI bots it doesn't control, and there's no way they'd allow that to happen, because they don't want an AI bot they don't control to reproduce their IP without constraint. But they'll turn right around and say, "for the sake of the future, we have to ingest all data... except no one can ingest our data, of course". :rolleyes:
There is no such thing as a “royalty-free cover”. Either it is a full-on faithful cover, which you can perform as long as license fees are paid, in which case both the performer and the original songwriter get royalties, or it is a “transformative cover”, which requires negotiation with the publisher/rights owner (and in that case IP ownership will probably be split between songwriter and performer depending on their agreement).
(Not an IP lawyer myself so someone can correct me.)
Furthermore, in the countries where I know how it works, as a venue owner you pay the rights organization a fixed sum per month or year and you are good to go, playing any track you want. It thus makes no difference to you whether you play the original or a cover.
Have you considered that they are simply singer-performers who like to sing and would like to earn a bit of money from it, but don’t have many original songs of their own?
> It's parasitism of art
If we assume covers are parasitism of art, by that logic would your comment, which is very similar to dozens I have seen on this topic in recent months, be parasitism of discourse?
Jokes aside, a significant number of the covers I have heard at cafes over the years are actually quite decent, and I would certainly not call them parasitic in any way.
Even pretending they were, if you compare artists specialising in covers with big tech trying to expropriate IP, insert itself as a middleman and arbiter of information access, devalue art for profit, etc., I am not sure they are even close in terms of the scale of parasitism.
There are three separate rights in a song:

1. The lyrics

2. The composition

3. The recording

These can all be owned by different people or by the same person. The "royalty-free covers" you mention are people abusing the rights to one of those. They're not avoiding royalties; they just haven't been caught yet.
Or, maybe you start to pay attention?
They are selling their songs cheaper for TV, radio or ads.
> Even pretending they were, if you compare between artists specialising in covers and big tech trying to expropriate IP
They're literally working for Spotify.
I guess that somehow refutes the points I made; I just can’t see how.
Radio stations, like the aforementioned venue owners, pay the rights organizations a flat annual fee. TV programs do need to license these songs (as, unlike a simple cover, the use there is substantially transformative), but again: 1) it does not rip off songwriters (the holder of songwriter rights for a song gets royalties for performances of its covers, and the songwriter has a say in any such licensing agreement), and 2) often a cover is a specifically considered and selected choice: it can fit a scene miles better than the original (just remember Motion Picture Soundtrack in that Westworld scene), and unlike the original it does not tend to make the scene all about itself. It feels like you have yet to demonstrate how it is particularly parasitic.
Edit: I mean honest covers; modifying a song a little bit and passing it off as original should be very suable by the rights holder, and I would be very surprised if Spotify decided to do that even if they fired their entire legal department and replaced it with one LLM chatbot.
Regarding what you described, I don’t think I have encountered it in the wild often enough to remember. IANAL, but if a track is not cleared/registered properly as a cover, it doesn’t seem to be a workaround or abuse; it would probably be found straight-up illegal if the rights holder or relevant rights organization cares to sue. In this case, all I can say is “yes, some people do illegal stuff”. The system largely works.
A restaurant or cafe may pay a fixed fee and get access to a specific catalog of songs (performances). The fee depends on what the catalog contains. As you can imagine, paying for the right to play only instrumental versions of songs (no singers, no lyrics) is significantly cheaper; so is paying for performances of songs by unknown people.