When you buy a book, you are not acceding to a license to only ever read it with human eyes, never to memorize it, never to quote it, never to be inspired by it.
> Interestingly, Llama 1 65B, a similar-sized model released in February 2023, had memorized only 4.4 percent of Harry Potter and the Sorcerer's Stone. This suggests that despite the potential legal liability, Meta did not do much to prevent memorization as it trained Llama 3. At least for this book, the problem got much worse between Llama 1 and Llama 3.
> Harry Potter and the Sorcerer's Stone was one of dozens of books tested by the researchers. They found that Llama 3.1 70B was far more likely to reproduce popular books—such as The Hobbit and George Orwell’s 1984—than obscure ones. And for most books, Llama 3.1 70B memorized more than any of the other models.
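The measurement behind those numbers boils down to checking how often a model reproduces book text verbatim when prompted with a preceding passage. Here is a minimal sketch of that kind of probe, assuming the Hugging Face transformers library and a locally available open-weight model; the model identifier and the excerpt are placeholders, and this is not necessarily the researchers' exact protocol.

```python
# Sketch: prompt a causal LM with a 50-token prefix from a book and check
# how much of the true continuation it reproduces under greedy decoding.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-70B"  # assumed identifier; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

passage = "..."  # placeholder: a ~100-token excerpt from the book under test
tokens = tokenizer(passage, return_tensors="pt").input_ids.to(model.device)
prefix, target = tokens[:, :50], tokens[:, 50:]

# Greedy continuation of the prefix, capped at the length of the real continuation.
output = model.generate(prefix, max_new_tokens=target.shape[1], do_sample=False)
continuation = output[:, prefix.shape[1]:]

# Fraction of continuation tokens that match the book text position by position.
n = min(continuation.shape[1], target.shape[1])
match_rate = (continuation[:, :n] == target[:, :n]).float().mean().item()
print(f"verbatim match rate: {match_rate:.2%}")
```

Running this over many excerpts from a book gives a rough per-book memorization score, which is the flavor of comparison the quoted study makes between Llama 1 and Llama 3.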
Memorising isn't wrong, but when machines memorise at scale and the people behind the original work get nothing, it raises big ethical questions.
The law hasn't caught up.
I also play the guitar, and it took me 10 years to learn 30 or 40 songs. So I don't see how anyone can learn 7 million songs in a couple of minutes.
Most AI systems seem much better at reproducing semi-identical copies of an original work than existing video/audio encoders are.