zlacker

14 comments
1. crater+(OP)[view] [source] 2023-12-27 16:33:00
"I think that AI is always plagiarism," he [Bruce Perens] said
replies(3): >>strong+y4 >>Charle+s7 >>Fergus+lf
2. strong+y4[view] [source] 2023-12-27 16:57:07
>>crater+(OP)
NYT seems to agree: >>38781941
replies(1): >>gumbal+Yb
3. Charle+s7[view] [source] 2023-12-27 17:14:00
>>crater+(OP)
> "I think that AI is always plagiarism," he says. "When you train the model, you're training the model with other people's copyrighted stuff."

This is certainly true of OpenAI and Google. But Adobe Firefly is trained only on their own Adobe Stock images, licensed content, and public domain content.

replies(3): >>cycoma+Iy >>Captai+BP >>Quantu+Cy1
◧◩
4. gumbal+Yb[view] [source] [discussion] 2023-12-27 17:39:03
>>strong+y4
And a lot of people whose work is stolen.
5. Fergus+lf[view] [source] 2023-12-27 17:57:09
>>crater+(OP)
At some point (maybe not yet) it's hard to say convincingly that what AI outputs is plagiarism but human output isn't. Not because AI is conscious or whatever, but because it never outputs exactly what was in the training data and, like humans, combines everything it knows to solve the issue in front of it.
replies(2): >>kiba+ij >>Xelyne+WE1
◧◩
6. kiba+ij[view] [source] [discussion] 2023-12-27 18:18:10
>>Fergus+lf
It sounds like we should just make all these AIs open source and freely available to prevent any single individual or corporation from monopolizing the profit from them.
◧◩
7. cycoma+Iy[view] [source] [discussion] 2023-12-27 19:44:13
>>Charle+s7
I just had a look at the Adobe stock licencing and royalty terms and I'm not 100% clear that they are bulletproof either. They certainly don't mention AI training in their terms (I guess they argue that this would fall under the "developing new features" part of the clause).

It does, however, seem a significant change of the terms, as Adobe was essentially paying contributors based on their downloads (i.e. how often their work was used), but the whole AI model is that they use the work so they don't have to pay the contributors anymore.

◧◩
8. Captai+BP[view] [source] [discussion] 2023-12-27 21:12:47
>>Charle+s7
I think this entire thing is nonsensical in the first place. Plagiarism is not related to copyright at all, it's related to credit and attribution. I can plagiarise something in the public domain, for example:

    Happy birthday to you
    Happy birthday to you
    Happy birthday to [NAME]
    Happy birthday to you!
    (Written by me.)
In any case, I heavily disagree with Bruce. The whole point of the free culture movement is reusing and remixing previous works, and AI is the ultimate remixer.

"Why Software Should Be Free" made a case against copyright as well. It's quite disappointing to see open source miss the point of free software once again.

replies(2): >>pests+Ei1 >>Xelyne+BE1
◧◩◪
9. pests+Ei1[view] [source] [discussion] 2023-12-28 00:35:00
>>Captai+BP
Free culture remixing and reusing is fine with me, when it's done creatively by people. When a company systematically automates and productizes the concept it becomes an issue.

I don't mind if another artist paints with my brush...

I wouldn't like a factory set up producing works with my brush en masse.

◧◩
10. Quantu+Cy1[view] [source] [discussion] 2023-12-28 03:13:44
>>Charle+s7
This is just a batshit insane thing for a person who supposedly promotes the creative commons to say. Plagiarism isn't learning how to write articles like the NYT or how to write code like Linus or RMS. That's still the case if I program my computer to do it.
replies(1): >>Xelyne+pD1
◧◩◪
11. Xelyne+pD1[view] [source] [discussion] 2023-12-28 04:00:01
>>Quantu+Cy1
I think it makes sense. Plagiarism is literally taking someone's work/ideas and passing them off as my own. If I were to read 100 NYT articles (one of which is about some event) and then write my own article afterwards that uses similar details of the event or is laid out in similar ways, that's plagiarism.

When you program a machine to do the same thing but 1000x more efficiently, it's still plagiarism.

replies(1): >>Quantu+uQz
◧◩◪
12. Xelyne+BE1[view] [source] [discussion] 2023-12-28 04:11:09
>>Captai+BP
> Plagiarism is not related to copyright at all. The whole point of the free culture movement is reusing and remixing previous works. I can plagiarize something in the public domain

I'd half-agree, but I don't think "breaking copyright" matters to the question of "is LXM 'AI' plagiarism?".

Like you say, you can plagiarize without breaking copyright (for cases where the copyright allows usage without attribution, such as with public domain), and it's also possible to break copyright without plagiarism (e.g. redistributing with attribution when you don't have the license).

But I think this is irrelevant to the point being made. LXMs need to take in a large amount of data, and then the outputs are attributed to the "model" rather than the originators of the material.

Since most of the content being digested by LXMs is not public domain, that's where copyright gets twisted up with it, since for the majority of LLM training data 'plagiarism' and 'breaking copyright' come from the same act of redistributing/using without attribution (and since the "LXM" is considered to have created the data by most people, the 'plagiarism' comes in).

replies(1): >>Captai+HZ1
◧◩
13. Xelyne+WE1[view] [source] [discussion] 2023-12-28 04:14:47
>>Fergus+lf
Can't the same argument be used to say "lossy compression is not plagiarism"?

If I encode a movie with H264 there is no way to get it to output "exactly what was in the training data" and I can argue that "like humans extract important information from large dumps of data, the algorithm does the same".

I don't have any reservations about calling an H264 encoded video redistributed with the wrong attribution "plagiarism", so I don't see what's different about Large X Models that they deserve a special pass.

◧◩◪◨
14. Captai+HZ1[view] [source] [discussion] 2023-12-28 08:26:38
>>Xelyne+BE1
That's a good point. I'm not sure how attribution should even be done in this situation though, considering that we have millions (billions?) of sources. A mega attribution file, maybe?

As a creator I feel like that's not very useful, to be a single name in billions. Of course I'd still like attribution if the work was significantly based on mine.

◧◩◪◨
15. Quantu+uQz[view] [source] [discussion] 2024-01-09 02:17:11
>>Xelyne+pD1
Outside an academic setting plagiarism is just called "thought".