zlacker

[parent] [thread] 16 comments
1. philis+(OP)[view] [source] 2023-11-20 01:59:20
That does not sound like the proper way to do an OpenAI 2.0. If Reddit ever hears that's how an AI company scraped them, they'll get sued for fun and profit.
replies(4): >>az226+N >>4death+D1 >>wahnfr+32 >>Shekel+s9
2. az226+N[view] [source] 2023-11-20 02:06:52
>>philis+(OP)
You can legally scrape anything that does not require a login in the US. You can also legally train an AI on it for now.
replies(1): >>ejstro+Z3
3. 4death+D1[view] [source] 2023-11-20 02:13:14
>>philis+(OP)
The point is that the data is easily accessible. If you wanted to get your hands on the data while simultaneously keeping them clean, contract with a Russian contracting company to give you a data dump. You don't need to know how they got it.
replies(2): >>twoodf+F2 >>mr_toa+i8
4. wahnfr+32[view] [source] 2023-11-20 02:15:42
>>philis+(OP)
you're aware openai trained on a boatload of pirated ebooks?

they "steal" access to data because the LLM launders it on the other end

replies(2): >>philis+I3 >>bko+f7
◧◩
5. twoodf+F2[view] [source] [discussion] 2023-11-20 02:20:46
>>4death+D1
Well, until discovery, wherein your deliberate not knowing will be a pretty big deal.
◧◩
6. philis+I3[view] [source] [discussion] 2023-11-20 02:29:12
>>wahnfr+32
That is frustrating to no end. If I pirate one book, I should pay a hefty fine. If a company does it, it's unlocking untapped value.
◧◩
7. ejstro+Z3[view] [source] [discussion] 2023-11-20 02:30:24
>>az226+N
Are you referring to the LinkedIn case? There has not been a decision on the legality of scraping in that matter.
◧◩
8. bko+f7[view] [source] [discussion] 2023-11-20 02:54:57
>>wahnfr+32
What do you base this on?

LLMs know the contents of books because they are analyzed, reviewed and spoken about everywhere. Pick some obscure book that doesn't show up on any social media and ask about its contents. GPT won't have a clue.

replies(1): >>wahnfr+A9
◧◩
9. mr_toa+i8[view] [source] [discussion] 2023-11-20 03:01:19
>>4death+D1
Subcontracting out your crimes isn’t going to fly in court.
replies(2): >>4death+Rg >>flir+ch
10. Shekel+s9[view] [source] 2023-11-20 03:10:31
>>philis+(OP)
It's essentially impossible to prove in court that training data was obtained or used improperly unless you go and tell on yourself. And even then it requires you to actually make someone with a lot of money mad, or to not have enough money yourself. Otherwise Microsoft would certainly have caught lots of flak already for training their models on every GitHub repo; instead they got a minor paddling in the public eye that went away before long.
replies(1): >>mongol+Jh
◧◩◪
11. wahnfr+A9[view] [source] [discussion] 2023-11-20 03:12:47
>>bko+f7
https://qz.com/openai-books-piracy-microsoft-meta-google-cha....

What's your evidence contrary to this? Sounds like your common sense rather than inside knowledge

replies(1): >>bko+Jw2
◧◩◪
12. 4death+Rg[view] [source] [discussion] 2023-11-20 04:30:30
>>mr_toa+i8
Really? It's done pretty regularly to limit liability.
replies(1): >>Nasrud+pz
◧◩◪
13. flir+ch[view] [source] [discussion] 2023-11-20 04:33:23
>>mr_toa+i8
If it's done in a country where it's legal, maybe even processed in the same country and all you take out is the weights, I bet it gets a bit muddier.
◧◩
14. mongol+Jh[view] [source] [discussion] 2023-11-20 04:39:08
>>Shekel+s9
It is not impossible. You can call witnesses, refer to emails, source code, etc.
◧◩◪◨
15. Nasrud+pz[view] [source] [discussion] 2023-11-20 06:33:06
>>4death+Rg
They make a point of not directly asking for the crime when they do that. They just increase pressure on subcontractors in ways that lead to cutting corners, including legal ones.

It is harder to prove to a "should have known" standard than, say, buying stolen speakers from the back of a truck for 20% of the list price.

replies(1): >>4death+ry2
◧◩◪◨
16. bko+Jw2[view] [source] [discussion] 2023-11-20 17:21:46
>>wahnfr+A9
Did you read the article? This one misstates the case, but look at the one it links to about the lawsuit. This is a lawsuit. Nothing has been proven. The burden of proof is on you.
◧◩◪◨⬒
17. 4death+ry2[view] [source] [discussion] 2023-11-20 17:26:47
>>Nasrud+pz
There’s an implicit assumption in your argument that you’re going to directly ask for a crime to be committed. Why are you assuming that? You’ll go to a contractor and say “we want Reddit data.” Anyone with even mild technical competence can figure out how to get it.