zlacker

[parent] [thread] 19 comments
1. 4death+(OP)[view] [source] 2023-11-20 01:55:57
Assuming the Reddit app does not use certificate pinning, you can use your computer to provide internet to your phone and then use an app like Charles Proxy to inspect requests being made from an app. Pretty easy to reverse engineer the API.

If the app does use certificate pinning, then you can use an Android phone and a modified app that removes the logic that enforces certificate pinning. This is more involved but also not impossible.
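The interception setup described above can be sketched as a minimal mitmproxy addon. This is a hedged illustration, not Reddit's actual API surface: the `is_api_request` filter and the `/api/` path prefix are hypothetical, and it assumes the phone's Wi-Fi proxy points at the machine running mitmproxy with the mitmproxy CA certificate trusted on the device (or pinning removed, per the second paragraph).

```python
# Hypothetical sketch: save as sniffer.py and run `mitmproxy -s sniffer.py`.
# Assumes the phone routes traffic through this host and trusts the
# mitmproxy CA (or the app's certificate pinning has been stripped).

def is_api_request(host: str, path: str) -> bool:
    """Heuristic filter: does this flow look like a Reddit API call?
    The host/path patterns here are illustrative, not Reddit's real routes."""
    return host.endswith("reddit.com") and path.startswith("/api/")

def request(flow):
    # mitmproxy invokes this hook for every intercepted HTTP request.
    if is_api_request(flow.request.pretty_host, flow.request.path):
        print(flow.request.method, flow.request.pretty_url)
```

From there it's just watching which endpoints the app hits as you tap around, which is what makes the API easy to reverse engineer.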

replies(3): >>philis+s >>patcon+9d >>gumbal+Wa1
2. philis+s[view] [source] 2023-11-20 01:59:20
>>4death+(OP)
That does not sound like the proper way to do an OpenAI 2.0. If Reddit ever hears that's how an AI company scraped them, they'll get sued for fun and profits.
replies(4): >>az226+f1 >>4death+52 >>wahnfr+v2 >>Shekel+U9
◧◩
3. az226+f1[view] [source] [discussion] 2023-11-20 02:06:52
>>philis+s
You can legally scrape anything that does not require a login in the US. You can also legally train an AI on it for now.
replies(1): >>ejstro+r4
◧◩
4. 4death+52[view] [source] [discussion] 2023-11-20 02:13:14
>>philis+s
The point is that the data is easily accessible. If you wanted to get your hands on the data while simultaneously keeping them clean, contract with a Russian contracting company to give you a data dump. You don't need to know how they got it.
replies(2): >>twoodf+73 >>mr_toa+K8
◧◩
5. wahnfr+v2[view] [source] [discussion] 2023-11-20 02:15:42
>>philis+s
you're aware OpenAI trained on a boatload of pirated ebooks?

they "steal" access to data because the LLM launders it on the other end

replies(2): >>philis+a4 >>bko+H7
◧◩◪
6. twoodf+73[view] [source] [discussion] 2023-11-20 02:20:46
>>4death+52
Well, until discovery, wherein your deliberate not knowing will be a pretty big deal.
◧◩◪
7. philis+a4[view] [source] [discussion] 2023-11-20 02:29:12
>>wahnfr+v2
That is frustrating to no end. If I pirate one book I should pay a hefty fine. If a company does it it's unlocking untapped value.
◧◩◪
8. ejstro+r4[view] [source] [discussion] 2023-11-20 02:30:24
>>az226+f1
Are you referring to the LinkedIn case? There has not been a decision on the legality of scraping in that matter.
◧◩◪
9. bko+H7[view] [source] [discussion] 2023-11-20 02:54:57
>>wahnfr+v2
What do you base this on?

LLMs know the contents of books because they are analyzed, reviewed and spoken about everywhere. Pick some obscure book that doesn't show up on any social media and ask about its contents. GPT won't have a clue.

replies(1): >>wahnfr+2a
◧◩◪
10. mr_toa+K8[view] [source] [discussion] 2023-11-20 03:01:19
>>4death+52
Subcontracting out your crimes isn’t going to fly in court.
replies(2): >>4death+jh >>flir+Eh
◧◩
11. Shekel+U9[view] [source] [discussion] 2023-11-20 03:10:31
>>philis+s
It's essentially impossible to prove in court that training data was obtained or used improperly unless you go and tell on yourself. And even then it requires you to actually make someone with a lot of money mad, or to not have enough money yourself. Certainly Microsoft would have already caught lots of flak for training their models on every GitHub repo; instead they got a minor paddling from the public eye that went away after not much time had passed.
replies(1): >>mongol+bi
◧◩◪◨
12. wahnfr+2a[view] [source] [discussion] 2023-11-20 03:12:47
>>bko+H7
https://qz.com/openai-books-piracy-microsoft-meta-google-cha....

What's your evidence contrary to this? Sounds like your common sense rather than inside knowledge

replies(1): >>bko+bx2
13. patcon+9d[view] [source] 2023-11-20 03:41:44
>>4death+(OP)
Yeah! <3 https://github.com/mitmproxy/android-unpinner
◧◩◪◨
14. 4death+jh[view] [source] [discussion] 2023-11-20 04:30:30
>>mr_toa+K8
Really? It's done pretty regularly to limit liability.
replies(1): >>Nasrud+Rz
◧◩◪◨
15. flir+Eh[view] [source] [discussion] 2023-11-20 04:33:23
>>mr_toa+K8
If it's done in a country where it's legal, maybe even processed in the same country and all you take out is the weights, I bet it gets a bit muddier.
◧◩◪
16. mongol+bi[view] [source] [discussion] 2023-11-20 04:39:08
>>Shekel+U9
It is not impossible. You can call witnesses, refer to emails, source code etc.
◧◩◪◨⬒
17. Nasrud+Rz[view] [source] [discussion] 2023-11-20 06:33:06
>>4death+jh
They make a point of not directly asking for the crime when they do that. They just increase pressure on subcontractors, which leads to cutting corners, the law included.

It is harder to prove to a "should have known" standard compared to say buying stolen speakers from the back of a truck for 20% of the list price.

replies(1): >>4death+Ty2
18. gumbal+Wa1[view] [source] 2023-11-20 10:01:37
>>4death+(OP)
Why y’all desperate to steal data to train non intelligent software? Reddit and others should sue for license violations.
◧◩◪◨⬒
19. bko+bx2[view] [source] [discussion] 2023-11-20 17:21:46
>>wahnfr+2a
Did you read the article? (This one misstates the case, but look at the one it links about the lawsuit.) This is a lawsuit. Nothing has been proven. Burden of proof is on you.
◧◩◪◨⬒⬓
20. 4death+Ty2[view] [source] [discussion] 2023-11-20 17:26:47
>>Nasrud+Rz
There’s an implicit assumption in your argument that you’re going to directly ask for a crime to be committed. Why are you assuming that? You’ll go to a contractor and say “we want Reddit data.” Anyone with even mild technical competence can figure out how to get it.