1. jprete+(OP)[view] [source] 2023-06-14 02:03:26
I don't think Reddit was telling the truth either.

But AI developers/companies seem to almost universally believe they have fair-use rights to train their models on any data they can get their hands on, and a sufficiently expensive API at least forces them to do engineering work to get all the data. So at the time I believed Reddit's reasoning.

replies(2): >>tetris+U2 >>johnny+mP1
2. tetris+U2[view] [source] 2023-06-14 02:26:06
>>jprete+(OP)
Pushshift was the best unofficial Reddit API for years, and it ran that whole time without any funding problems.
replies(1): >>jprete+54
3. jprete+54[view] [source] [discussion] 2023-06-14 02:35:23
>>tetris+U2
I'm talking about deterring ML data harvesting, or at least getting a better price for it.
replies(1): >>Semaph+S7
4. Semaph+S7[view] [source] [discussion] 2023-06-14 03:11:02
>>jprete+54
If it were about ML, why did they go heavily after third-party apps, even making up lies about one of the developers?
replies(1): >>jprete+U21
5. jprete+U21[view] [source] [discussion] 2023-06-14 11:55:44
>>Semaph+S7
Because it wasn't about ML. As I said, I believe Reddit was lying. But the ML story was at least plausible at the time.
6. johnny+mP1[view] [source] 2023-06-14 15:32:59
>>jprete+(OP)
>But AI developers/companies seem to almost universally believe they have fair-use rights to train their models on any data they can get their hands on

For almost two decades, Google and many other companies have scraped petabytes of data from the web, much of it with no expectation of payment. Some companies made billions off that ability to freely access data in bulk.

Data scraping has always been a grey area, but I find it strange how some people's attitude suddenly shifts whenever modern AI comes up. We can't really draw lines based on what we feel is good or evil, because we will never agree as a whole on what is good or evil.
