I don't think Reddit was telling the truth either.

But AI developers/companies seem to almost universally believe they have fair-use rights to train their models on any data they can get their hands on, and a sufficiently expensive API at least forces them to do engineering work to get all the data. So at the time I believed Reddit's reasoning.

>>jprete+(OP)
Pushshift was the best unofficial Reddit API for years, and it managed that without any funding problems.

>>tetris+U2
I'm talking about deterring ML data harvesting, or at least getting a better price for it.

>>jprete+54
If it were about ML, why did they go heavily after third-party apps, even making up lies about one of the developers?

>>Semaph+S7
Because it wasn't about ML. As I said, I believe Reddit was lying. But the ML story was at least plausible at the time.

>>jprete+(OP)
>But AI developers/companies seem to almost universally believe they have fair-use rights to train their models on any data they can get their hands on

For almost two decades, Google and many other companies have been scraping petabytes of data from the web, much of it with no expectation of payment. Some companies became billionaires off of that ability to freely access massive amounts of data.

Data scraping has always been a grey area, but I find it strange that some people suddenly see it differently whenever modern AI comes up. We can't draw lines based on what we feel is good or evil, because as a whole we will never agree on what is good or evil.