zlacker

1. Terr_+(OP)[view] [source] 2023-06-14 01:13:55
> It's so easy to think of other ways this could have been handled.

Well, assuming Reddit executives were telling the truth about their goals/needs, which I don't think they were.

They claim the purpose was some kind of emergency band-aid to stop the service from hemorrhaging cash from evil large-scale data-sucking AI developers without compensation... But in that case, they could have simply introduced it as a fresh terms-of-service restriction, with some payment-tier to come later that permits that use of the data.

replies(2): >>jprete+h7 >>lost_t+xf
2. jprete+h7[view] [source] 2023-06-14 02:03:26
>>Terr_+(OP)
I don't think Reddit was telling the truth either.

But AI developers/companies seem to almost universally believe they have fair-use rights to train their models on any data they can get their hands on, and a sufficiently expensive API at least forces them to do engineering work to get all the data. So at the time I believed Reddit's reasoning.

replies(2): >>tetris+ba >>johnny+DW1
◧◩
3. tetris+ba[view] [source] [discussion] 2023-06-14 02:26:06
>>jprete+h7
Pushshift was the best unofficial reddit API for years, and it ran without any funding problems.
replies(1): >>jprete+mb
◧◩◪
4. jprete+mb[view] [source] [discussion] 2023-06-14 02:35:23
>>tetris+ba
I'm talking about deterring ML data harvesting, or at least getting a better price for it.
replies(1): >>Semaph+9f
◧◩◪◨
5. Semaph+9f[view] [source] [discussion] 2023-06-14 03:11:02
>>jprete+mb
If it were about ML, why did they go heavily after third-party apps, even making up lies about one of the developers?
replies(1): >>jprete+ba1
6. lost_t+xf[view] [source] 2023-06-14 03:14:32
>>Terr_+(OP)
They could have set up a signature with 3rd party apps; those apps could have charged the apps a couple bucks/required ads in the feeds from reddit/etc. Instead they wanted 100% control and 100% of the data. This isn't -just- about killing off AI scrapers, this is about sending a message that reddit is 100% under the CEO's control for when they have the IPO. Wall Street does not like "free" stuff, all they see are cost centers and profit centers.
◧◩◪◨⬒
7. jprete+ba1[view] [source] [discussion] 2023-06-14 11:55:44
>>Semaph+9f
Because it wasn't about ML. As I said, I believe Reddit was lying. But the ML story was at least plausible at the time.
◧◩
8. johnny+DW1[view] [source] [discussion] 2023-06-14 15:32:59
>>jprete+h7
>But AI developers/companies seem to almost universally believe they have fair-use rights to train their models on any data they can get their hands on

Google and many other companies have spent almost 2 decades scraping petabytes of data from the web, a lot of it with no expectation of payment. Some companies made billions off of that ability to freely access data in bulk.

Data scraping has always been a grey area, but I find it strange how it's suddenly taken a turn for some people whenever modern AI comes up. We can't really be drawing lines based on what we feel is good/evil, because we will never agree as a whole on what is good/evil.
