zlacker

2 comments
1. fluidc+(OP) 2023-06-01 11:05:23
I think most people who wanted large datasets got their data via Pushshift. Pushshift was basically one guy who started out doing small things and got so frustrated with the API that he eventually grew into maintaining large mirrors of Reddit content on Google Cloud that people could access and query. I don't know why anyone doing research would have used Reddit's API instead of Pushshift.

Pushshift was shut down by Reddit earlier this year, so Reddit is probably getting hammered by LLM folks trying to get the data now, since they killed Pushshift without understanding how it fit into the universe.

Reddit is completely stupid if they think people are going to pay for "enterprise API" access... Pushshift existed because the API was trash and the only real option was to dump the entire dataset into something usable. The reason Reddit's data was used so much is that there was an SQL API via Pushshift and you could also download archives of the entire dataset in one go.

replies(1): >>doglea+8o
2. doglea+8o 2023-06-01 13:58:58
>>fluidc+(OP)
> Pushshift was shut down by Reddit earlier this year

Oh, is this why all the comment-undelete websites broke?

replies(1): >>redeye+mL
3. redeye+mL 2023-06-01 15:29:18
>>doglea+8o
Yep, this is exactly why.