zlacker

1. ameliu+(OP)[view] [source] 2023-05-31 21:56:47
Can't they pull the data from archive.org?
replies(3): >>notaco+a1 >>KuiN+r6 >>SllX+B9
2. notaco+a1[view] [source] 2023-05-31 22:04:17
>>ameliu+(OP)
That would be worse.
3. KuiN+r6[view] [source] 2023-05-31 22:34:11
>>ameliu+(OP)
Archive.org was knocked offline the other day due to some AI startup scraping it to death. It’s not a good thing.
replies(1): >>moneyw+Fx
4. SllX+B9[view] [source] 2023-05-31 22:53:00
>>ameliu+(OP)
Archive.org is a non-profit without the capacity to serve that many requests. An excellent resource for people to use carefully, but not a treasure trove for bots to scrape down to the last bit.
replies(1): >>notpus+si1
◧◩
5. moneyw+Fx[view] [source] [discussion] 2023-06-01 02:39:22
>>KuiN+r6
Source? They don’t rate limit.
replies(3): >>Kon-Pe+gz >>pipers+zF >>edgyqu+5L
◧◩◪
6. Kon-Pe+gz[view] [source] [discussion] 2023-06-01 02:55:13
>>moneyw+Fx
https://news.ycombinator.com/item?id=36110527
◧◩◪
7. pipers+zF[view] [source] [discussion] 2023-06-01 04:13:23
>>moneyw+Fx
True - and their lack of rate limiting ended up letting someone overwhelm their servers, knocking them offline.
◧◩◪
8. edgyqu+5L[view] [source] [discussion] 2023-06-01 05:23:11
>>moneyw+Fx
They put out a blog post afterwards asking people not to scrape. A simple Google search would be much faster than asking for sources.
◧◩
9. notpus+si1[view] [source] [discussion] 2023-06-01 11:51:18
>>SllX+B9
Would be cool if they introduced some reasonably priced access for mass scrapers. It should bring in some nice income on top of donations, and it would be a valuable service to the community.