zlacker

6 comments
1. mutant+(OP) 2023-08-15 05:10:09
> We probably banned it for submissions because we want original sources at the top level.

Then why isn't web.archive.org also banned? [1] And what about things which aren't available from the original source anymore?

[1]: >>37130420

replies(3): >>gouggo+u7 >>dang+v32 >>dredmo+K32
2. gouggo+u7 2023-08-15 06:34:18
>>mutant+(OP)
> Then why isn't web.archive.org also banned?

Because web.archive.org is generally used for...

... things which aren't available from the original source anymore.

While archive.is is generally used to bypass paywalls. These two websites have two very distinct missions and use cases.

replies(1): >>dredmo+ek2
3. dang+v32 2023-08-15 19:37:50
>>mutant+(OP)
That's a good question. See https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... and dredmorbius's comment at >>37138346 re archive.org.

As for "why archive.org and not archive.is" - that's a bit of a borderline call, but gouggoug pointed out some of it at >>37130890 . The set of articles which (a) are no longer on the web, (b) are not on archive.org, but (c) are on archive.is, isn't that big. Paywall workarounds are a different thing, because the original URLs are still on the web (albeit paywalled). For those, we want the original URL at the top level, because it's important for the domain to appear beside the title.

replies(1): >>dredmo+O32
4. dredmo+K32 2023-08-15 19:39:35
>>mutant+(OP)
The Internet Archive is permitted when the original site or content is unavailable.

Otherwise, HN's rule is to "submit the original source": <https://news.ycombinator.com/newsguidelines.html>

I suppose that might be clarified as "most original or canonical", but, Because Reasons, HN's guidelines are written loosely and interpreted according to HN's Prime Directive: "anything that gratifies one's intellectual curiosity" >>508153.

5. dredmo+O32 2023-08-15 19:40:12
>>dang+v32
You're apparently in the middle of editing this as I'm replying, but I suspect I'm close to the mark here: >>37138346
replies(1): >>dang+z42
6. dang+z42 2023-08-15 19:43:11
>>dredmo+O32
Yup!
7. dredmo+ek2 2023-08-15 21:15:37
>>gouggo+u7
Whilst I agree with your characterisation as regards usage on HN, I will note that Archive Today is actually quite a useful archival tool, and it often works on sites where the Internet Archive performs poorly.

I'd run across an instance of this when the Diaspora* pod I was on (the original public node, as it happens) ceased operations. I found myself wanting to archive my own posts, and was caught in something of a dilemma:

- The Internet Archive's Wayback Machine has a highly scriptable method for submitting sites, in the form of a URL (see below). Once you have a list of pages you want to archive, you can chunk through those using your scripting tool of choice (for me, bash, and curl or wget typically). But it doesn't capture the comments on Diaspora* discussions. E.g., <https://web.archive.org/web/20220111031247/https://joindiasp...>

- Archive.Today does not have a mass-submission tool and somewhat aggressively imposes CAPTCHAs at times. So the remaining option is manual submission, though that can be run off a pre-generated list of URLs, which somewhat streamlines the process. And it does capture the comments. E.g., <https://archive.is/9t61g>
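The Wayback Machine workflow in the first bullet (take a list of pages, chunk through it, submit each one by URL) can be sketched roughly as below. This is a minimal Python sketch, not the commenter's actual script (they used bash with curl or wget). It assumes the Save Page Now URL form `https://web.archive.org/save/<url>`; the helper names (`save_request_url`, `chunks`, `archive_all`), the chunk size, and the pause length are hypothetical choices, and real rate limits may differ. By default it does a dry run that only builds the request URLs.

```python
import time
import urllib.request

# Assumption: requesting https://web.archive.org/save/<url> asks the
# Wayback Machine to capture <url> (the "Save Page Now" form).
SAVE_PREFIX = "https://web.archive.org/save/"


def save_request_url(page: str) -> str:
    """Build the (assumed) Save Page Now URL for a single page."""
    return SAVE_PREFIX + page


def chunks(items: list, size: int):
    """Yield successive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def archive_all(pages: list, chunk_size: int = 10,
                delay_seconds: float = 15.0, dry_run: bool = True) -> list:
    """Submit every page in chunks, pausing between chunks.

    chunk_size and delay_seconds are hypothetical, polite-looking guesses.
    With dry_run=True nothing is fetched; the request URLs are only returned.
    """
    submitted = []
    for batch in chunks(pages, chunk_size):
        for page in batch:
            target = save_request_url(page)
            if not dry_run:
                # A plain GET; captures can be slow, hence the long timeout.
                with urllib.request.urlopen(target, timeout=120) as resp:
                    resp.read()
            submitted.append(target)
        if not dry_run:
            time.sleep(delay_seconds)  # back off before the next chunk
    return submitted
```

For example, `archive_all(["https://example.com/a", "https://example.com/b"], chunk_size=1)` returns the two `https://web.archive.org/save/...` URLs without touching the network; passing `dry_run=False` would actually submit them. Note that this does nothing about the comment-capture problem described above, which is a limitation of the capture itself rather than of the submission loop.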

So, if you are looking to archive material, Archive Today is useful, if somewhat tedious in bulk.

(Which is probably why the Internet Archive is the far more comprehensive Web archive.)
