So don’t beat yourself up please.
When I worked for a “SaaS unicorn” we typically had multiple levels of escalation, and acknowledging would have done nothing because the alarm would continue firing until fixed. Not sure what’s changed in 15 years of ops; I had assumed it would be better now. I can’t imagine silencing an alert totally by acknowledging it if it’s still occurring.
I’m totally fine with how you handled it, if anything I am thankful. But that seems to be a system I would improve if I had the time.
“mute” is different than “resolve” to me, and both should exist. (Where mute is an acknowledgement of an issue as ongoing.)
(Might be wise though to have PagerDuty configured to re-alert if the outage persists.)
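For what it's worth, here is a rough sketch of the "mute vs. resolve" distinction with re-alerting, in plain Python. This is not PagerDuty's API or configuration; the `Alert` class, its state names, and the 30-minute `ack_timeout` default are all made up for illustration. The idea is only that an acknowledgement mutes paging for a while, and if the condition is still firing when that window expires, the alert pages again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert model; not tied to any real paging product."""
    name: str
    state: str = "triggered"  # "triggered", "acknowledged", or "resolved"
    acked_at: Optional[float] = None

    def acknowledge(self, now: float) -> None:
        # "Mute": someone is aware of the issue, but it is still ongoing.
        if self.state != "resolved":
            self.state = "acknowledged"
            self.acked_at = now

    def resolve(self) -> None:
        # "Resolve": the issue is considered fixed.
        self.state = "resolved"
        self.acked_at = None

    def should_page(self, now: float, still_firing: bool,
                    ack_timeout: float = 1800.0) -> bool:
        # Simplification: a resolved alert never pages again here; a real
        # system would likely open a new incident instead.
        if self.state == "resolved" or not still_firing:
            return False
        if self.state == "triggered":
            return True
        # Acknowledged: stay quiet until the (assumed) ack window expires,
        # then fall back to "triggered" and page again.
        if now - self.acked_at >= ack_timeout:
            self.state = "triggered"
            self.acked_at = None
            return True
        return False
```

Usage would be something like calling `should_page(now, still_firing)` on each check cycle, with timestamps in seconds. An ack buys quiet time, but only a resolve (or the condition clearing) stops the pages for good.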
I hope it doesn't change (much).
Not to say that I don't procrastinate or waste time doing other nonsense. I can definitely spend a lot of time reading HN comments, as I'm doing right now.
Anyway, anyone who finds themselves with a problem with HN should try that out :)
I'm pretty happy with how it's developing—the trendline is promising—but not ready to rely on it in prod yet.
To be clear, I wasn’t complaining. Just pointing it out. Aside from any more speculative benefit to YC for running the site, the site does run outright ads.
Apologies for the misunderstanding.
I did miss exactly what you meant by “problem” in that passage, but get it now, so thanks for that.