zlacker

nijave (OP) 2023-06-29 23:11:56
You quickly start to get into "what does down mean?" conversations. When you have a bunch of geographical locations and thousands of different systems/functionalities, it's not always clear if something is down.

Take a service responding 1% of the time with errors. Probably not "down". What about 10%? Probably not. What about 50%? Maybe, hard to say.

Maybe there's a fiber cut in a rural village affecting 100% of your customers there but only 0.0001% of total customers?
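To make the ambiguity concrete, here's a rough sketch of how you might score one scope (a region, a service, whatever) on two axes: its local error rate and its share of total customers. Everything in it is hypothetical: the `Scope` fields, the `classify` helper, and especially the 10% / 50% cutoffs, which are arbitrary placeholders rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class Scope:
    """One slice of the system, e.g. a region or a single service (hypothetical)."""
    name: str
    requests: int   # requests seen in the measurement window
    errors: int     # requests that failed in that window
    customers: int  # customers served by this scope


def classify(scope, total_customers, degraded_at=0.10, down_at=0.50):
    """Return (status, local_error_rate, share_of_all_customers).

    The thresholds are arbitrary assumptions; picking them is the hard part.
    """
    rate = scope.errors / scope.requests if scope.requests else 0.0
    share = scope.customers / total_customers
    if rate >= down_at:
        status = "down"
    elif rate >= degraded_at:
        status = "degraded"
    else:
        status = "ok"
    return status, rate, share


# The fiber-cut case: every request from the village fails,
# but the village is 1 customer out of 1,000,000 (0.0001%).
village = Scope(name="rural-village", requests=100, errors=100, customers=1)
status, rate, share = classify(village, total_customers=1_000_000)
print(f"{village.name}: {status}, {rate:.0%} errors, {share:.4%} of customers")
```

The village comes back as "down" locally while being a rounding error globally, and whether that belongs on a status page is a policy decision the code can't make for you.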

Sure, there are cases like this where everything is hosed, but it raises the question: is building a complex monitoring system for <some small number of downtimes a year> actually worth it?
