I hear all about the cost-savings benefits, but I never see the team factor in their own time (and others’ time) needed to set up and maintain these systems reliably over the long term.
Something ICs at a company often struggle to understand is why companies prefer to buy managed solutions even when “free” alternatives exist (read: the free alternatives are also expensive, just in a different type of cost).
No one. You pull an engineer off the production issue to debug the log server, because you need the log server to debug the production servers.
See the problem?
Edit: to be clear, I’m no fan of Datadog and I wish self-hosting were an option. I want this path for our company, but at least on our team we just don’t have enough (redundant) expertise to deploy and manage these systems. We’d have to hire an extra FTE.
If you mean you are experiencing two totally unrelated issues at the same time, then I don’t think that scenario deserves much weight, since it’s incredibly unlikely.
Half of $30k/mo trivially pays for an engineer whose only job is to manage such a cluster and who works an hour a week unless a pager goes off, if you truly need that level of peace of mind. If you’re hiring for such a position, I have a few rock-star-level folks who would love such a job.
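For what it's worth, the "half of $30k/mo" figure can be turned into a yearly number with a quick back-of-the-envelope sketch. The function name and the 0.5 default are just illustrative assumptions; the only input taken from the comment above is the $30k/mo bill.

```python
def annual_budget(monthly_spend: float, fraction: float = 0.5) -> float:
    """Yearly amount freed up by redirecting `fraction` of a monthly bill."""
    return monthly_spend * fraction * 12


# $30k/mo is the managed-service bill mentioned above; everything else is assumed.
managed_monthly = 30_000
engineer_budget = annual_budget(managed_monthly)
print(f"Budget available for a dedicated engineer: ${engineer_budget:,.0f}/year")
```

Whether that amount covers a fully loaded hire depends on local salaries and overhead, which this sketch does not model.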
The hypothetical problems people imagine for on-prem infrastructure get really strange to me. I could come up with the same sort of scenarios for cloud-based SaaS infrastructure just as easily.
In my experience, the systems/tools needed to debug production issues are often used only when something is already broken.
Which means you now need health and uptime monitoring on your log server, since without it the server might break randomly and no one would notice until you need it.
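As a rough idea of the minimum that "health and uptime monitoring" implies, here is a sketch of a probe you could run from cron or a scheduler. It assumes the log server exposes some HTTP health endpoint; the function name and any URL you pass in are hypothetical, and real monitoring would also need alert routing and a host that is independent of the log server itself.

```python
import urllib.error
import urllib.request


def log_server_is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with HTTP 2xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP 4xx/5xx, or a
        # malformed URL all count as "not healthy" for alerting purposes.
        return False
```

You would call it with your own endpoint, for example `log_server_is_healthy("http://logs.example.internal/health")` (a made-up address), and page someone when it returns False a few times in a row.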
> The hypothetical problems people imagine for on-prem infrastructure get really strange to me
It really comes down to the people: whether you have the expertise on the team, and whether the team can realistically manage the system long term. It’s typically safer to spend more money on the managed service.
(It’s a safer decision, not necessarily better)