I ran into an issue upgrading an AKS cluster last week. The upgrade stalled completely and broke the entire cluster, and our hands were tied because we can't see the control plane at all.
I submitted a Severity A ticket, and 5 hours later I was told there was a known issue with the latest VM image that broke the control plane, so any cluster upgraded in that window would essentially kill itself and require manual intervention. Did they notify anyone? Nope. Did they stop anyone from killing their own clusters? Nope.
Every time I'm forced to touch the Azure environment, I'm basically playing Russian roulette, hoping nothing is broken on the backend.
That was years ago; wild to see they still have the same issues.
Recently I've also had a few instance types that won't spin up in some regions/AZs. I assume these are capacity issues.
There's only so much hardware, and they can't run more servers than they have hardware for. I don't see a way around that.
It means any service designed to survive a control plane outage must statically allocate its compute resources and keep enough slack that it never relies on autoscaling. That's true for AWS, GCP, and Azure alike.
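A rough way to size that static allocation, as a sketch only: provision for the forecast peak plus a fixed safety margin, and never count on the autoscaler. The function name, the load units, and the 30% headroom below are hypothetical assumptions for illustration, not anything from a cloud provider's API.

```python
def static_instance_count(peak_load: int, load_per_instance: int, headroom_pct: int) -> int:
    """Hypothetical helper: instances to pre-provision so that the forecast peak load,
    plus headroom_pct percent of slack, fits without any autoscaling."""
    if load_per_instance <= 0 or headroom_pct < 0:
        raise ValueError("load_per_instance must be > 0 and headroom_pct >= 0")
    # Work in integers (scaled by 100) to avoid floating-point rounding surprises.
    required = peak_load * (100 + headroom_pct)
    capacity = load_per_instance * 100
    # Ceiling division: a partially needed instance still has to be provisioned.
    return -(-required // capacity)


# Assumed numbers: 9,000 req/s forecast peak, 1,000 req/s per instance, 30% slack.
print(static_instance_count(9_000, 1_000, 30))  # 12
```

Those 12 instances run all the time, so an outage that blocks new launches doesn't hurt you; the price is paying for idle slack around the clock.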
That sounds oddly similar to owning hardware.
AWS hasn't had this type of outage in 20 years, yet Azure has them constantly.
This is a total failure of engineering and has nothing to do with capacity. Azure is a joke of a cloud.
Which is my point.
The same fault on Azure would be a global (all-regions) fault.
Out of frugality I wanted to try the cheapest option, and that one was actually limited (kudos to them for stating up front that these servers have limits), so no worries, I picked the €5.99 option instead of the €3.99 one.
IIRC they also have a limits page in the settings that transparently shows all the limits imposed on your account. My account is young, so I can't request limit increases yet, but after some time you definitely can.
I love this idea because the cloud is essentially just someone else's hardware, and there is no infinite capacity. But I feel it can come pretty close with Hetzner (I've also heard great things about OVH, and I've had a good personal experience with Netcup VPS, but Netcup's payments were a real PITA to set up).
In Azure, for example, you can use Entra as your Active Directory, along with the fine-grained RBAC built into the platform. On a host that just gives you VPS or dedicated servers, you have to run your own AD (and secondary backups). The same goes for web servers (IIS) and SQL Server, which both have PaaS offerings with SLAs and all the infrastructure management handled for you in an easily auditable way.
If you just need a few servers at the IaaS level, the big cloud platforms don't look like great value. But if you need SOC 2, for example, on a plain VPS host you're going to have to build all the documentation and observability/controls yourself.
I haven't used OVH; does it also have limits like Hetzner, or how do they handle it?
Out of pure curiosity, is there anything aside from the three hyperscalers that also doesn't show its limits?