zlacker

Rollback is a reliable strategy when the rollback process is well understood. If the rollback process is not well known and well rehearsed, it is a risk in itself.

I'm not sure of the nature of the rollback process in this case, but leaning on ill-founded assumptions is a bad practice. I do agree that a global rollout is a problem.

>>liampu+(OP)
Rollback implicitly assumes complete atomicity; otherwise it's only slightly better than a yeet. It's like a backup that has never been tested.

>>newsof+Qm
Complete atomicity carries with it the idea that the world is frozen, and any data only needs to change when you allow it to.

That's to say, it's an incredibly good idea when you can physically implement it. It's not something that everybody can do.

>>liampu+(OP)
Global rollout of security code on a timeframe of seconds is part of Cloudflare's value proposition.

In this case they got unlucky with an incident before they finished work on planned changes from the last incident.

>>marcos+Vx
No, complete atomicity doesn't require a frozen state; it requires common sense and fail-proof, fool-proof guarantees backed by testing.

There is another name for rolling forward: tripping up.

>>progra+jz
That's entirely incorrect. For starters, they didn't get unlucky. They chose to keep using a system they knew was sketchy (and almost certainly knew was sketchy even before 11/18).

And on top of that, Cloudflare's value proposition is "we're smart enough to know that instantaneous global deployments are a bad idea, so trust us to manage services for you so you don't have to rely on in-house folks who might not know better."