During an incident, the incident lead should be able to say to your team's on-call engineer: "can you roll back? If so, roll back", and the on-call engineer should know whether it's okay. By default it should be, if you're writing code mindfully.
Certain well-understood migrations are the only cases where roll back might not be acceptable.
Always keep your services in a "rollback-able", "graceful fail", "fail open" state.
This requires tremendous engineering consciousness across the entire org. Every team must be a diligent custodian of this. And even then, it will sometimes break down.
Never make code changes you can't roll back from without a reason and without informing the team. That covers service calls, data write formats, etc.
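To make the data write format point concrete, here is a minimal sketch of one common way to keep a format change rollback-safe: ship a reader that understands both formats first, and only then flip the writer. The record shape, the `version` field, the `write_v2` flag, and the default currency are all hypothetical and only there for illustration.

```python
import json

DEFAULT_CURRENCY = "USD"  # assumption: v1 records had an implied currency


def encode(record: dict, write_v2: bool) -> str:
    """Serialize a record. `write_v2` is a hypothetical flag that is flipped
    only after every reader in production can decode v2."""
    if write_v2:
        return json.dumps({
            "version": 2,
            "amount": {"cents": record["amount_cents"],
                       "currency": record["currency"]},
        })
    # v1: no version field, bare amount, currency implied
    return json.dumps({"amount_cents": record["amount_cents"]})


def decode(payload: str) -> dict:
    """Reader that accepts both formats, so rolling the writer back
    (or forward) never leaves data that some deployed reader can't parse."""
    data = json.loads(payload)
    version = data.get("version", 1)  # a missing field means the old format
    if version == 1:
        return {"amount_cents": data["amount_cents"],
                "currency": DEFAULT_CURRENCY}
    if version == 2:
        return {"amount_cents": data["amount"]["cents"],
                "currency": data["amount"]["currency"]}
    raise ValueError(f"unknown format version: {version}")


record = {"amount_cents": 1234, "currency": "USD"}
old_payload = encode(record, write_v2=False)
new_payload = encode(record, write_v2=True)
```

With this ordering, turning `write_v2` back off is a plain rollback: anything already written as v2 stays readable. The change you can't roll back from is the one where the new writer ships before, or together with, the only reader that understands it.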
I've worked on services with billions of dollars in transaction value for most of my career. And unfortunately I've been in billion-dollar outages.
I won't say never, but a situation where the right way to avoid a rollback (which, it sounds like, was technically fine to do, just undesirable from a security/business perspective) is a parallel deployment through a radioactive, global-blast-radius, near-instantaneous deployment system that is under intense scrutiny after another recent outage should be about as probable as a bowl of petunias in orbit.
It is absolutely the wrong approach to "fail open" when you can't run security-critical operations.
With small deployments it usually isn't too difficult to re-deploy a previous commit. But once you get big enough, you've got enough developers that half a dozen PRs will have been merged between the start of the incident and now. How viable is it to stop the world, undo everything, and start from scratch any time a deployment causes the tiniest issue?
Realistically the best you're going to get is merging a revert of the problematic changeset - but with the intervening merges that's still going to bring the system into a novel state. You're rolling forwards, not backwards.
The short answer is "yes" due to the way the configuration management works. Other infrastructure changes or service upgrades might get undone, but it's possible. Or otherwise revert the commit that introduced the package bump with the new code and force that to roll out everywhere rather than waiting for progressive rollout.
There shouldn't be much chance of bringing the system to a novel state because configuration management will largely put things into the correct state. (Where that doesn't work is if CM previously created files, it won't delete them unless explicitly told to do so.)
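The leftover-file caveat is easy to reproduce in a toy model. This is a sketch of a generic "converge to desired state" loop, not any particular CM tool; `converge`, the `ABSENT` marker, and the file names are made up for the example.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple

ABSENT = None  # hypothetical marker meaning "explicitly delete this file"


def converge(root: Path, desired: Dict[str, Optional[str]]) -> None:
    """Make every file named in `desired` match its declared content.
    Files that are not mentioned at all are left untouched."""
    for name, content in desired.items():
        path = root / name
        if content is ABSENT:
            path.unlink(missing_ok=True)
        else:
            path.write_text(content)


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # The new release manages two files.
        converge(root, {"app.conf": "v2", "extra.conf": "new"})
        # Rolling back to the old manifest, which never knew about extra.conf.
        converge(root, {"app.conf": "v1"})
        leftover = sorted(p.name for p in root.iterdir())
        # Only an explicit "absent" declaration cleans it up.
        converge(root, {"app.conf": "v1", "extra.conf": ABSENT})
        cleaned = sorted(p.name for p in root.iterdir())
        return leftover, cleaned


leftover, cleaned = demo()
```

After the rollback `app.conf` is back at its old content but `extra.conf` is still on disk, so the box is close to the old state without being identical to it. Whether that matters depends on whether anything reads the stray file.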
This can be architected in such a way that if one rules engine crashes, other systems are not impacted and other rules, cached rules, heuristics, global policies, etc. continue to function and provide shielding.
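A rough sketch of what that isolation could look like, with everything here (`RuleSet`, `refresh`, `should_block`, the `block_on_error` switch) invented for illustration rather than taken from any real system: each rule set keeps its last known good rules if a reload blows up, and a single crashing rule doesn't stop the remaining rules or the other rule sets from being evaluated.

```python
from typing import Callable, List

Rule = Callable[[dict], bool]  # returns True when the request should be blocked


class RuleSet:
    def __init__(self, name: str, load: Callable[[], List[Rule]],
                 block_on_error: bool = False):
        self.name = name
        self._load = load
        self._cached: List[Rule] = []         # last known good rules
        self.block_on_error = block_on_error  # policy choice per rule set

    def refresh(self) -> bool:
        """Try to load new rules; on failure keep serving the cached ones."""
        try:
            self._cached = self._load()
            return True
        except Exception:
            return False

    def blocks(self, request: dict) -> bool:
        for rule in self._cached:
            try:
                if rule(request):
                    return True
            except Exception:
                # One crashing rule must not take down the rest.
                if self.block_on_error:
                    return True  # fail closed for this rule set
                continue         # otherwise skip it and keep evaluating
        return False


def should_block(request: dict, rule_sets: List[RuleSet]) -> bool:
    """Every rule set is consulted independently of the others."""
    return any(rs.blocks(request) for rs in rule_sets)
```

The `block_on_error` switch is where the fail open versus fail closed argument from earlier in the thread lands: for security-critical rule sets you'd probably want a crash to count as a block, while for others skipping the broken rule keeps traffic flowing.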
You can't ask Cloudflare to turn on a dime and implement this in this manner. Their infra is probably very sensibly architected by great engineers. But there are always holes, especially when moving fast, migrating systems, etc. And there's probably room for more resiliency.
But who knows what issues reverting other teams' stuff might bring?