
On the other hand, there is value in being lazy in the right places.

Like, in our first configuration management setup, we were strict and anal about testing everything. Adding a new property to a config file? That needs a unit test for the default value and a unit test for overriding the value. This resulted in so many dang tests, so it was great, wasn't it? Eh. You kinda change one default value, 32 tests across multiple cookbooks fail, and it's a big hassle. So you stop changing and improving things.

In our new configuration management, we're much more coarse about testing. Set up a Consul server, set up a Patroni cluster, and see if the Patroni cluster elects a leader and ships archives to pgBackRest. If it does, the important 90% of the system is probably set up right. Maybe also check that a few important metrics are shipped via Telegraf and that Filebeat looks at the right logs; that gets you to 95-99%. The other 90% that goes wrong after that won't be caught by tests anyway, because those failures tend to come from business needs in prod.
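To make the "does it elect a leader" check concrete, here's a rough sketch of what such a coarse test could look like. It assumes you poll one node's Patroni REST API at `GET /cluster`, which returns a `members` list where each member has a `role`. The function names, the polling parameters and the example host are made up for illustration; this isn't our actual test suite.

```python
import json
import time
import urllib.request


def fetch_cluster(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the cluster topology from a Patroni node's REST API (GET /cluster)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/cluster", timeout=timeout) as resp:
        return json.load(resp)


def has_single_leader(cluster: dict) -> bool:
    """Return True if exactly one member reports the leader role."""
    # Older Patroni releases report the primary as "master" instead of "leader".
    leaders = [
        m for m in cluster.get("members", [])
        if m.get("role") in ("leader", "master")
    ]
    return len(leaders) == 1


def wait_for_leader(base_url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Hypothetical helper: poll until the cluster has exactly one leader, or give up."""
    for _ in range(attempts):
        try:
            if has_single_leader(fetch_cluster(base_url)):
                return True
        except (OSError, ValueError):
            # Node not reachable yet (URLError is an OSError) or reply not valid JSON.
            pass
        time.sleep(delay)
    return False
```

In a test run you'd call something like `assert wait_for_leader("http://10.0.0.11:8008")` (made-up host; 8008 is Patroni's usual default REST port). The point is that this one assertion implicitly exercises Consul, the Patroni config, and Postgres startup, instead of checking each templated value on its own.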

Once that works, why bother testing that Ansible can write a variable into a file if the template is changed? Maybe if there's some computation in there, but more often than not, those computations are hard to set up correctly in a test environment, so eh.

This is similar to how someone gave me shit years ago about not writing tests for the initial configuration loading and dynamic module loading of a custom application server. The thing is, if I break that in a way I didn't catch, it will break QA and every single local test as soon as I push it, and many people will yell at me.