I have a hierarchy of templates, where I can automatically swap out parts of the prompt based on which LLM I am using. And also have a set of benchmarking tests to compare relative performance. I treat LLMs like a commodity and keep switching between them to compare performance.