This is the biggest problem we've had swapping LLMs. Langchain makes the swap itself easy, and we don't care as much about quality during integration testing, etc. The bigger problem is instruction following: OpenAI does well at outputting JSON when asked for it, and by now our software has come to expect JSON output in those cases. Swap in, say, llama2 and you don't get JSON even when you ask for it. That makes swapping not just a quality decision but an integration challenge.
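One way to make the integration less brittle is to validate the output and re-prompt instead of trusting the model's formatting. A minimal sketch, assuming a hypothetical `call_llm(prompt)` wrapper around whichever model is currently plugged in:

```python
import json

# Hypothetical stand-in for whatever client you use (OpenAI, llama2 via
# Langchain, etc.) -- this function is an assumption, not a real API.
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def get_json(prompt: str, max_retries: int = 3) -> dict:
    """Ask for JSON, and re-prompt with the bad output if parsing fails."""
    text = call_llm(prompt)
    for _ in range(max_retries):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            # Feed the malformed output back and ask the model to fix it.
            text = call_llm(
                "The following was supposed to be valid JSON but is not.\n"
                f"Return ONLY the corrected JSON:\n{text}"
            )
    raise ValueError(f"No valid JSON after {max_retries} retries: {text!r}")
```

The same idea is behind Langchain's output-fixing parsers; a hand-rolled loop like this just keeps the retry behavior identical no matter which model is behind `call_llm`.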
The primary issue I've run into is exhausting the context window much sooner than I'd like. Fine-tuning tends to mostly fix this, though.
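A cheap guard in the meantime is to count tokens before sending and trim (or summarize) when you're near the limit. A rough sketch using tiktoken; the 8192 limit and the cl100k_base encoding are assumptions that depend on which model you're calling:

```python
import tiktoken

# Assumed values -- adjust for the model you're actually calling.
# Note: tiktoken matches OpenAI tokenizers; llama2 tokenizes differently,
# so treat the count as approximate for non-OpenAI models.
CONTEXT_LIMIT = 8192
ENCODING = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str, reserved_for_reply: int = 512) -> bool:
    """Check whether a prompt still leaves room for the model's reply."""
    return len(ENCODING.encode(prompt)) + reserved_for_reply <= CONTEXT_LIMIT
```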