To my surprise, for about 1,000 input tokens (and the resulting 1,000 output tokens), GPT-3.5 Turbo was 100x cheaper than Llama 2.
FYI, Llama 7B wasn't up to the task; it produced very poor translations.
I believe that OpenAI priced GPT-3.5 aggressively cheap in order to make it a no-brainer to rely on them rather than on other vendors (or even open source models).
I'm curious whether others have gotten different results.
From what I've read and personally experimented with, none of the Llama 2 models are well-suited to translation in particular (they were mainly trained on English data). Still, there are a number of tasks that they're really good at if fine-tuned correctly, such as classification and data extraction.
> I believe that OpenAI priced GPT-3.5 aggressively cheap in order to make it a no-brainer to rely on them rather than relying on other vendors (even open source models).
I think you're definitely right about that, and in most cases just using GPT-3.5 for one-off tasks makes the most sense. It's when you get into production workflows that scale that small fine-tuned models start making more sense. You can drop the system prompt, get data back in the format you expect, and train on GPT-4's output to sometimes get better accuracy than 3.5 would give you right off the bat. And keep in mind, while you can do the same thing with a fine-tuned 3.5 model, it's going to cost 8x the base 3.5 price per token.
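To put rough numbers on that 8x tradeoff, here's a back-of-the-envelope sketch in Python. It's only an illustration under some loud assumptions: costs are in relative "base-price token" units rather than real dollar prices, input and output tokens are billed at the same rate, and the function names and token counts are made up for the example. The only figure taken from the discussion is the 8x multiplier.

```python
# Relative cost comparison: base model with a system prompt vs. a
# fine-tuned model without one. All names and token counts here are
# hypothetical; only the 8x multiplier comes from the comment above.

FINE_TUNED_MULTIPLIER = 8  # fine-tuned 3.5 priced at 8x base per token


def relative_cost(prompt_tokens: int, task_tokens: int, multiplier: int = 1) -> int:
    """Cost of one request in 'base-price token' units.

    Simplifying assumption: input and output tokens cost the same,
    so every token is billed at `multiplier` times the base price.
    """
    return (prompt_tokens + task_tokens) * multiplier


def break_even_prompt_tokens(task_tokens: int,
                             multiplier: int = FINE_TUNED_MULTIPLIER) -> int:
    """System prompt length at which dropping the prompt pays for the multiplier.

    Solves (prompt + task) * 1 == task * multiplier for prompt.
    """
    return (multiplier - 1) * task_tokens


# Hypothetical request: 400 tokens of actual input + output,
# plus a 1,500-token system prompt when using the base model.
task = 400
base = relative_cost(prompt_tokens=1500, task_tokens=task)
tuned = relative_cost(prompt_tokens=0, task_tokens=task,
                      multiplier=FINE_TUNED_MULTIPLIER)

print(base, tuned)                     # 1900 3200
print(break_even_prompt_tokens(task))  # 2800
```

Under these assumptions, dropping the system prompt alone doesn't make a fine-tuned 3.5 cheaper unless the prompt is about 7x longer than the rest of the request, which is part of why a small self-hosted fine-tuned model can look attractive at scale.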