>>tosh+(OP)
One thing most people don't realize is that (full-parameter) fine-tuned models are costly unless you run them in batched mode. That means unless the request rate is very high and consistent, it is better to use prompts with GPT-3.5. E.g., at a batch size of 1, Mistral is more expensive than GPT-4 [1].
>>Comput+oD
It does. LLMs are most efficient when running large batches, so the GPU cost per token is very high if you're underutilizing the hardware. It will cost more than going through a cloud provider like OpenAI, which has the volume to keep its GPUs saturated.
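
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the $2/hour GPU price, the 50 tokens/sec per request, and above all the simplification that aggregate throughput scales linearly with batch size up to a cap (`max_batch`). Real hardware won't scale that cleanly, but the shape of the curve is the point.

```python
def cost_per_million_tokens(gpu_dollars_per_hour,
                            tokens_per_sec_per_request,
                            concurrent_requests,
                            max_batch=64):
    """Estimate self-hosted cost per 1M generated tokens.

    Assumption: each request in a batch generates tokens at the same
    rate, so total throughput grows linearly with batch size until
    max_batch. You pay for the GPU by the hour whether it is busy or not.
    """
    effective_batch = min(concurrent_requests, max_batch)
    tokens_per_hour = tokens_per_sec_per_request * effective_batch * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


# Hypothetical numbers: $2/hour GPU, 50 tokens/sec per request.
for batch in (1, 8, 32):
    cost = cost_per_million_tokens(2.0, 50, batch)
    print(f"batch={batch:>2}: ${cost:.2f} per 1M tokens")
```

Under these assumptions the per-token cost at batch 1 is 32x the cost at batch 32, which is why a low or bursty request rate makes a dedicated GPU hard to justify against a provider that keeps its batches full.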