zlacker

1. YetAno+(OP) 2023-12-20 23:40:14
One thing that most people don't realize is that (full-parameter) finetuned models are costly to serve unless you run them in batched mode. That means that unless your request rate is very high and consistent, it is better to use prompts with GPT-3.5; e.g. at a batch size of 1, Mistral is more expensive than GPT-4 [1] (rough numbers sketched below).

[1]: https://docs.mystic.ai/docs/mistral-ai-7b-vllm-fast-inferenc...
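
A back-of-envelope sketch of why batch size dominates serving cost. The GPU price and throughput numbers below are illustrative assumptions, not benchmarks:

    # Rough cost per 1M generated tokens for a self-hosted 7B model.
    # Assumptions (not measurements): ~$2/hr for one GPU, ~60 tokens/s
    # per sequence, throughput scaling ~linearly until the GPU saturates.
    GPU_COST_PER_HOUR = 2.00
    TOKENS_PER_SEC_PER_SEQ = 60
    MAX_BATCH = 32

    def cost_per_million_tokens(batch_size: int) -> float:
        throughput = TOKENS_PER_SEC_PER_SEQ * min(batch_size, MAX_BATCH)
        seconds = 1_000_000 / throughput
        return GPU_COST_PER_HOUR * seconds / 3600

    for b in (1, 8, 32):
        print(f"batch={b:2d}: ${cost_per_million_tokens(b):.2f} per 1M tokens")

    # batch= 1: ~$9.26 per 1M tokens -- far above GPT-3.5 pricing
    # batch=32: ~$0.29 per 1M tokens -- competitive once the GPU is saturated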

replies(2): >>Comput+S1 >>MacsHe+zN
2. Comput+S1 2023-12-20 23:53:10
>>YetAno+(OP)
But this doesn’t apply to self-hosted, no?
replies(1): >>lyjack+Dg
3. lyjack+Dg 2023-12-21 01:54:39
>>Comput+S1
It does. LLMs are most efficient when running large batches, so the GPU cost is super high if you’re underutilizing it. It will cost more than a cloud provider like OpenAI, who has the volume to keep their GPUs saturated.
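
For the self-hosted case, a minimal sketch using vLLM (the engine the OP's link is about); the model name and sampling settings here are assumptions, but the point is that generate() batches the prompts, so the fixed GPU cost is amortized across all of them:

    # Requires a GPU box with vLLM installed: pip install vllm
    from vllm import LLM, SamplingParams

    # Loading the model is the big fixed cost; it only pays off if
    # you keep the GPU fed with work.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
    params = SamplingParams(temperature=0.7, max_tokens=128)

    # One idle request vs. a saturated batch: same GPU-hour price,
    # wildly different cost per token.
    prompts = [f"Summarize document {i} in one sentence." for i in range(32)]
    for out in llm.generate(prompts, params):
        print(out.outputs[0].text)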
replies(1): >>jay-ba+bv
4. jay-ba+bv 2023-12-21 04:35:54
>>lyjack+Dg
Yup. It’s also important to mention that OpenAI enjoys the luxury of having large clusters of H100s (the last time I checked).
5. MacsHe+zN 2023-12-21 08:26:24
>>YetAno+(OP)
I can cloud-host Mistral 7B for 20x less than GPT-4-Turbo.

And the Mistral 7B API on OpenRouter is $0.00/1M tokens, i.e. free: https://openrouter.ai/models/mistralai/mistral-7b-instruct
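
If you want to try it, a minimal sketch assuming OpenRouter's OpenAI-compatible chat endpoint (the API key value is a placeholder):

    # pip install openai; this works because OpenRouter exposes an
    # OpenAI-compatible API at /api/v1.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",  # placeholder, use your own key
    )
    resp = client.chat.completions.create(
        model="mistralai/mistral-7b-instruct",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)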
