zlacker

[return to "Fine-tune your own Llama 2 to replace GPT-3.5/4"]
1. ronyfa+wk[view] [source] 2023-09-12 18:29:55
>>kcorbi+(OP)
For translation jobs, I've experimented with Llama 2 70B (running on Replicate) vs. GPT-3.5.

For about 1000 input tokens (and roughly 1000 resulting output tokens), GPT-3.5 Turbo was, to my surprise, about 100x cheaper than Llama 2.
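
Rough back-of-the-envelope, in case it helps (Python; the GPT-3.5 Turbo prices are the September 2023 list prices, while the Replicate GPU rate and Llama 2 70B throughput are assumptions from memory, so plug in your own numbers):

    # ~1000 tokens in + ~1000 tokens out on each service.
    gpt35_cost = (1000 / 1000) * 0.0015 + (1000 / 1000) * 0.002  # $/job

    gpu_rate = 0.0032       # assumed $/sec for an A100 80GB on Replicate
    tokens_per_sec = 10     # assumed Llama 2 70B generation speed
    llama_cost = (1000 / tokens_per_sec) * gpu_rate              # $/job

    print(f"GPT-3.5: ${gpt35_cost:.4f}, Llama 2 70B: ${llama_cost:.2f} "
          f"(~{llama_cost / gpt35_cost:.0f}x more)")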

Llama 7B wasn't up to the task, FYI; it produced very poor translations.

I believe OpenAI priced GPT-3.5 aggressively cheap in order to make it a no-brainer to rely on them rather than on other vendors (or even open-source models).

I'm curious to hear whether others have gotten different results.

◧◩
2. mrybcz+hJ[view] [source] 2023-09-12 19:45:57
>>ronyfa+wk
Yes, OpenAI is dumping the market with GPT-3.5. Vulture capital behaviour at its finest, and I'm sure government regulators will definitely catch on to this in 20 or 30 years...

It's cheaper than the ELECTRICITY cost of running Llama 2 70B on your own M1 Max (a very energy-efficient chip), even assuming free hardware.
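
If you want to redo that arithmetic yourself, it's a few lines (the wattage, tokens/sec, and electricity price below are all assumptions; substitute your own measurements):

    # Electricity cost of generating 1000 tokens locally, vs GPT-3.5's
    # September 2023 list price of $0.002 per 1000 output tokens.
    watts = 60              # assumed M1 Max package power under load
    tokens_per_sec = 5      # assumed quantized Llama 2 70B speed
    usd_per_kwh = 0.30      # assumed electricity price

    seconds = 1000 / tokens_per_sec
    kwh = watts * seconds / 3_600_000  # 1 kWh = 3.6e6 watt-seconds
    print(f"electricity: ${kwh * usd_per_kwh:.4f} per 1000 tokens")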

I guess they also get a pretty good cache hit rate; at scale, there are only so many distinct questions people ask. But still, it's dumping.

◧◩◪
3. sacred+EP[view] [source] 2023-09-12 20:05:26
>>mrybcz+hJ
Based on my research, GPT-3.5 is likely significantly smaller than 70B parameters, so it would make sense that it's cheaper to run. My guess is that OpenAI heavily overtrained GPT-3.5 to get the smallest model possible, optimizing for inference cost. Nvidia chips are also far more efficient at inference than an M1 Max, and OpenAI can batch API calls, which leads to much better hardware utilization.

I don't have definitive proof that they're not dumping, but economies of scale and optimization seem like better explanations to me.
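
On the batching point, here's a toy illustration of why one batched forward pass beats the same requests served one at a time (CPU PyTorch, with a single Linear layer standing in for a transformer block; a sketch, obviously not OpenAI's actual serving stack):

    # The layer's weights are read from memory once per forward pass,
    # so batching 32 requests amortizes that cost across all of them.
    import time
    import torch

    layer = torch.nn.Linear(4096, 4096)
    one = torch.randn(1, 4096)
    batch = torch.randn(32, 4096)

    with torch.no_grad():
        t0 = time.perf_counter()
        for _ in range(32):
            layer(one)        # 32 separate "requests"
        sequential = time.perf_counter() - t0

        t0 = time.perf_counter()
        layer(batch)          # the same 32 requests, batched
        batched = time.perf_counter() - t0

    print(f"sequential: {sequential*1000:.1f} ms, "
          f"batched: {batched*1000:.1f} ms")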
[go to top]