zlacker

"Fine-tune your own Llama 2 to replace GPT-3.5/4"
1. ronyfa+wk 2023-09-12 18:29:55
>>kcorbi+(OP)
For translation jobs, I've experimented with Llama 2 70B (running on Replicate) vs. GPT-3.5.

For a job of about 1,000 input tokens (and roughly 1,000 output tokens), to my surprise, GPT-3.5 Turbo was about 100x cheaper than Llama 2.

FYI, Llama 2 7B wasn't up to the task, producing very poor translations.
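
In case anyone wants to reproduce this, the calls would look roughly like the sketch below. This is a minimal sketch, assuming the 0.x-era openai client and the replicate Python client; the model slug, prompt, and parameter names are assumptions to check against each provider's docs.

    import os
    import openai      # pip install openai (0.x-era API)
    import replicate   # pip install replicate; needs REPLICATE_API_TOKEN set

    openai.api_key = os.environ["OPENAI_API_KEY"]
    prompt = "Translate the following text to French:\n\n..."  # ~1000 tokens of source text

    # GPT-3.5 Turbo is billed per token, so ~1000 in + ~1000 out
    # costs a fraction of a cent.
    gpt = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(gpt.choices[0].message.content)

    # Llama 2 70B on Replicate was billed per second of GPU time at the time,
    # so a slow 70B generation can cost orders of magnitude more per job.
    output = replicate.run(
        "meta/llama-2-70b-chat",  # slug is an assumption; check the catalog
        input={"prompt": prompt, "max_new_tokens": 1000},
    )
    print("".join(output))  # the client streams the output as chunks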

I believe OpenAI priced GPT-3.5 aggressively cheap to make it a no-brainer to rely on them rather than on other vendors (or even open-source models).

I'm curious: have others gotten different results?

2. Muffin+vs 2023-09-12 18:50:41
>>ronyfa+wk
I thought Llama was open source/free and you could run it yourself?
3. thewat+ZG 2023-09-12 19:38:26
>>Muffin+vs
You (currently) need a GPU to run any of the useful models. I haven't really seen a business use case that runs the model on the user's own computer, and given the hardware requirements that wouldn't be feasible to expect anyway.

So you'll have to figure out how to run and scale the model inference yourself. Cloud GPU instances are generally very expensive, and once you start needing to scale horizontally it'll get messy fast.

At least at the moment it's expensive at both extremes: with very light usage you only need a few seconds of compute occasionally but still pay for an idle GPU, and with very intensive usage you need lots of compute all the time, which means scaling.

The "lucky" ones in this scenario are small-medium businesses that can use one or a few cards on-site for their traffic. Even then when you take the cost of an A100 + maintaining it, etc. OpenAI's offering still looks attractive.

I know there are a few services that try to provide an API similar to OpenAI's, and some software to orchestrate it yourself; I'm curious how those compare...
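
For the OpenAI-compatible route, the usual trick is to keep the openai client and just point it at a different base URL. A minimal sketch, assuming a self-hosted vLLM server on localhost and the 0.x-era openai library; the host, port, and model name are placeholders:

    import openai

    # Assumes an OpenAI-compatible server is already running, e.g. vLLM's:
    #   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-13b-chat-hf
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "unused"  # local servers typically ignore the key

    resp = openai.Completion.create(
        model="meta-llama/Llama-2-13b-chat-hf",
        prompt="Translate to French: The weather is nice today.",
        max_tokens=64,
    )
    print(resp.choices[0].text)

Switching back to the real OpenAI endpoint is then just a config change, which makes it easy to compare cost and quality side by side.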
