zlacker

[parent] [thread] 16 comments
1. mrybcz+(OP)[view] [source] 2023-09-12 19:45:57
Yes, OpenAI is dumping the market with ChatGPT 3.5. Vulture-capital behaviour at its finest, and I'm sure government regulations will definitely catch on to this in 20 or 30 years...

It's cheaper than the ELECTRICITY cost of running a LLaMA 70B on your own M1 Max (a very energy-efficient chip), even assuming free hardware.

I guess they are also getting a pretty good cache hit rate - there are only so many questions people ask at scale. But still, it's dumping.

replies(4): >>PUSH_A+L3 >>read_i+T4 >>sacred+n6 >>haxton+a7
2. PUSH_A+L3[view] [source] 2023-09-12 19:56:41
>>mrybcz+(OP)
You think they are caching? Even though one of the parameters is temperature? That's a can of worms, and it should be reflected in the pricing if true; don't get me started if they are charging per token for cached responses.

I just don't see it.

replies(1): >>why_on+Vw
3. read_i+T4[view] [source] 2023-09-12 20:00:48
>>mrybcz+(OP)
turbo is likely nowhere near 70b.
4. sacred+n6[view] [source] 2023-09-12 20:05:26
>>mrybcz+(OP)
Based on my research, GPT-3.5 is likely significantly smaller than 70B parameters, so it would make sense that it's cheaper to run. My guess is that OpenAI significantly overtrained GPT-3.5 to get as small a model as possible to optimize for inference. Also, Nvidia chips are way more efficient at inference than M1 Max. OpenAI also has the advantage of batching API calls which leads to better hardware utilization. I don't have definitive proof that they're not dumping, but economies of scale and optimization seem like better explanations to me.
replies(2): >>hutzli+E8 >>csjh+jn
5. haxton+a7[view] [source] 2023-09-12 20:07:40
>>mrybcz+(OP)
gpt3.5 turbo is (most likely) Curie, which is (most likely) 6.7B params. So, yeah, it makes perfect sense that a 70B model can't compete with it on cost.
replies(5): >>ronyfa+79 >>csjh+bn >>why_on+Bw >>jiggaw+3F >>JackRu+gi5
6. hutzli+E8[view] [source] [discussion] 2023-09-12 20:12:34
>>sacred+n6
I also do not have proof of anything here, but can't it be both?

They have lots of money now and the market lead. They want to keep the lead and some extra electricity and hardware costs are surely worth it for them, if it keeps the competition from getting traction.

7. ronyfa+79[view] [source] [discussion] 2023-09-12 20:14:06
>>haxton+a7
Even at 6.7B params, it still does a much better job at translation than Llama 2 70B.
replies(1): >>two_in+Gd
8. two_in+Gd[view] [source] [discussion] 2023-09-12 20:32:07
>>ronyfa+79
If it's MoE, that may explain why it's faster and better...
replies(1): >>yumraj+Hn
9. csjh+bn[view] [source] [discussion] 2023-09-12 21:08:25
>>haxton+a7
Is there a source on that? I've never seen anyone think it's below even 70B
10. csjh+jn[view] [source] [discussion] 2023-09-12 21:08:51
>>sacred+n6
What makes you think 3.5 is significantly smaller than 70B?
11. yumraj+Hn[view] [source] [discussion] 2023-09-12 21:10:16
>>two_in+Gd
MoE?
replies(1): >>sartha+Fp
12. sartha+Fp[view] [source] [discussion] 2023-09-12 21:17:40
>>yumraj+Hn
Mixture of Experts Model - https://en.wikipedia.org/wiki/Mixture_of_experts
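To sketch why that lowers inference cost (a toy illustration in plain Python; all names and the linear "experts" are made up, not anything from OpenAI): a gating function scores the experts for each input, and only the top-scoring expert actually runs, so per-token compute scales with one expert rather than with the model's total parameter count.

```python
# Toy top-1 mixture-of-experts layer (illustrative only; hypothetical names).
# Only the argmax expert runs per input, so compute per token is ~1/NUM_EXPERTS
# of a dense layer with the same total parameters.
import random

random.seed(0)

NUM_EXPERTS = 4
DIM = 8

# Each "expert" is just a random DIM x DIM linear map here.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# Gating weights: one score vector per expert.
gate = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moe_forward(x):
    # Score every expert (cheap), but only *run* the best-scoring one.
    scores = [dot(g, x) for g in gate]
    best = max(range(NUM_EXPERTS), key=lambda i: scores[i])
    return best, [dot(row, x) for row in experts[best]]

x = [random.gauss(0, 1) for _ in range(DIM)]
chosen, y = moe_forward(x)
print(f"routed to expert {chosen}, output dim {len(y)}")
```

Real MoE transformers route per token per layer and usually pick the top 2 experts, but the cost argument is the same.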
13. why_on+Bw[view] [source] [discussion] 2023-09-12 21:48:07
>>haxton+a7
gpt3.5 turbo is a new model, not Curie. As others have stated, it probably uses a Mixture of Experts, which lowers inference cost.
14. why_on+Vw[view] [source] [discussion] 2023-09-12 21:49:36
>>PUSH_A+L3
You can keep around the KV cache from previous generations which lowers the cost of prompts significantly.
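Rough sketch of the effect (toy Python; the names are made up, and a real server caches per-layer attention keys/values, not strings): computing K/V for the prompt tokens is the expensive part, and when a new request shares a prefix with an earlier one, those entries can be reused instead of recomputed.

```python
# Toy illustration of KV-cache reuse across requests (hypothetical names).
# K/V at position i depends only on tokens 0..i, so any shared prefix
# between requests can be served from cache.

kv_cache = {}        # prefix-of-tokens -> "computed" K/V entry
compute_calls = 0    # stand-in for expensive attention work

def kv_for(prompt):
    """Return K/V entries for every token of `prompt`, reusing cached work."""
    global compute_calls
    entries = []
    for i in range(len(prompt)):
        key = tuple(prompt[: i + 1])
        if key not in kv_cache:
            compute_calls += 1          # the expensive step we want to avoid
            kv_cache[key] = f"kv({prompt[i]}@{i})"
        entries.append(kv_cache[key])
    return entries

kv_for(["You", "are", "a", "helpful", "assistant", "."])        # cold: 6 computes
kv_for(["You", "are", "a", "helpful", "assistant", ".", "Hi"])  # warm: 1 compute
print(compute_calls)  # 7 computes total instead of 13
```

With many users sharing the same system prompt, the prefix work is paid once rather than per request.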
15. jiggaw+3F[view] [source] [discussion] 2023-09-12 22:32:33
>>haxton+a7
I thought it was fairly well established that GPT-3.5 has something like 130B parameters and that GPT-4 is on the order of 600B-1,000B.
replies(1): >>avion2+uP1
16. avion2+uP1[view] [source] [discussion] 2023-09-13 08:47:05
>>jiggaw+3F
I remember:

- gpt-3.5 175b params

- gpt-4 1800b params

17. JackRu+gi5[view] [source] [discussion] 2023-09-14 10:52:20
>>haxton+a7
These sites say 154B:

https://www.ankursnewsletter.com/p/gpt-4-gpt-3-and-gpt-35-tu...

https://blog.wordbot.io/ai-artificial-intelligence/gpt-3-5-t...

[go to top]