zlacker

[parent] [thread] 9 comments
1. haxton+(OP)[view] [source] 2023-09-12 20:07:40
gpt3.5 turbo is (most likely) Curie which is (most likely) 6.7b params. So, yeah, makes perfect sense that it can't compete with a 70b model on cost.
replies(5): >>ronyfa+X1 >>csjh+1g >>why_on+rp >>jiggaw+Tx >>JackRu+6b5
2. ronyfa+X1[view] [source] 2023-09-12 20:14:06
>>haxton+(OP)
It still does a much better job at translation than llama 2 70b even, at 6.7b params
replies(1): >>two_in+w6
3. two_in+w6[view] [source] [discussion] 2023-09-12 20:32:07
>>ronyfa+X1
If it's MOE that may explain why it's faster and better...
replies(1): >>yumraj+xg
4. csjh+1g[view] [source] 2023-09-12 21:08:25
>>haxton+(OP)
Is there a source on that? I've never seen anyone think it's below even 70B
5. yumraj+xg[view] [source] [discussion] 2023-09-12 21:10:16
>>two_in+w6
MOE?
replies(1): >>sartha+vi
6. sartha+vi[view] [source] [discussion] 2023-09-12 21:17:40
>>yumraj+xg
Mixture of Experts Model - https://en.wikipedia.org/wiki/Mixture_of_experts
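For anyone wondering why MoE would lower inference cost (per the replies above): the router activates only the top-k of n expert blocks per token, so compute scales with k rather than n. A toy numpy sketch of that routing idea, purely illustrative and not OpenAI's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 8 expert weight matrices, but the
# router only evaluates the top-2 per input -- that's the savings.
n_experts, d_model, top_k = 8, 16, 2
router_w = rng.normal(size=(d_model, n_experts))
expert_ws = rng.normal(size=(n_experts, d_model, d_model))

def moe_forward(x):
    logits = x @ router_w                      # router score per expert
    chosen = np.argsort(logits)[-top_k:]       # indices of the k best experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                       # softmax over the chosen experts
    # Only the selected experts run; the other n - k are skipped entirely.
    return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, chosen))

x = rng.normal(size=d_model)
y = moe_forward(x)
```

So total parameters can be large while per-token FLOPs stay close to a much smaller dense model, which would square with "faster and better" above.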
7. why_on+rp[view] [source] 2023-09-12 21:48:07
>>haxton+(OP)
gpt3.5 turbo is a new model, not Curie. As others have stated, it probably uses Mixture of Experts which lowers inference cost.
8. jiggaw+Tx[view] [source] 2023-09-12 22:32:33
>>haxton+(OP)
I thought it was fairly well established that GPT 3.5 has something like 130B parameters and that GPT 4 is on the order of 600-1,000B
replies(1): >>avion2+kI1
9. avion2+kI1[view] [source] [discussion] 2023-09-13 08:47:05
>>jiggaw+Tx
I remember:

- gpt-3.5 175b params

- gpt-4 1800b params

10. JackRu+6b5[view] [source] 2023-09-14 10:52:20
>>haxton+(OP)
These sites say 154B:

https://www.ankursnewsletter.com/p/gpt-4-gpt-3-and-gpt-35-tu...

https://blog.wordbot.io/ai-artificial-intelligence/gpt-3-5-t...
