zlacker

[parent] [thread] 9 comments
1. haxton+(OP)[view] [source] 2023-09-12 20:07:40
gpt3.5 turbo is (most likely) Curie which is (most likely) 6.7b params. So, yeah, makes perfect sense that it can't compete with a 70b model on cost.
replies(5): >>ronyfa+X1 >>csjh+1g >>why_on+rp >>jiggaw+Tx >>JackRu+6b5
2. ronyfa+X1[view] [source] 2023-09-12 20:14:06
>>haxton+(OP)
It still does a much better job at translation than llama 2 70b even, at 6.7b params
replies(1): >>two_in+w6
3. two_in+w6[view] [source] [discussion] 2023-09-12 20:32:07
>>ronyfa+X1
If it's MOE that may explain why it's faster and better...
replies(1): >>yumraj+xg
4. csjh+1g[view] [source] 2023-09-12 21:08:25
>>haxton+(OP)
Is there a source on that? I've never seen anyone think it's below even 70B
5. yumraj+xg[view] [source] [discussion] 2023-09-12 21:10:16
>>two_in+w6
MOE?
replies(1): >>sartha+vi
6. sartha+vi[view] [source] [discussion] 2023-09-12 21:17:40
>>yumraj+xg
Mixture of Experts Model - https://en.wikipedia.org/wiki/Mixture_of_experts
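For anyone wondering why MoE would lower inference cost (per the replies above): the router activates only the top-k of n expert blocks per token, so compute scales with k rather than n. A toy numpy sketch of that routing idea, purely illustrative and not OpenAI's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: 8 expert weight matrices, but the
# router only evaluates the top-2 per input -- that's the savings.
n_experts, d_model, top_k = 8, 16, 2
router_w = rng.normal(size=(d_model, n_experts))
expert_ws = rng.normal(size=(n_experts, d_model, d_model))

def moe_forward(x):
    logits = x @ router_w                      # router score per expert
    chosen = np.argsort(logits)[-top_k:]       # indices of the k best experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                       # softmax over the chosen experts
    # Only the selected experts run; the other n - k are skipped entirely.
    return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, chosen))

x = rng.normal(size=d_model)
y = moe_forward(x)
```

So total parameters can be large while per-token FLOPs stay close to a much smaller dense model, which would square with "faster and better" above.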
7. why_on+rp[view] [source] 2023-09-12 21:48:07
>>haxton+(OP)
gpt3.5 turbo is a new model, not Curie. As others have stated, it probably uses Mixture of Experts which lowers inference cost.
8. jiggaw+Tx[view] [source] 2023-09-12 22:32:33
>>haxton+(OP)
I thought it was fairly well established that GPT 3.5 has something like 130B parameters and that GPT 4 is on the order of 600-1,000B
replies(1): >>avion2+kI1
9. avion2+kI1[view] [source] [discussion] 2023-09-13 08:47:05
>>jiggaw+Tx
I remember:

- gpt-3.5 175b params

- gpt-4 1800b params

10. JackRu+6b5[view] [source] 2023-09-14 10:52:20
>>haxton+(OP)
These sites say 154B:

https://www.ankursnewsletter.com/p/gpt-4-gpt-3-and-gpt-35-tu...

https://blog.wordbot.io/ai-artificial-intelligence/gpt-3-5-t...
