zlacker

[parent] [thread] 3 comments
1. ronyfa+(OP)[view] [source] 2023-09-12 20:14:06
At only 6.7B params, it still does a much better job at translation than even Llama 2 70B.
replies(1): >>two_in+z4
2. two_in+z4[view] [source] 2023-09-12 20:32:07
>>ronyfa+(OP)
If it's MOE, that may explain why it's faster and better...
replies(1): >>yumraj+Ae
3. yumraj+Ae[view] [source] [discussion] 2023-09-12 21:10:16
>>two_in+z4
MOE?
replies(1): >>sartha+yg
4. sartha+yg[view] [source] [discussion] 2023-09-12 21:17:40
>>yumraj+Ae
Mixture of Experts Model - https://en.wikipedia.org/wiki/Mixture_of_experts
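
For anyone who wants more than the link: below is a minimal PyTorch-style sketch of a top-k gated mixture-of-experts layer. The class name TopKMoE, the expert count, and top_k=2 are illustrative assumptions, not details of any model in this thread. The router sends each token to only a couple of experts, so total parameter count can be large while per-token compute stays close to a single dense feed-forward block, which is the "faster and better" point raised above.

  # Minimal sketch of a top-k gated Mixture-of-Experts (MoE) layer.
  # Names (TopKMoE, n_experts, top_k) are illustrative assumptions,
  # not taken from any specific model discussed in the thread.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class TopKMoE(nn.Module):
      def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
          super().__init__()
          self.top_k = top_k
          # One feed-forward "expert" per slot; total params grow with n_experts...
          self.experts = nn.ModuleList(
              nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
              for _ in range(n_experts)
          )
          # The router scores each token against each expert.
          self.router = nn.Linear(d_model, n_experts)

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # x: (tokens, d_model) -- flatten batch/sequence dims upstream.
          scores = self.router(x)                           # (tokens, n_experts)
          weights, indices = scores.topk(self.top_k, dim=-1)
          weights = F.softmax(weights, dim=-1)              # normalize over chosen experts
          out = torch.zeros_like(x)
          # ...but each token only passes through top_k experts, so per-token
          # compute stays close to one dense FFN of a single expert's size.
          for slot in range(self.top_k):
              for e, expert in enumerate(self.experts):
                  mask = indices[:, slot] == e
                  if mask.any():
                      out[mask] += weights[mask, slot, None] * expert(x[mask])
          return out

  if __name__ == "__main__":
      layer = TopKMoE(d_model=64, d_hidden=256)
      tokens = torch.randn(10, 64)
      print(layer(tokens).shape)  # torch.Size([10, 64])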