zlacker

[parent] [thread] 3 comments
1. ronyfa+(OP)[view] [source] 2023-09-12 20:14:06
At only 6.7B params, it still does a much better job at translation than even Llama 2 70B.
replies(1): >>two_in+z4
2. two_in+z4[view] [source] 2023-09-12 20:32:07
>>ronyfa+(OP)
If it's MOE, that may explain why it's faster and better...
replies(1): >>yumraj+Ae
3. yumraj+Ae[view] [source] [discussion] 2023-09-12 21:10:16
>>two_in+z4
MOE?
replies(1): >>sartha+yg
4. sartha+yg[view] [source] [discussion] 2023-09-12 21:17:40
>>yumraj+Ae
Mixture of Experts Model - https://en.wikipedia.org/wiki/Mixture_of_experts
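
For anyone who wants more than the link: below is a minimal PyTorch-style sketch of a top-k gated mixture-of-experts layer. The class name TopKMoE, the expert count, and top_k=2 are illustrative assumptions, not details of any model in this thread. The router sends each token to only a couple of experts, so total parameter count can be large while per-token compute stays close to a single dense feed-forward block, which is the "faster and better" point raised above.

  # Minimal sketch of a top-k gated Mixture-of-Experts (MoE) layer.
  # Names (TopKMoE, n_experts, top_k) are illustrative assumptions,
  # not taken from any specific model discussed in the thread.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class TopKMoE(nn.Module):
      def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
          super().__init__()
          self.top_k = top_k
          # One feed-forward "expert" per slot; total params grow with n_experts...
          self.experts = nn.ModuleList(
              nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
              for _ in range(n_experts)
          )
          # The router scores each token against each expert.
          self.router = nn.Linear(d_model, n_experts)

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # x: (tokens, d_model) -- flatten batch/sequence dims upstream.
          scores = self.router(x)                           # (tokens, n_experts)
          weights, indices = scores.topk(self.top_k, dim=-1)
          weights = F.softmax(weights, dim=-1)              # normalize over chosen experts
          out = torch.zeros_like(x)
          # ...but each token only passes through top_k experts, so per-token
          # compute stays close to one dense FFN of a single expert's size.
          for slot in range(self.top_k):
              for e, expert in enumerate(self.experts):
                  mask = indices[:, slot] == e
                  if mask.any():
                      out[mask] += weights[mask, slot, None] * expert(x[mask])
          return out

  if __name__ == "__main__":
      layer = TopKMoE(d_model=64, d_hidden=256)
      tokens = torch.randn(10, 64)
      print(layer(tokens).shape)  # torch.Size([10, 64])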