zlacker

Every frontier model now is a distill of a larger unpublished model. This could be a slightly smaller distill, with potentially the extra tuning you're mentioning.

replies(2): >>cubefo+b5 >>tangju+Tg

>>jjani+(OP)
That's an unsubstantiated claim. I doubt this is true, since people are disproportionately more willing to pay for the best of the best, rather than for something worse.

replies(1): >>vessen+ot2

>>jjani+(OP)
Any info on this?

>>cubefo+b5
“Every” is unsubstantiated but probably accurate. Meta has published theirs (behemoth) and it’s clear this is largely how frontier models are being used and trained right now: too slow and expensive for daily driving inference, distillable at various levels for different tradeoffs.

replies(1): >>cubefo+tK4

>>vessen+ot2
DeepSeek-V3 is not a distilled model, which already disproves the "every" claim. And if you happen to have a model which is better than any other available model, it makes no sense to not use it just because it is allegedly "too slow and expensive". Inference speed is highly unimportant compared to absolute model performance. If inference speed was so important, everyone would use small models. But most people use huge models, the best of the best, like GPT-4o, o3, Claude Sonnet 3.7, Gemini 2.5 Pro. People don't prefer Gemini 2.5 Flash to Gemini 2.5 Pro. And people don't pay for ChatGPT Plus to get more access to faster models, they pay to get access to better, slower models. People want quality from their LLM, not quantity.