zlacker

[parent] [thread] 8 comments
1. jjani+(OP)[view] [source] 2025-05-06 16:31:26
Sounds like they were losing so much money on 2.5 Pro that they came up with a forced update to make it cheaper to run. They can't come out and say "we've made it worse across the board", nor do they want to be the first to actually raise prices, so instead they shipped something of a distill that's slightly better at coding, which lets them still spin it positively.
replies(2): >>sauwan+83 >>Workac+e9
2. sauwan+83[view] [source] 2025-05-06 16:51:22
>>jjani+(OP)
I'd be surprised if this was a new base model. It sounds like they just did some post-training RL tuning to make this version specifically stronger for coding, at the expense of other priorities.
replies(1): >>jjani+R6
3. jjani+R6[view] [source] [discussion] 2025-05-06 17:10:28
>>sauwan+83
Every frontier model now is a distill of a larger unpublished model. This could be a slightly smaller distill, with potentially the extra tuning you're mentioning.
replies(2): >>cubefo+2c >>tangju+Kn
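[For readers unfamiliar with the term: distillation trains a smaller "student" model to imitate the output distribution of a larger "teacher". A minimal sketch of the standard soft-target objective follows; the function names, example logits, and temperature value are illustrative, not taken from any particular lab's pipeline.]

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions:
    # the classic soft-target distillation objective. The student is
    # trained to minimize this, matching the teacher's "dark knowledge".
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Identical logits give zero loss; diverging logits give a positive loss.
teacher = np.array([2.0, 1.0, 0.1])
assert distillation_loss(teacher, teacher) < 1e-9
assert distillation_loss(np.array([0.1, 1.0, 2.0]), teacher) > 0.0
```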
4. Workac+e9[view] [source] 2025-05-06 17:26:10
>>jjani+(OP)
Google doesn't pay the Nvidia tax. Their TPUs are designed for Gemini, and Gemini is designed for their TPUs. Google is no doubt paying far less per token than any other AI house.
5. cubefo+2c[view] [source] [discussion] 2025-05-06 17:42:25
>>jjani+R6
That's an unsubstantiated claim. I doubt it's true, since people are disproportionately willing to pay for the best of the best rather than for something worse.
replies(1): >>vessen+fA2
6. tangju+Kn[view] [source] [discussion] 2025-05-06 18:54:35
>>jjani+R6
Any info on this?
7. vessen+fA2[view] [source] [discussion] 2025-05-07 15:23:16
>>cubefo+2c
“Every” is unsubstantiated but probably accurate. Meta has published theirs (Behemoth), and it’s clear this is largely how frontier models are being used and trained right now: too slow and expensive to serve for everyday inference, but distillable at various levels for different tradeoffs.
replies(1): >>cubefo+kR4
8. cubefo+kR4[view] [source] [discussion] 2025-05-08 12:56:43
>>vessen+fA2
DeepSeek-V3 is not a distilled model, which already disproves the "every" claim. And if you happen to have a model that is better than any other available model, it makes no sense not to use it just because it is allegedly "too slow and expensive". Inference speed matters far less than absolute model performance. If inference speed were so important, everyone would use small models. But most people use huge models, the best of the best, like GPT-4o, o3, Claude 3.7 Sonnet, and Gemini 2.5 Pro. People don't prefer Gemini 2.5 Flash to Gemini 2.5 Pro. And people don't pay for ChatGPT Plus to get more access to faster models; they pay to get access to better, slower models. People want quality from their LLM, not quantity.