zlacker

5 comments
1. arnaud+(OP) 2025-05-06 16:03:14
This should be the top comment. Cherry-picking is hurting this industry.

I bet they kept training on coding tasks, made everything else worse along the way, and tried to sweep it under the rug because of the sunk costs.

replies(2): >>luckyd+x2 >>cma+iL
2. luckyd+x2 2025-05-06 16:16:44
>>arnaud+(OP)
Or because they realized that coding is what most of those LLMs are used for anyways?
replies(1): >>arnaud+I4
3. arnaud+I4 2025-05-06 16:29:32
>>luckyd+x2
They should have shown the benchmarks, or marketed it as a coding model, like Qwen & Mistral.
replies(1): >>jjani+25
4. jjani+25 2025-05-06 16:32:14
>>arnaud+I4
That's clearly not a PR angle they could possibly take when it's replacing the overall SotA model. This is a business decision, potentially inference-cost related.
replies(1): >>arnaud+kg
5. arnaud+kg 2025-05-06 17:39:20
>>jjani+25
From a business POV it's a great move, but for customers it's evil to hide evidence that your product got worse.
6. cma+iL 2025-05-06 21:02:14
>>arnaud+(OP)
They likely knew continued training on code would cause some amount of catastrophic forgetting on other tasks. They didn't throw away the old weights, so it's probably not the sunk cost fallacy at work. But the model is relatively new, and they found that X% of API token spend was going to coding agents (where X is huge), compared to the token-spend distribution on prior Geminis that couldn't code well. They probably didn't want the complexity and worse batching of serving a separate coding model if the impacts weren't too large, decided they hadn't weighted coding heavily enough initially, and concluded the tradeoffs were worth it.
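
A common mitigation for the forgetting tradeoff described above is replay: mixing a small fraction of general-domain data back into the code-heavy fine-tune. A minimal Python sketch of that sampling scheme, where the 10% replay ratio and the toy datasets are illustrative assumptions rather than anything disclosed about Gemini:

    import random

    def mixed_batches(code_examples, general_examples,
                      replay_ratio=0.1, batch_size=8, num_batches=100):
        """Yield fine-tuning batches that are mostly code but keep a small
        'replay' fraction of general-domain data, a standard way to reduce
        catastrophic forgetting. replay_ratio=0.1 is an assumed value."""
        for _ in range(num_batches):
            yield [
                random.choice(general_examples)
                if random.random() < replay_ratio
                else random.choice(code_examples)
                for _ in range(batch_size)
            ]

    # Toy usage with two tiny illustrative datasets:
    code = ["def add(a, b): return a + b", "fn main() {}"]
    general = ["Summarize this article.", "Translate 'hello' into French."]
    for batch in mixed_batches(code, general, batch_size=4, num_batches=2):
        print(batch)

How much replay is enough is an empirical question, which is exactly the kind of tradeoff the comment suggests they evaluated before deciding against a separate coding model.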