Gemini 2.5 Pro Preview

>>meetpa+(OP)
Interestingly, when comparing benchmarks of Experimental 03-25 [1] and Experimental 05-06 [2], it seems the new version scores slightly lower on everything except LiveCodeBench.

[1] https://storage.googleapis.com/model-cards/documents/gemini-...
[2] https://deepmind.google/technologies/gemini/

>>andy12+F8
This should be the top comment. Cherry-picking is hurting this industry.

I bet they kept training on coding tasks, made everything else worse along the way, and tried to sweep it under the rug because of the sunk costs.

>>arnaud+ua
Or because they realized that coding is what most of those LLMs are used for anyways?

>>luckyd+1d
They should have shown the benchmarks. Or marketed it as a coding model, like Qwen & Mistral.

>>arnaud+cf
That's clearly not a PR angle they could possibly take when it's replacing the overall SotA model. This is a business decision, potentially related to inference cost.
