zlacker

[return to "Gemini 2.5 Pro Preview"]
1. andy12+F8[view] [source] 2025-05-06 15:55:14
>>meetpa+(OP)
Interestingly, when comparing benchmarks of Experimental 03-25 [1] and Experimental 05-06 [2], it seems the new version scores slightly lower on everything except LiveCodeBench.

[1] https://storage.googleapis.com/model-cards/documents/gemini-...
[2] https://deepmind.google/technologies/gemini/

◧◩
2. arnaud+ua[view] [source] 2025-05-06 16:03:14
>>andy12+F8
This should be the top comment. Cherry-picking is hurting this industry.

I bet they kept training on coding tasks, made everything else worse along the way, and tried to sweep it under the rug because of the sunk costs.

◧◩◪
3. luckyd+1d[view] [source] 2025-05-06 16:16:44
>>arnaud+ua
Or because they realized that coding is what most of those LLMs are used for anyways?
◧◩◪◨
4. arnaud+cf[view] [source] 2025-05-06 16:29:32
>>luckyd+1d
They should have shown the benchmarks. Or marketed it as a coding model, like Qwen & Mistral.
◧◩◪◨⬒
5. jjani+wf[view] [source] 2025-05-06 16:32:14
>>arnaud+cf
That's clearly not a PR angle they could possibly take when it's replacing the overall SotA model. This is a business decision, potentially related to inference cost.