zlacker

[return to "Gemini 2.5 Pro Preview"]
1. andy12+F8[view] [source] 2025-05-06 15:55:14
>>meetpa+(OP)
Interestingly, when comparing the benchmarks of Experimental 03-25 [1] and Experimental 05-06 [2], it seems the new version scores slightly lower on everything except LiveCodeBench.

[1] https://storage.googleapis.com/model-cards/documents/gemini-...
[2] https://deepmind.google/technologies/gemini/

2. arnaud+ua[view] [source] 2025-05-06 16:03:14
>>andy12+F8
This should be the top comment. Cherry-picking is hurting this industry.

I bet they kept training on coding tasks, made everything else worse along the way, and tried to sweep it under the rug because of the sunk costs.
