zlacker

Gemini 2.5 Pro Preview
1. mohsen+Z9 2025-05-06 16:00:48
>>meetpa+(OP)
I use Gemini for almost everything. But their model card[1] only compares it to o3-mini! On the known benchmarks, o3 is still ahead:

        +------------------------------+---------+----------------+
        |         Benchmark            |   o3    | Gemini 2.5 Pro |
        +------------------------------+---------+----------------+
        | ARC-AGI (High Compute)       |  87.5%  |       —        |
        | GPQA Diamond (Science)       |  87.7%  |     84.0%      |
        | AIME 2024 (Math)             |  96.7%  |     92.0%      |
        | SWE-bench Verified (Coding)  |  71.7%  |     63.8%      |
        | Codeforces Elo Rating        |  2727   |       —        |
        | MMMU (Visual Reasoning)      |  82.9%  |     81.7%      |
        | MathVista (Visual Math)      |  86.8%  |       —        |
        | Humanity’s Last Exam         |  26.6%  |     18.8%      |
        +------------------------------+---------+----------------+
[1] https://storage.googleapis.com/model-cards/documents/gemini-...
2. jsnell+Qs 2025-05-06 17:49:34
>>mohsen+Z9
The text in the model card says the results are from March (including the Gemini 2.5 Pro results), when o3 hadn't been released yet.

Could this be the old card rather than the updated one the blog post claims exists? Sure, the timestamp is in late April, but I seem to remember that the first model card for 2.5 Pro was only released in the last couple of weeks.
