zlacker

[return to "Gemini 2.5 Pro Preview"]
1. mohsen+Z9 2025-05-06 16:00:48
>>meetpa+(OP)
I use Gemini for almost everything. But their model card[1] only compares to o3-mini! In known benchmarks o3 is still ahead:

        +------------------------------+---------+--------------+
        |         Benchmark            |   o3    | Gemini 2.5   |
        |                              |         |    Pro       |
        +------------------------------+---------+--------------+
        | ARC-AGI (High Compute)       |  87.5%  |     —        |
        | GPQA Diamond (Science)       |  87.7%  |   84.0%      |
        | AIME 2024 (Math)             |  96.7%  |   92.0%      |
        | SWE-bench Verified (Coding)  |  71.7%  |   63.8%      |
        | Codeforces Elo Rating        |  2727   |     —        |
        | MMMU (Visual Reasoning)      |  82.9%  |   81.7%      |
        | MathVista (Visual Math)      |  86.8%  |     —        |
        | Humanity’s Last Exam         |  26.6%  |   18.8%      |
        +------------------------------+---------+--------------+
[1] https://storage.googleapis.com/model-cards/documents/gemini-...
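To make "still ahead" concrete, here is a rough sketch that computes the point gap on the rows where both models have a score. The numbers are copied from the table above; the dictionary layout and the `gaps` helper are just illustrative names, and rows with no Gemini figure are skipped rather than guessed.

```python
# Scores copied from the table above as (o3, Gemini 2.5 Pro).
# None marks a row where the table lists no Gemini figure.
scores = {
    "ARC-AGI (High Compute)": (87.5, None),
    "GPQA Diamond": (87.7, 84.0),
    "AIME 2024": (96.7, 92.0),
    "SWE-bench Verified": (71.7, 63.8),
    "MMMU": (82.9, 81.7),
    "MathVista": (86.8, None),
    "Humanity's Last Exam": (26.6, 18.8),
}


def gaps(table):
    """Return o3 minus Gemini, in percentage points, for rows with both scores."""
    return {
        name: round(o3 - gemini, 1)
        for name, (o3, gemini) in table.items()
        if gemini is not None
    }


point_gaps = gaps(scores)
mean_gap = sum(point_gaps.values()) / len(point_gaps)

for name, gap in point_gaps.items():
    print(f"{name}: o3 ahead by {gap} points")
print(f"mean gap over {len(point_gaps)} shared benchmarks: {mean_gap:.2f} points")
```

The mean is only a crude summary: the benchmarks measure different things and the Codeforces Elo row is left out because it is not a percentage.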
2. cbg0+nF 2025-05-06 19:08:13
>>mohsen+Z9
o3 is $40/M output tokens and 2.5 Pro is $10-15/M output tokens, so o3 being slightly ahead is not really worth paying roughly 2.7 to 4 times more than Gemini.
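A quick sanity check of the price multiple, using only the per-million output-token prices quoted in this comment. The constant and function names are made up for illustration, and input-token pricing is ignored.

```python
# Prices in USD per million output tokens, as quoted in the comment above.
O3_PRICE = 40.0
GEMINI_PRICE_RANGE = (10.0, 15.0)  # low and high end of the quoted range


def price_ratios(o3_price, gemini_low, gemini_high):
    """Return (smallest, largest) multiple that o3 costs relative to Gemini."""
    return o3_price / gemini_high, o3_price / gemini_low


low_ratio, high_ratio = price_ratios(O3_PRICE, *GEMINI_PRICE_RANGE)
print(f"o3 costs {low_ratio:.2f}x to {high_ratio:.2f}x as much per output token")
```

So "4 times" holds at the $10 end of the range; at $15 the multiple is closer to 2.7.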