zlacker

[return to "Gemini 3 Pro: the frontier of vision AI"]
1. djoldm+1H[view] [source] 2025-12-05 19:18:33
>>xnx+(OP)
Interesting "ScreenSpot Pro" results:

    72.7% Gemini 3 Pro
    11.4% Gemini 2.5 Pro
    49.9% Claude Opus 4.5
    3.5% GPT-5.1
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

https://arxiv.org/abs/2504.07981

2. jasonj+OQ[view] [source] 2025-12-05 20:07:51
>>djoldm+1H
That is... astronomically different. Is GPT-5.1 downscaling and losing critical information or something? How could it be so different?
3. energy+Eq1[view] [source] 2025-12-05 23:28:22
>>jasonj+OQ
This is my default explanation for visual impairments in LLMs: they're trying to compress the image into about 3000 tokens, so you're going to lose a lot in the name of efficiency.
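
To put a rough number on that budget, here is a back-of-the-envelope sketch. It assumes a 3840x2160 screenshot (an illustrative resolution, not one given in the thread) and takes the "about 3000 tokens" figure from the comment at face value. The helper names are hypothetical, and real vision encoders tile and resize images in model-specific ways, so treat this only as an order-of-magnitude estimate.

```python
import math


def pixels_per_token(width: int, height: int, token_budget: int) -> float:
    """Average number of pixels that each token has to summarize."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    return (width * height) / token_budget


def equivalent_patch_side(width: int, height: int, token_budget: int) -> float:
    """Side length, in pixels, of a square patch holding that many pixels."""
    return math.sqrt(pixels_per_token(width, height, token_budget))


if __name__ == "__main__":
    # Assumed values: a 4K screenshot and the ~3000-token budget from the comment.
    width, height, budget = 3840, 2160, 3000
    print(f"{pixels_per_token(width, height, budget):.1f} pixels per token")
    print(f"{equivalent_patch_side(width, height, budget):.1f} px patch side")
```

Under those assumptions each token covers about 2765 pixels, roughly a 53x53 pixel square. That is larger than many toolbar icons in professional software, so small GUI targets could plausibly get blurred away.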