zlacker

[return to "Gemini 3 Pro: the frontier of vision AI"]
1. Workac+cU[view] [source] 2025-12-05 20:26:05
>>xnx+(OP)
Well

It is the first model to get partial-credit on an LLM image test I have. Which is counting the legs of a dog. Specifically, a dog with 5 legs. This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.

In fact GPT5 wrote an edge detection script to see where "golden dog feet" met "bright green grass" to prove to me that there were only 4 legs. The script found 5, and GPT-5 then said it was a bug, and adjusted the script sensitivity so it only located 4, lol.

Anyway, Gemini 3, while still being unable to count the legs first try, did identify "male anatomy" (it's own words) also visible in the picture. The 5th leg was approximately where you could expect a well endowed dog to have a "5th leg".

That aside though, I still wouldn't call it particularly impressive.

As a note, Meta's image slicer correctly highlighted all 5 legs without a hitch. Maybe not quite a transformer, but interesting that it could properly interpret "dog leg" and ID them. Also the dog with many legs (I have a few of them) all had there extra legs added by nano-banana.

◧◩
2. Rover2+x01[view] [source] 2025-12-05 20:56:14
>>Workac+cU
I just tried to get Gemini to produce an image of a dog with 5 legs to test this out, and it really struggled with that. It either made a normal dog, or turned the tail into a weird appendage.

Then I asked both Gemini and Grok to count the legs, both kept saying 4.

Gemini just refused to consider it was actually wrong.

Grok seemed to have an existential crisis when I told it it was wrong, becoming convinced that I had given it an elaborate riddle. After thinking for an additional 2.5 minutes, it concluded: "Oh, I see now—upon closer inspection, this is that famous optical illusion photo of a "headless" dog. It's actually a three-legged dog (due to an amputation), with its head turned all the way back to lick its side, which creates the bizarre perspective making it look decapitated at first glance. So, you're right; the dog has 3 legs."

You're right, this is a good test. Right when I'm starting to feel LLMs are intelligent.

◧◩◪
3. qnleig+Xr1[view] [source] 2025-12-05 23:40:30
>>Rover2+x01
It's not obvious to me whether we should count these errors as failures of intelligence or failures of perception. There's at least a loose analogy to optical illusion, which can fool humans quite consistently. Now you might say that a human can usually figure out what's going on and correctly identify the illusion, but we have the luxury of moving our eyes around the image and taking it in over time, while the model's perception is limited to a fixed set of unchanging tokens. Maybe this is relevant.

(Note I'm not saying that you can't find examples of failures of intelligence. I'm just questioning whether this specific test is an example of one).

◧◩◪◨
4. cyanma+st1[view] [source] 2025-12-05 23:52:52
>>qnleig+Xr1
I am having trouble understanding the distinction you’re trying to make here. The computer has the same pixel information that humans do and can spend its time analyzing it in any way it wants. My four-year-old can count the legs of the dog (and then say “that’s silly!”), whereas LLMs have an existential crisis because five-legged-dogs aren’t sufficiently represented in the training data. I guess you can call that perception if you want, but I’m comfortable saying that my kid is smarter than LLMs when it comes to this specific exercise.
◧◩◪◨⬒
5. qnleig+Dm2[view] [source] 2025-12-06 11:11:13
>>cyanma+st1
LLMs can count other objects, so it's not like they're too dumb to count. So a possible model for what's going on is that the circuitry responsible for low-level image recognition has priors baked in that cause it to report unreliable information to parts that are responding for higher-order reason.

So back to the analogy, it could be as if the LLMs experience the equivalent of a very intense optical illusion in these cases, and then completely fall apart trying to make sense of it.

[go to top]