It is the first model to get partial credit on an LLM image test I have: counting the legs of a dog. Specifically, a dog with 5 legs. This is a wild test, because LLMs get really pushy and insist that the dog only has 4 legs.
In fact, GPT-5 wrote an edge detection script to see where "golden dog feet" met "bright green grass" to prove to me that there were only 4 legs. The script found 5, and GPT-5 then said it was a bug and adjusted the script's sensitivity so it only located 4, lol.
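For anyone curious what that kind of check might look like, here is a minimal sketch. It is not the script GPT-5 wrote (I don't have that), just an illustration of the idea: scan one pixel row just above the grass line and count the contiguous runs of "golden" pixels. The colors, the `is_golden` thresholds, the `min_width` knob, and the synthetic row are all made up for the example.

```python
# Hypothetical illustration: count leg-colored runs in one row of RGB pixels.
# Colors, thresholds, and the synthetic row are assumptions, not real image data.

def is_golden(pixel):
    """Rough, made-up test for 'golden dog fur' versus 'bright green grass'."""
    r, g, b = pixel
    return r > 150 and g > 100 and b < 120 and r > g

def count_leg_runs(row, is_leg, min_width=2):
    """Count contiguous runs of leg-colored pixels that are at least min_width wide."""
    runs = 0
    width = 0
    for pixel in row:
        if is_leg(pixel):
            width += 1
        else:
            if width >= min_width:
                runs += 1
            width = 0
    # Close out a run that touches the right edge of the row.
    if width >= min_width:
        runs += 1
    return runs

GRASS = (60, 200, 60)
FUR = (210, 160, 70)

# Synthetic row: five legs separated by grass; the middle leg is narrower.
leg_widths = [3, 3, 2, 3, 3]
row = []
for w in leg_widths:
    row += [GRASS] * 4 + [FUR] * w
row += [GRASS] * 4

print(count_leg_runs(row, is_golden, min_width=2))
print(count_leg_runs(row, is_golden, min_width=3))
```

With these made-up numbers, the first call counts all five runs, and raising `min_width` to 3 drops the narrow one. That is the uncomfortable part: a single "sensitivity" parameter is enough to make the count match whatever you expected to see.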
Anyway, Gemini 3, while still being unable to count the legs on the first try, did identify "male anatomy" (its own words) also visible in the picture. The 5th leg was approximately where you would expect a well-endowed dog to have a "5th leg".
That aside though, I still wouldn't call it particularly impressive.
As a note, Meta's image slicer correctly highlighted all 5 legs without a hitch. Maybe not quite a transformer, but interesting that it could properly interpret "dog leg" and ID them. Also, the dogs with many legs (I have a few of them) all had their extra legs added by nano-banana.
Most human beings, if they see a dog that has 5 legs, will quickly think they are hallucinating and that the dog really only has 4 legs, unless the fifth leg is really, really obvious. It is weird how humans are biased like that:
1. You can look directly at something and not see it because your attention is focused elsewhere (on the expected four legs).
2. Our pre-existing knowledge (dogs have four legs) influences how we interpret the visual information coming in from the bottom up.
3. Our brain actively filters out "unimportant" details that don't align with our expectations or the main "figure" of the dog.
Attention should fix this, however: if you specifically ask the AI to count the number of legs the dog has, it shouldn't go nuts.
A straight-up "dumber" computer algorithm that isn't trained extensively on real and realistic image data is going to get this right more often than a transformer that was.
We're all just pattern matching machines and we humans are very good at it.
So much so that we have the sayings: you can't teach an old dog... and a specialist in their field only sees hammer => nails.
Evolution anyone?