zlacker

[return to "Gemini 3 Pro: the frontier of vision AI"]
1. Workac+cU[view] [source] 2025-12-05 20:26:05
>>xnx+(OP)
Well

It is the first model to get partial-credit on an LLM image test I have. Which is counting the legs of a dog. Specifically, a dog with 5 legs. This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.

In fact GPT5 wrote an edge detection script to see where "golden dog feet" met "bright green grass" to prove to me that there were only 4 legs. The script found 5, and GPT-5 then said it was a bug, and adjusted the script sensitivity so it only located 4, lol.

Anyway, Gemini 3, while still being unable to count the legs first try, did identify "male anatomy" (it's own words) also visible in the picture. The 5th leg was approximately where you could expect a well endowed dog to have a "5th leg".

That aside though, I still wouldn't call it particularly impressive.

As a note, Meta's image slicer correctly highlighted all 5 legs without a hitch. Maybe not quite a transformer, but interesting that it could properly interpret "dog leg" and ID them. Also the dog with many legs (I have a few of them) all had there extra legs added by nano-banana.

◧◩
2. daniel+dY[view] [source] 2025-12-05 20:45:15
>>Workac+cU
I don’t know much about AI, but I have this image test that everything has failed at. You basically just present an image of a maze and ask the LLM to draw a line through the most optimal path.

Here’s how Nano Banana fared: https://x.com/danielvaughn/status/1971640520176029704?s=46

◧◩◪
3. kridsd+Y31[view] [source] 2025-12-05 21:13:58
>>daniel+dY
I have also tried the maze from a photo test a few times and never seen a one-shot success. But yesterday I was determined to succeed so I allowed Gemini 3 to write a python gui app that takes in photos of physical mazes (I have a bunch of 3d printed ones) and find the path. This does work.

Gemini 3 then one-shot ported the whole thing (which uses CV py libraries) to a single page html+js version which works just as well.

I gave that to Claude to assess and assign a FAANG hiring level to, and it was amazed and said Gemini 3 codes like an L6.

Since I work for Google and used my phone in the office to do this, I think I can't share the source or file.

[go to top]