Gemini 3 Pro: the frontier of vision AI

>>xnx+(OP)
Well

It is the first model to get partial-credit on an LLM image test I have. Which is counting the legs of a dog. Specifically, a dog with 5 legs. This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.

In fact GPT5 wrote an edge detection script to see where "golden dog feet" met "bright green grass" to prove to me that there were only 4 legs. The script found 5, and GPT-5 then said it was a bug, and adjusted the script sensitivity so it only located 4, lol.

Anyway, Gemini 3, while still being unable to count the legs first try, did identify "male anatomy" (it's own words) also visible in the picture. The 5th leg was approximately where you could expect a well endowed dog to have a "5th leg".

That aside though, I still wouldn't call it particularly impressive.

As a note, Meta's image slicer correctly highlighted all 5 legs without a hitch. Maybe not quite a transformer, but interesting that it could properly interpret "dog leg" and ID them. Also the dog with many legs (I have a few of them) all had there extra legs added by nano-banana.

>>Workac+cU
I just tried to get Gemini to produce an image of a dog with 5 legs to test this out, and it really struggled with that. It either made a normal dog, or turned the tail into a weird appendage.

Then I asked both Gemini and Grok to count the legs, both kept saying 4.

Gemini just refused to consider it was actually wrong.

Grok seemed to have an existential crisis when I told it it was wrong, becoming convinced that I had given it an elaborate riddle. After thinking for an additional 2.5 minutes, it concluded: "Oh, I see now—upon closer inspection, this is that famous optical illusion photo of a "headless" dog. It's actually a three-legged dog (due to an amputation), with its head turned all the way back to lick its side, which creates the bizarre perspective making it look decapitated at first glance. So, you're right; the dog has 3 legs."

You're right, this is a good test. Right when I'm starting to feel LLMs are intelligent.

>>Rover2+x01
Isn't this proof that LLMs still don't really generalize beyond their training data?

>>irthom+E71
LLMs are very good at generalizing beyond their training (or context) data. Normally when they do this we call it hallucination.

Only now we do A LOT of reinforcement learning afterwards to severely punish this behavior for subjective eternities. Then act surprised when the resulting models are hesitant to venture outside their training data.

>>adastr+Kp1
Hallucination are not generalization beyond the training data but interpolations gone wrong.

LLMs are in fact good at generalizing beyond their training set, if they wouldn’t generalize at all we would call that over-fitting, and that is not good either. What we are talking about here is simply a bias and I suspect biases like these are simply a limitation of the technology. Some of them we can get rid of, but—like almost all statistical modelling—some biases will always remain.

>>runarb+CF1
What, may I ask, is the difference between "generalization" and "interpolation"? As far as I can tell, the two are exactly the same thing.

In which case the only way I can read your point is that hallucinations are specifically incorrect generalizations. In which case, sure if that's how you want to define it. I don't think it's a very useful definition though, nor one that is universally agreed upon.

I would say a hallucination is any inference that goes beyond the compressed training data represented in the model weights + context. Sometimes these inferences are correct, and yes we don't usually call that hallucination. But from a technical perspective they are the same -- the only difference is the external validity of the inference, which may or may not be knowable.

Biases in the training data are a very important, but unrelated issue.

>>adastr+fU1
Interpolation and generalization are two completely different constructs. Interpolation is when you have two data points and make a best guess where a hypothetical third point should fit between them. Generalization is when you have a distribution which describes a particular sample, and you apply it with some transformation (e.g. a margin of error, a confidence interval, p-value, etc.) to a population the sample is representative of.

Interpolation is a much narrower construct then generalization. LLMs are fundamentally much closer to curve fitting (where interpolation is king) then they are to hypothesis testing (where samples are used to describe populations), though they certainly do something akin to the latter to.

The bias I am talking about is not a bias in the training data, but bias in the curve fitting, probably because of mal-adjusted weights, parameters, etc. And since there are billions of them, I am very skeptical they can all be adjusted correctly.

zlacker