zlacker

[return to "Gemini 2.5 Pro Preview"]
1. segpha+J4[view] [source] 2025-05-06 15:34:48
>>meetpa+(OP)
My frustration with using these models for programming in the past has largely been around their tendency to hallucinate APIs that simply don't exist. The Gemini 2.5 models, both pro and flash, seem significantly less susceptible to this than any other model I've tried.

There are still significant limitations: no amount of prompting will get current models to approach abstraction and architecture the way a person does. But I'm finding that these Gemini models are finally able to replace searches and Stack Overflow for a lot of my day-to-day programming.
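One cheap guard against the hallucinated-API problem the parent describes is to mechanically check that a model-suggested attribute path actually resolves in the installed library before trusting it. A minimal sketch (my own illustration; the function name and approach are assumptions, not anything the models do):

```python
import importlib

def api_exists(dotted_path: str) -> bool:
    """Return True if e.g. 'os.path.join' resolves to a real attribute."""
    module_name, _, attrs = dotted_path.partition(".")
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in attrs.split(".") if attrs else []:
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True

print(api_exists("os.path.join"))        # real API -> True
print(api_exists("os.path.frobnicate"))  # hallucinated -> False
```

This only catches attribute paths, not wrong signatures or semantics, but it filters the most blatant inventions.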

2. jug+xq[view] [source] 2025-05-06 17:37:16
>>segpha+J4
I’ve seen benchmarks on hallucinations, and OpenAI models have typically performed worse than Google and Anthropic models. Sometimes significantly so. But it doesn’t seem like they have cared much. I’ve suspected that LLM performance is correlated with a willingness to risk hallucinations? That is, if a model is bolder, that can help on other performance benchmarks. But of course at the risk of hallucinating more…
3. mounta+hs[view] [source] 2025-05-06 17:46:17
>>jug+xq
The hallucinations are a result of RLVR (reinforcement learning with verifiable rewards). We reward the model for reaching the answer and then force it to reason about how to get there, even when the base model may not have that information.
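The incentive the parent describes can be made concrete with a toy reward function (my own illustration, not any lab's actual setup): if the reward checks only the final answer, a confident guess has positive expected reward while abstaining always scores zero, so training pushes the model toward bold claims.

```python
def rlvr_reward(model_answer: str, gold_answer: str) -> float:
    """Verifiable reward: 1.0 for an exact-match answer, else 0.0.
    Nothing here checks whether the reasoning was grounded."""
    return 1.0 if model_answer.strip() == gold_answer.strip() else 0.0

# A guessing policy sometimes collects reward; "I don't know" never does.
print(rlvr_reward("42", "42"))            # 1.0
print(rlvr_reward("I don't know", "42"))  # 0.0
```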
4. mdp202+pL[view] [source] 2025-05-06 19:52:33
>>mounta+hs
> The hallucinations are a result of RLVR

Well, let us reward them for producing output that is consistent with selected documentation accessed from a database, and massacre them for output they cannot justify - like we do with humans.
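The proposal above could be sketched as a reward that penalizes any API reference not found in the retrieved documentation. A hypothetical toy version (the doc set, regex extraction, and penalty value are all illustrative assumptions, not a real training pipeline):

```python
import re

# Stand-in for documentation indexed in a database.
KNOWN_APIS = {"json.loads", "json.dumps", "os.path.join"}

def grounded_reward(output: str, correct: bool) -> float:
    """Reward a correct answer, but heavily penalize ("massacre")
    any dotted API reference absent from the documentation set."""
    referenced = set(re.findall(r"\b[a-z_]+(?:\.[a-z_]+)+\b", output))
    if any(api not in KNOWN_APIS for api in referenced):
        return -1.0
    return 1.0 if correct else 0.0

print(grounded_reward("Use json.loads to parse it.", True))     # 1.0
print(grounded_reward("Call json.parse on the string.", True))  # -1.0
```

The point of the negative reward is to make an unjustifiable citation strictly worse than abstaining, inverting the guess-friendly incentive of answer-only rewards.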
