zlacker

[return to "Gemini 2.5 Pro Preview"]
1. segpha+J4 2025-05-06 15:34:48
>>meetpa+(OP)
In the past, my frustration with using these models for programming has largely been their tendency to hallucinate APIs that simply don't exist. The Gemini 2.5 models, both Pro and Flash, seem significantly less susceptible to this than any other model I've tried.

There are still significant limitations: no amount of prompting will get current models to approach abstraction and architecture the way a person does. But I'm finding that these Gemini models are finally able to replace web searches and Stack Overflow for a lot of my day-to-day programming.

2. Jordan+et 2025-05-06 17:52:40
>>segpha+J4
I recently needed to recommend some IAM permissions for an assistant on a hobby project; not complete access but just enough to do what was required. Was rusty with the console and didn't have direct access to it at the time, but figured it was a solid use case for LLMs since AWS is so ubiquitous and well-documented. I actually queried 4o, 3.7 Sonnet, and Gemini 2.5 for recommendations, stripped the list of duplicates, then passed the result to Gemini to vet and format as JSON. The result was perfectly formatted... and still contained a bunch of non-existent permissions. My first time being burned by a hallucination IRL, but just goes to show that even the latest models working in concert on a very well-defined problem space can screw up.
3. floydn+8b2 2025-05-07 11:38:00
>>Jordan+et
by asking three different models and then keeping every single unique thing they gave you, i believe you actually maximized your chances of running into hallucinations.

instead of discarding the duplicates, when i query different models i use them as a signal that a suggestion is more likely accurate. i wonder what your results would have looked like if you had kept only the permissions that more than one model suggested and went from there.
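
roughly what i mean, as a sketch with the same kind of made-up suggestion lists (agreement across models is only a weak signal against hallucination, not a guarantee):

    from collections import Counter

    # Hypothetical per-model suggestions, as in the sketch upthread.
    suggestions = {
        "gpt-4o": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "claude-3.7-sonnet": ["s3:GetObject", "s3:ListBucket", "s3:ListAllMyBuckets"],
        "gemini-2.5-pro": ["s3:GetObject", "s3:PutObject", "s3:GetBucketInfo"],
    }

    # Count how many models proposed each action.
    votes = Counter(action for actions in suggestions.values() for action in actions)

    # Keep only the actions that at least two of the three models agreed on.
    agreed = sorted(action for action, n in votes.items() if n >= 2)
    print(agreed)  # ['s3:GetObject', 's3:ListBucket', 's3:PutObject']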
