zlacker

[parent] [thread] 10 comments
1. jorvi+(OP)[view] [source] 2026-02-04 22:36:02
Current LLMs often produce much, much worse results than a manual search.

If you need to search the internet on a topic that is full of unknown unknowns for you, they're a pretty decent way to get the lay of the land, but beyond that, off to Kagi (or Google) you go.

Even worse is that the results are inconsistent. I can ask Gemini five times at what temperature I should take a waterfowl out of the oven, and get five different answers, 10°C apart.

You cannot trust answers from an LLM.

replies(4): >>chryso+U2 >>signat+V6 >>r0x1n1+M7 >>12345h+0f
2. chryso+U2[view] [source] 2026-02-04 22:54:00
>>jorvi+(OP)
It obviously takes discipline, but using something like Perplexity as an aggregator typically gets me better results, because I can click through to the sources.

It's not a perfect solution, because you need the discipline/intuition to actually do that rather than blindly trust the summary.

3. signat+V6[view] [source] 2026-02-04 23:16:16
>>jorvi+(OP)
> I can ask Gemini five times at what temperature I should take a waterfowl out of the oven, and get five different answers, 10°C apart.

Are you sure? Both Gemini and ChatGPT gave me consistent answers 3 times in a row, even though the two models' answers differ slightly from each other.

Their answers are in line with this version:

https://blog.thermoworks.com/duck_roast/

replies(1): >>jorvi+rF
4. r0x1n1+M7[view] [source] 2026-02-04 23:21:25
>>jorvi+(OP)
I created an account just to point out that this is simply not true. I just tried it! The answers were consistent across all 5 samples with both "Fast" mode and Pro (which model you used is really important to mention if you're going to post comments like this; I thought it might only be inconsistent with the Flash model).
replies(1): >>jorvi+zF
5. 12345h+0f[view] [source] 2026-02-05 00:09:04
>>jorvi+(OP)
Did you actually ask the model this question, or are you fully strawmanning?
replies(1): >>jorvi+1F
6. jorvi+1F[view] [source] [discussion] 2026-02-05 03:43:37
>>12345h+0f
My mother did, for Christmas. It was a goose that ended up being raw in a lot of places.

I then pointed out this same inconsistency to her, and that she shouldn't put stock in what Gemini says. Testing it myself, it gave results anywhere between 47°C and 57°C. And sometimes it would just trip out and give the food-safety-approved temperature, which is 74°C (!).

Edit: just tested it again and it still happens. But inconsistency isn't a surprise for anyone who actually knows how LLMs work.
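
To illustrate what I mean, here's a toy sketch of temperature sampling, with completely made-up numbers and no claim about how Gemini is actually implemented: the model scores candidate answers, and a temperature above zero turns those scores into a weighted random draw, so the same question can come back with a different number each run.

    import math, random

    # Made-up scores for candidate "doneness" answers (purely illustrative).
    logits = {"47C": 2.0, "52C": 1.8, "57C": 1.7, "74C": 1.0}

    def sample(logits, temperature=0.9):
        # Softmax with temperature, then a weighted random draw --
        # roughly what sampling with temperature > 0 does.
        weights = {k: math.exp(v / temperature) for k, v in logits.items()}
        total = sum(weights.values())
        r = random.uniform(0, total)
        for answer, w in weights.items():
            r -= w
            if r <= 0:
                return answer
        return answer

    print([sample(logits) for _ in range(5)])
    # e.g. ['47C', '57C', '47C', '52C', '74C'] -- same question, different answers

Run it a few times and the list changes; lower the temperature toward zero and it collapses toward the top-scoring answer. That's all the "inconsistency" is here.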

replies(1): >>ziml77+RI
7. jorvi+rF[view] [source] [discussion] 2026-02-05 03:48:40
>>signat+V6
What do you mean, "are you sure"? I literally saw it happen, and still see it happening, in front of my eyes. I just now tested it with slight variations of "ideal temperature waterfowl cooking", "best temperature waterfowl roasting", etc., and all of these questions yield different answers, with temperatures ranging from 47°C to 57°C (ignoring the 74°C food-safety ones).

That's my entire point. Even adding an "is" or "the" can get you way different advice. No human would give you different info when you ask "what's the waterfowl's best cooking temperature" vs "what is waterfowl's best roasting temperature".

replies(1): >>cruffl+FJ
8. jorvi+zF[view] [source] [discussion] 2026-02-05 03:49:25
>>r0x1n1+M7
Unfortunately, despite your account creation, it remains true that this happened. I just tested it again and got different answers.
9. ziml77+RI[view] [source] [discussion] 2026-02-05 04:24:41
>>jorvi+1F
> But inconsistency isn't a surprise for anyone who actually knows how LLMs work

Exactly. These people saying they've gotten good results for the same question aren't countering your argument. All they're doing is proving that sometimes it can output good results. But a tool that's randomly right or wrong is not a very useful one. You can't trust any of its output unless you can validate it. And for a lot of the questions people ask of it, if you have to validate it, there was no reason to use the LLM in the first place.

10. cruffl+FJ[view] [source] [discussion] 2026-02-05 04:31:23
>>jorvi+rF
Did you point that out to one of them… like “hey bro, I’ve asked y’all this question in multiple threads and get wildly different answers. Why?”

And the answer is probably that there is no such thing as an ideal temperature for waterfowl, because the real answer is "it depends" and you didn't give it enough context to answer your question any better.

Context is everything. Give it poor prompts and you'll get poor answers. LLMs are no different from programming a computer or anything else in this domain.

And learning how to give good context is a skill. One we all need to learn.

replies(1): >>jhhh+9N
11. jhhh+9N[view] [source] [discussion] 2026-02-05 05:14:15
>>cruffl+FJ
If I made a new, non-AI tool called 'correct answer provider' which provided definitive, incorrect answers to things, you'd call it bad software. But because it is AI, we're going to blame the user for not second-guessing the answers, or for holding it wrong, i.e. bad prompting.