But I wonder when we'll be happy? Do we expect colleagues, friends, and family to be 100% laser-accurate 100% of the time? I'd wager we don't. Should we expect that from an artificial intelligence too?
Usually I’m using a minimum of 200k tokens to start with Gemini 2.5.
- (1e(1e10) + 1) - 1e(1e10)
- sqrt(sqrt(2)) * sqrt(sqrt(2)) * sqrt(sqrt(2)) * sqrt(sqrt(2))
So when I punched in 1/3 it was exactly 1/3.
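If it helps make the comparison concrete, here's a rough Python sketch of what those expressions are probing. The specifics are my own stand-ins, not whatever the calculator actually uses: sympy plays the role of the exact-arithmetic engine, and I've shrunk 1e(1e10) to smaller numbers, since 10^(10^10) overflows a double outright.

```python
import math
from fractions import Fraction
import sympy  # third-party symbolic package, standing in for an exact engine

# Plain IEEE-754 doubles: the small term gets swallowed by rounding.
print((1e16 + 1) - 1e16)              # 0.0, not 1.0
print(math.sqrt(math.sqrt(2)) ** 4)   # very close to 2, but typically not exactly 2

# Exact arithmetic: the same kinds of expressions come out exact.
print(Fraction(1, 3))                                              # exactly 1/3
print(sympy.sqrt(sympy.sqrt(2)) ** 4)                              # 2, exactly
print((sympy.Integer(10) ** 100 + 1) - sympy.Integer(10) ** 100)   # 1, exactly
```

The float versions lose the small term to rounding (or to overflow, for numbers the size of 1e(1e10)); the exact versions keep it, which is why 1/3 can come back as exactly 1/3.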
Mainly I meant to push back against the reflexive comparison to a friend or family member or colleague. AI is a multi-purpose tool that is used for many different kinds of tasks. Some of these tasks are analogous to human tasks, where we should anticipate human error. Others are not, and yet we often ask an LLM to do them anyway.
You could say that when I use my spanner/wrench to tighten a nut it works 100% of the time, but as soon as I try to use a screwdriver it's terrible and full of problems: it can't even reliably do something as trivially easy as tightening a nut, even though a screwdriver works the same way, using torque to tighten a fastener.
Well that's because one tool is designed for one thing, and one is designed for another.
And it is also not just about the %. It is also about the type of error. Will we reach a point where we change our perception and say these are expected non-human errors?
Or could we have a specific LLM that only checks for these types of error?
And tools in the game, even more so (there's no excuse for something that's engineered).
"AI"s are designed to be reliable; "AGI"s are designed to be intelligent; "LLM"s seem to be designed to make some qualities emerge.
> one tool is designed for one thing, and one is designed for another
The design of LLMs seems to be "let us see where the promise leads us". That is not really "design", i.e. "from need to solution".
Then why are we using them to write code, which should produce reliable outputs for a given input... much like a calculator?
Obviously we want the code to produce correct results for whatever input we give, and as it stands now, I can't trust LLM output without reviewing it first. It's still a helpful tool, but ultimately my desire would be for them to be as accurate as a calculator, so they can be trusted enough to not need the review step.
Using an LLM and being OK with untrustworthy results would be like clicking the terminal icon on my dock and sometimes it opens the terminal, sometimes it opens a browser, and sometimes it just silently fails, because there's no reproducible output for any given input to an LLM. To me that's a problem: output should be reproducible, especially if it's writing code.
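To make that reproducibility point concrete, here's a toy sketch in plain Python. It uses no real LLM API; random.choice just stands in for sampling-based decoding, so the names and behaviour here are illustrative assumptions, not any actual system.

```python
import random

def calculator(expr: str) -> int:
    """A pure function: the same input always maps to the same output."""
    a, b = expr.split("+")
    return int(a) + int(b)

def sampled_completion(prompt: str) -> str:
    """Stand-in for sampled decoding: for a given prompt, only the
    distribution over outputs is fixed, not the output itself."""
    return random.choice(["opens terminal", "opens browser", "fails silently"])

print(calculator("2+2") == calculator("2+2"))    # always True
print(sampled_completion("open my terminal") ==
      sampled_completion("open my terminal"))    # not guaranteed
```

Fixing a seed (or using greedy decoding) can make the second function repeatable, but that is something you have to bolt on, whereas the calculator's determinism is inherent to what it is.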
Your interaction with LLMs is categorically closer to interactions with people than with a calculator. Your inputs to them are language.
Of course the two are different. A calculator is a computer; an LLM is not. Comparing the two is making the same category error that would confuse Mr. Babbage, but in reverse.
(“On two occasions, I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question.”)
That generally seems right to me, given how much we hold in our heads when we're discussing something with a coworker.
If we have an intern or junior dev on our team, do we expect them to be 100% totally correct all the time? Why do we have a culture of peer code review at all if we assume that everyone who commits code is 100% foolproof and correct 100% of the time?
Truth is, we don't trust all the humans who write code to be perfect. As the old-as-the-hills saying goes, "we all make mistakes". So replace "LLM" in your comment above with "junior dev" and everything you said still applies, whether it is LLMs or inexperienced colleagues. With code there is very rarely a single "correct" answer to how to implement something (unlike the calculator tautology you suggest) anyway, so an LLM or an intern (or even an experienced colleague) absolutely nailing their PRs with zero review comments seems unusual to me.
So we go back to the original (and, I admit, quite philosophical) point: when will we be happy? We take on juniors because they do the low-level and boring work, and we need to keep an eye on their output until they learn and grow and improve... but we cannot do the same for an LLM?
What we have today was literally science fiction not so long ago (e.g. the movie "Her" from 2013 is now pretty much a reality). Step back for a moment: the fact that we are even having the "yeah, it writes code but it needs to be checked" discussion is mind-blowing; it's remarkable that it writes mostly-correct code at all. Give things another couple of years and it's going to be even better.