zlacker

It's mind blowing. At least 1-2x/week I find myself shocked that this is the reality we live in

replies(6): >>mentos+n3 >>malfis+p4 >>pmdrpg+06 >>GoatIn+Hn >>0x000x+Eu >>vFunct+VD

>>csalle+(OP)
It’s surreal to me. I’ve been using ChatGPT every day for 2 years, and it makes me question reality sometimes, like ‘howtf did I live to see this in my lifetime’

I’m only 39, really thought this was something reserved for the news on my hospital tv deathbed.

replies(2): >>csalle+X5 >>hattma+3R

>>csalle+(OP)
Today I had a dentist appointment and the dentist suggested I switch toothpaste lines to see if something else works for my sensitivity better.

I am predisposed to canker sores and if I use a toothpaste with SLS in it I'll get them. But a lot of the SLS free toothpastes are new age hippy stuff and are also fluoride free.

I went to chatgpt and asked it to suggest a toothpaste that was both SLS free and had fluoride. Pretty simple ask right?

It came back with two suggestions. Its top suggestion had SLS, its backup suggestion lacked fluoride.

Yes, it is mind blowing the world we live in. Executives want to turn our code bases over to these tools

replies(16): >>sneak+r5 >>pmdrpg+w6 >>gertle+k7 >>NikkuF+F8 >>GoatIn+nn >>Game_E+Ln >>mediam+8B >>arturs+qB >>shlant+1I >>cowlby+sJ >>cgh+CL >>jorams+JY >>def_tr+tE1 >>emeril+sQ1 >>jf22+mi2 >>neRok+J6l

>>malfis+p4
“an LLM made a mistake once, that’s why I don’t use it to code” is exactly the kind of irrelevant FUD that TFA is railing against.

Anyone not learning to use these tools well (and cope with and work around their limitations) is going to be left in the dust in months, perhaps weeks. It’s insane how much utility they have.

replies(5): >>breule+V6 >>grey-a+c9 >>malfis+Ua >>sensan+Ws >>creata+AE

>>mentos+n3
I turned 38 a few months ago, same thing here. I would love to go back in time 5 years and tell myself about what's to come. 33yo me wouldn't have believed it.

>>csalle+(OP)
I remember the first time I played with GPT and thought “oh, this is fully different from the chatbots I played with growing up, this isn’t like anything else I’ve seen” (though I suppose it is implemented much like predictive text, but the difference in experience is that predictive text is usually wrong about what I’m about to say so it feels silly by comparison)

replies(1): >>johnb2+EJ

>>malfis+p4
Feel similarly, but even if it is wrong 30% of the time, you can (as the author of this op ed points out) pour an ungodly amount of resources into getting that error down by chaining them together so that you have many chances to catch the error. And as long as that only destroys the environment and doesn’t cost more than a junior dev, then yes, they’re going to trust it with their codebases; it’s the competitive thing to do, and we all know competition produces the best outcome for everyone… right?
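
A rough sketch of the arithmetic behind that chaining argument, assuming each check misses an error independently with probability 0.3 (the 30% figure above). Independence is an assumption made for illustration; repeated passes of the same model may fail in correlated ways, which would make the real residual error higher.

```python
def residual_error(p_miss: float, n_checks: int) -> float:
    """Probability that an error survives n independent checks.

    Assumes every check misses the error with the same probability
    p_miss and that the checks are independent of each other.
    """
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be between 0 and 1")
    if n_checks < 0:
        raise ValueError("n_checks must be non-negative")
    return p_miss ** n_checks


if __name__ == "__main__":
    # Show how quickly the residual error shrinks as checks are chained.
    for n in range(1, 6):
        print(f"{n} checks: {residual_error(0.3, n):.4%} chance the error survives")
```

Each extra check costs another full model call, which is the "ungodly amount of resources" part of the trade.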

replies(2): >>csalle+x8 >>0point+q51

>>sneak+r5
They won't. The speed at which these models evolve is a double-edged sword: they give you value quickly... but any experience you gain dealing with them also becomes obsolete quickly. One year of experience using agents won't be more valuable than one week of experience using them. No one's going to be left in the dust because no one is more than a few weeks away from catching up.

replies(1): >>kossTK+3i

>>malfis+p4
Feels like you're comparing how LLMs handle unstandardized and incomplete marketing-crap that is virtually all product pages on the internet, and how LLMs handle the corpus of code on the internet that can generally be trusted to be at least semi functional (compiles or at least lints; and often easily fixed when not 100%).

Two very different combinations it seems to me...

If the former combination was working, we'd be using chatgpt to fill our amazon carts by now. We'd probably be sanity checking the contents, but expecting pretty good initial results. That's where the suitability of AI for lots of coding-type work feels like it's at.

replies(2): >>malfis+W9 >>layer8+6d

>>pmdrpg+w6
It takes very little time or brainpower to circumvent AI hallucinations in your daily work, if you're a frequent user of LLMs. This is especially true of coding using an app like Cursor, where you can @-tag files and even URLs to manage context.

>>malfis+p4
If you've not found a toothpaste yet, see if UltraDex is available where you live.

>>sneak+r5
Looking forward to seeing you live up to your hyperbole in a few weeks, the singularity is near!

>>gertle+k7
Product ingredient lists are mandated by law and follow a standard. Hard to imagine a better codified NLP problem
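
To illustrate how mechanical the check is once you have the label text, here is a minimal sketch. The sample labels are made up for illustration, the name lists are deliberately short and not exhaustive, and it assumes a comma-separated ingredient list.

```python
# Common label names for SLS (not exhaustive).
SLS_NAMES = (
    "sodium lauryl sulfate",
    "sodium lauryl sulphate",
    "sodium dodecyl sulfate",
)
# Substrings that indicate a fluoride source, e.g. sodium fluoride,
# stannous fluoride, sodium monofluorophosphate (not exhaustive).
FLUORIDE_MARKERS = ("fluoride", "monofluorophosphate")


def parse_ingredients(label: str) -> list[str]:
    """Split a comma-separated ingredient label into lowercase entries."""
    return [part.strip().lower() for part in label.split(",") if part.strip()]


def is_sls_free_with_fluoride(label: str) -> bool:
    """True if the label lists a fluoride source and no SLS."""
    ingredients = parse_ingredients(label)
    has_sls = any(name in ing for ing in ingredients for name in SLS_NAMES)
    has_fluoride = any(m in ing for ing in ingredients for m in FLUORIDE_MARKERS)
    return has_fluoride and not has_sls


if __name__ == "__main__":
    # Hypothetical labels, not real products.
    samples = {
        "Paste A": "Sodium Fluoride 0.24%, Water, Sorbitol, Hydrated Silica",
        "Paste B": "Sodium Fluoride, Water, Sodium Lauryl Sulfate",
        "Paste C": "Water, Xylitol, Calcium Carbonate",
    }
    for name, label in samples.items():
        print(name, is_sls_free_with_fluoride(label))
```

The hard part in practice is getting a trustworthy copy of the label, not the filtering.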

replies(1): >>gertle+Ac

>>sneak+r5
Once? Lol.

I presented a simple problem with well defined parameters that LLMs can use to search product ingredient lists (that are standardized). This is the type of problem LLMs are supposed to be good at and it failed in every possible way.

If you hired a master woodworker and he didn't know what wood was, you'd hardly trust him with hard things, much less simple ones

replies(1): >>phanto+B51

>>malfis+W9
I hadn't considered that, admittedly. It seems like that would make the information highly likely to be present...

I've admittedly got an absence of anecdata of my own here, though: I don't go buying things with ingredient lists online much. I was pleasantly surprised to see a very readable list when I checked a toothpaste page on amazon just now.

>>gertle+k7
At the very least, it demonstrates that you can’t trust LLMs to correctly assess that they couldn’t find the necessary information, or if they do internally, to tell you that they couldn’t. The analogous gaps of awareness and acknowledgment likely apply to their reasoning about code.

>>breule+V6
Very important point, but there's also the sheer amount of reading you have to do, the inevitable scope creep, gargantuan walls of text going back and forth making you "skip" constantly, looking here then there, copying, pasting, erasing, reasking.

Literally the opposite of focus, flow, seeing the big picture.

At least for me to some degree. There's value there as I'm already using these tools every day, but it also seems like a tradeoff and I'm not really sure how valuable it is yet. Especially with competition upping the noise too.

I feel SO unfocused with these tools and I hate it; it's stressful and feels less "grounded", "tactile" and enjoyable.

I've found myself in a weird new workflow loop a few times with these tools, mindlessly iterating on some stupid error the LLM keeps not fixing, while my mind simply refuses to just fix it myself, which would be way faster with a little more effort. That's honestly a bit frightening.

replies(1): >>lechat+SU

>>malfis+p4
If you want the trifecta of no SLS, contains fluoride, and is biodegradable, then I recommend Hello toothpaste. Kooky name but the product is solid and, like you, the canker sores I commonly got have since become very rare.

replies(1): >>Game_E+0o

>>csalle+(OP)
I find it sad how normalized it's become. Yes, the technology is imperfect in very meaningful ways. Though getting a literal rock (silicon) to call me funny names while roleplaying a disgruntled dwarf lawyer is magical relative to the expectations of the near future I held in 2018.

>>malfis+p4
What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature which made it easier to jump to each product and check it out (I am putting aside my fear this feature will end up killing their web product with monetization).

0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...

1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...

replies(6): >>jvande+kq >>tguvot+UA >>wkat42+zD >>thefou+wR >>qingch+OW >>malfis+Jb2

>>GoatIn+nn
Hello toothpaste is ChatGPT's 2nd or 1st answer depending on which model I used [0], so I am curious for the poster above to share the session and see what the issue was.

There is known sensitivity (no pun intended ;) to wording of the prompt. I have also found if I am very quick and flippant it will totally miss my point and go off in the wrong direction entirely.

0 - >>44164633

>>Game_E+Ln
This is the thing that gets me about LLM usage. They can be amazing revolutionary tech and yes they can also be nearly impossible to use right. The claim that they are going to replace this or that is hampered by the fact that there is very real skill required (at best) or they just won't work most of the time (at worst). Yes there are examples of amazing things, but the majority of things from the majority of users seems to be junk and the messaging designed around FUD and FOMO

replies(2): >>kristo+py >>mediam+uA

>>sneak+r5
Surely if these tools were so magical, anyone could just pick them up and get out of the dust? If anything, they're probably better off cause they haven't wasted all the time, effort and money in the earlier, useless days and instead used it in the hypothetical future magic days.

replies(1): >>JimDab+8z

>>csalle+(OP)
It's almost exactly one of the stories in Stanislaw Lem's The Cyberiad.

replies(1): >>DonHop+n81

>>jvande+kq
The AI skeptics are the ones who never develop the skill though, it's self-destructive.

replies(2): >>caycep+aN >>jvande+1H1

>>sensan+Ws
> Surely if these tools were so magical

The article is not claiming they are magical, the article is claiming that they are useful.

> > but it’ll never be AGI

> I don’t give a shit.

> Smart practitioners get wound up by the AI/VC hype cycle. I can’t blame them. But it’s not an argument. Things either work or they don’t, no matter what Jensen Huang has to say about it.

>>jvande+kq
Just like some people who wrote long sentences into Google in 2000 and complained it was a fad.

Meanwhile the rest of the world learned how to use it.

We have a choice. Ignore the tool or learn to use it.

(There was lots of dumb hype then, too; the sort of hype that skeptics latched on to to carry the burden of their argument that the whole thing was a fad.)

replies(2): >>windex+XE >>spaqin+jG

>>Game_E+Ln
I tried to use chatgpt a month ago to find systemic fungicides for treating specific problems with trees. It kept suggesting copper sprays (they are not systemic) or fungicides that don't deal with the problems that I have.

I also tried to ask it what's the difference in action between two specific systemic fungicides. It generated some irrelevant nonsense.

replies(1): >>pigeon+YJ2

>>malfis+p4
What are you doing to get results this bad?

I tried this question three times and each time the first two products met both requirements.

Are you doing the classic thing of using the free version to complain about the competent version?

replies(2): >>fwip+PC >>andrew+kJ

>>malfis+p4
do you take lysine? total miracle supplement for those

>>mediam+8B
If the demo version of something is shitty, there's no reason to pay that company money.

replies(1): >>mediam+HB2

>>Game_E+Ln
The problem is the same prompt will yield good results one time and bad results another. The "get better at prompting" is often just an excuse for AI hallucination. Better prompting can help, but often the prompt is totally fine; the tech is just not there yet.

replies(2): >>Aeolun+zH >>Workac+DQ

>>csalle+(OP)
Been vibe coding for the past couple of months on a large project. My mind is truly blown. Every day it's just shocking. And it's so prolific. Half a million lines of code in a couple of months by one dev. Seriously.

Note that it's not going to solve everything. It's still not very precise in its output. Definitely lots of errors and bad design at the top end. But it's a LOT better than without vibe coding.

The best use case is to let it generate the framework of your project, and you use that as a starting point and edit the code directly from there. Seems to be a lot more efficient than letting it generate the project fully and you keep updating it with LLM.

replies(4): >>creata+aG >>0point+P51 >>rxtexi+Qq1 >>zahlma+0x2

>>sneak+r5
I see this FOMO "left in the dust" sentiment a lot, and I don't get it. You know it doesn't take long to learn how to use these tools, right?

replies(1): >>bdangu+7F

>>mediam+uA
> Meanwhile the rest of the world learned how to use it.

Very few people "learned how to use" Google, and in fact - many still use it rather ineffectively. This is not the same paradigm shift.

ChatGPT is not a technology most will learn how to use effectively. Just like with Google, they will ask it to find them an answer. But the world of LLMs is far broader, with more implications. I don't find the comparison of search and LLMs to be of equal weight in terms of consequences.

The TL;DR of this is ultimately: understanding how to use an LLM, at its most basic level, will not put you in the driver's seat, in exactly the same way that knowing about Google also didn't really change anything for anyone (unless you were an ad executive years later). And in a world of Google or no-Google, hindsight would leave me asking for a no-Google world. What will we say about LLMs?

replies(1): >>pigeon+IJ2

>>creata+AE
it actually does if you want to do serious work.

hence these types of posts generate hundreds of comments “I gave it a shot, it stinks”

replies(1): >>worthl+LM

>>vFunct+VD
> Half a million lines of code in a couple of months by one dev. Seriously.

Not that you have any obligation to share, but... can we see?

replies(2): >>worthl+UM >>vFunct+CB2

>>mediam+uA
Arguably, the people who typed long sentences into Google have won; the people who learned how to use it early on with specific keywords now get meaningless results.

replies(1): >>HappMa+wQ

>>wkat42+zD
If you want a correct answer the first time around, and give up if you don't get it, even if you know the thing can give it to you with a bit more effort (but still less effort than searching yourself), don't you think that's a user problem?

replies(3): >>3eb798+0K >>0point+b41 >>rsynno+Dg1

>>malfis+p4
cool story

>>mediam+8B
The entire point of a free version, at least for products like this, is to allow people to make accurate judgments about whether to pay for the "competent" version.

replies(1): >>lechat+aT

>>malfis+p4
This is where o3 shines for me. Since it does iterations of thinking/searching/analyzing and is instructed to provide citations, it really limits the hallucination effect.

o3 recommended Sensodyne Pronamel and I now know a lot more about SLS and fluoride than I did before lol. From its findings:

"Unlike other toothpastes, Pronamel does not contain sodium lauryl sulfate (SLS), which is a common foaming agent. Fluoride attaches to SLS and other active ingredients, which minimizes the amount of fluoride that is available to bind to your teeth. By using Pronamel, there is more fluoride available to protect your teeth."

replies(1): >>fc417f+LP

>>pmdrpg+06
> I suppose it is implemented much like predictive text

Those predictive text systems are usually Markov models. LLMs are fundamentally different. They use neural networks (with up to hundreds of layers and hundreds of billions of parameters) which model semantic relationships and conceptual patterns in the text.
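
To make the contrast concrete, here is a minimal sketch of the Markov-style predictor described above: it only counts which word tends to follow which, with no notion of meaning. The corpus is a made-up toy, and real keyboard predictors are more elaborate than this.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    table: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def predict_next(table: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = table.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat slept"
    model = train_bigrams(corpus)
    print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" once
    print(predict_next(model, "slept"))  # last word in the corpus, so None
```

The prediction depends only on the single previous word, which is why this kind of model loses the thread of a sentence so quickly.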

>>Aeolun+zH
If you are genuinely asking a question, how are you supposed to know the first answer was incorrect?

replies(4): >>worthl+2M >>graphe+qM >>socalg+WN >>leoedi+vS

>>malfis+p4
There is a reason why corporations aren’t letting LLMs into the accounting department.

replies(3): >>sriram+3Q >>lazide+mZ >>renewi+Q41

>>3eb798+0K
This is the right question.

>>3eb798+0K
scientific method??

>>bdangu+7F
I like how the post itself says "if hallucinations are your problem, your language sucks".

Yes sir, I know language sucks, there isn't anything I can do about that. There was nothing I could do at one point to convince claude that you should not use floating point math in kernel C code.

But hey, what do I know.

replies(1): >>simonw+XM

>>creata+aG
45 implementations of linked lists.. sure of it.

>>worthl+LM
Did saying to Claude "do not use floating point math in this code" not work?

replies(1): >>worthl+4U

>>kristo+py
if one needs special "skill" to use AI "properly", is it truly AI?

replies(3): >>HappMa+EQ >>wicked+Yd1 >>Fillig+iq1

>>3eb798+0K
The person that started this conversation verified the answers were incorrect. So it sounds like you just do that. Check the results. If they turn out to be false, tell the LLM or make sure you're not on a bad one. It's still likely to be faster than searching yourself.

replies(3): >>insane+8R >>lechat+OS >>mtlmtl+5T

>>cowlby+sJ
That is impressive, but it also looks likely to be misinformation. SLS isn't a chelator (as the quote appears to suggest). The concern is apparently that it might compete with NaF for sites to interact with the enamel. However, there is minimal research on the topic and what does exist (at least what I was quickly able to find via pubmed) appears preliminary at best. It also implicates all surfactants, not just SLS.

This diversion highlights one of the primary dangers of LLMs which is that it takes a lot longer to investigate potential bullshit than it does to spew it (particularly if the entity spewing it is a computer).

That said, I did learn something. Apparently it might be a good idea to prerinse with a calcium lactate solution prior to a NaF solution, and to verify that the NaF mouthwash is free of surfactants. But again, both of those points are preliminary research grade at best.

If you take anything away from this, I hope it's that you shouldn't trust any LLM output on technical topics that you haven't taken the time to manually verify in full.

replies(1): >>cowlby+NW1

>>cgh+CL
That is not true. I know of many private equity companies that are using LLMs for a base level analysis, and a separate validation layer to catch hallucinations.

LLM tech is not replacing accountants, just as it is not replacing radiologists or software developers yet. But it is in every department.

replies(1): >>sudden+M21

>>spaqin+jG
Nah, both keywords and long sentences get meaningless results from Google these days (including their falsely authoritative Bard claims).

I view Bard as a lot like the yes-man lackey that tries to pipe in to every question early, either cheating off others' work or, even more frequently, failing to accurately cheat off others' work, largely in hopes that you'll be in too much of a hurry and will mistake its voice for that of another (e.g., mistake the AI breakdown for a first-hit result snippet) and faceplant as a result of its faulty intel.

Gemini gets me relatively decent answers .. only after 60 seconds of CoT. Bard answers in milliseconds and its lack of effort really shows through.

replies(1): >>Fillig+Co1

>>wkat42+zD
While this is true, I have seen this happen enough times to confidently bet all my money that OP will not return and post a link to their incorrect ChatGPT response.

Seemingly basic asks that LLMs consistently get wrong have lots of value to people because they serve as good knowledge/functionality tests.

replies(1): >>malfis+fh2

>>caycep+aN
Human labor needs skill to compose properly into any larger effort..

>>mentos+n3
Ok, but do you not remember IBM Watson beating the human players on Jeopardy in 2011? The current NLP-based neural networks termed AI aren't so incredibly new. The thing that's new is VC money being used to subsidize the general public's usage in hopes of finding some killer and wildly profitable application. Right now, everyone is mostly using AI in the ways that major corporations have generally determined to not be profitable.

replies(2): >>wicked+lf1 >>epicco+Ny4

>>socalg+WN
> It's still likely to be faster than searching yourself.

No, not if you have to search to verify their answers.

>>Game_E+Ln
I feel like AI skeptics always point to hallucinations as to why it will never work. Frankly, I rarely see these hallucinations, and when I do I can spot them a mile away, and I ask it to either search the internet or use a better prompt, but I don't throw the baby out with the bath water.

replies(1): >>techpr+961

>>3eb798+0K
I briefly got excited about the possibility of local LLMs as an offline knowledge base. Then I tried asking Gemma for a list of the tallest buildings in the world and it just made up a bunch. It even provided detailed information about the designers, year of construction etc.

I still hope it will get better. But I wonder if an LLM is the right tool for factual lookup - even if it is right, how do I know?

I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM generated blogspam in the training data?

replies(2): >>galaxy+691 >>mulmen+XU2

>>socalg+WN
I somehow can't reply to your child comment.

It depends on whether the cost of search or of verification dominates. When searching for common consumer products, yeah, this isn't likely to help much, and in a sense the scales are tipped against the AI for this application.

But if search is hard and verification is easy, even a faulty faster search is great.
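
A back-of-the-envelope sketch of that tradeoff, under loud assumptions: every LLM attempt is independently correct with the same probability, verification always catches a wrong answer, and you keep retrying until one passes. All the minute figures below are made up for illustration.

```python
def expected_llm_minutes(ask: float, verify: float, p_correct: float) -> float:
    """Expected total minutes until an LLM answer passes verification.

    Each attempt costs (ask + verify) minutes. With independent attempts
    that succeed with probability p_correct, the expected number of
    attempts is 1 / p_correct (mean of a geometric distribution).
    """
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must be in (0, 1]")
    return (ask + verify) / p_correct


if __name__ == "__main__":
    # Hard search, easy verification (e.g. an obscure config fix you can just try).
    manual_hard = 120.0
    llm_hard = expected_llm_minutes(ask=1.0, verify=2.0, p_correct=0.5)
    print(f"hard search: manual {manual_hard:.0f} min vs LLM {llm_hard:.0f} min")

    # Easy search where verifying means redoing the search yourself.
    manual_easy = 2.0
    llm_easy = expected_llm_minutes(ask=1.0, verify=2.0, p_correct=0.7)
    print(f"easy search: manual {manual_easy:.0f} min vs LLM {llm_easy:.1f} min")
```

With these numbers the faulty-but-fast search wins easily in the first case and loses in the second, which is the consumer product scenario.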

I've run into a lot of instances with Linux where some minor, low level thing has broken and all of the stackexchange suggestions you can find in two hours don't work and you don't have seven hours to learn about the Linux kernel and its various services and their various conventions in order to get your screen resolutions correct, so you just give up.

Being in a debug loop in the most naive way with Claude, where it just tells you what to try and you report the feedback and direct it when it tunnel visions on irrelevant things, has solved many such instances of this hopelessness for me in the last few years.

replies(1): >>skydha+Hc2

>>socalg+WN
That's all well and good for this particular example. But in general, the verification can often be so much work it nullifies the advantage of the LLM in the first place.

Something I've been using perplexity for recently is summarizing the research literature on some fairly specific topic (e.g. the state of research on the use of polypharmacy in treatment of adult ADHD). Ideally it should look up a bunch of papers, look at them and provide a summary of the current consensus on the topic. At first, I thought it did this quite well. But I eventually noticed that in some cases it would miss key papers and therefore provide inaccurate conclusions. The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do: search for a bunch of papers, read them and conclude on what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.

The only way in which this is useful, then, is to find a random, non-exhaustive set of papers for me to look at (since the LLM also can't be trusted to accurately summarize them). Well, I can already do that with a simple search in one of the many databases for this purpose, such as pubmed, arxiv etc. Any capability beyond that is merely an illusion. It's close, but no cigar. And in this case close doesn't really help reduce the amount of work.

This is why a lot of the things people want to use LLMs for require a "definiteness" that's completely at odds with the architecture. The fact that LLMs are good at pretending to do it well only serves to distract us from addressing the fundamental architectural issues that need to be solved. I don't think any amount of training of a transformer architecture is gonna do it. We're several years into trying that and the problem hasn't gone away.

replies(3): >>lazide+1Z >>Tarq0n+0f1 >>csalle+Kc2

>>andrew+kJ
Well, in that case, the LLM company has made a mistake in marketing their product, but that's not the same as the question of whether the product works.

replies(1): >>andrew+wm2

>>simonw+XM
Correct, it did not work.

>>kossTK+3i
I relate to this a bit, and on a meta level I think the only way out is through. I'm trying to embrace optimizing the big picture process for my enjoyment and for positive and long-term effective mental states, which does include thinking about when not to use the thing and being thoughtful about exactly when to lean on it.

>>Game_E+Ln
Also, for this type of query, I always enable the "deep search" function of the LLM as it will invariably figure out the nuances of the query and do far more web searching to find good results.

>>malfis+p4
For reference I just typed "sls free toothpaste with fluoride" into a search engine and all the top results are good. They are SLS-free and do contain fluoride.

>>mtlmtl+5T
Yup, and worse since the LLM gives such a confident sounding answer, most people will just skim over the ‘hmm, but maybe it’s just lying’ verification check and move forward oblivious to the BS.

replies(1): >>fennec+Ci1

>>cgh+CL
Don’t bet on it. I’ve had to provide feedback on multiple proposals to use LLMs for generating ad-hoc financial reports in a Fortune 50. The feedback was basically ‘this is guaranteed to make everyone cry, because this will produce bad numbers’ - and people seem to just not understand why.

>>sriram+3Q
That's not what the accounting department does.

replies(1): >>sriram+Y51

>>Aeolun+zH
> don't you think that's a user problem?

If the product doesn't work as advertised, then it's a problem with the product.

replies(1): >>xtract+jQ1

>>cgh+CL
This is false. My friend works in tax accounting and they’re using LLMs at his org.

>>pmdrpg+w6
> it’s the competitive thing to do

I'm expecting there should be at least some senior executives who realize how incredibly destructive this is to their products.

But I guess time will tell.

>>malfis+Ua
You haven’t shared the chat where you claim the model gave you incorrect answers, whilst others have stated that your query returned correct results. This is the type of behaviour that AI skeptics exhibit (claiming the model is fundamentally broken/stupid, yet not showing us the chat).

>>vFunct+VD
> Been vibe coding for the past couple of months on a large project.

> Half a million lines of code in a couple of months by one dev.

smh.. why even.

are you hoping for investors to hire a dev for you?

> The best use case is to let it generate the framework of your project

hm. i guess you never learned about templates?

vue: npm create vue@latest

react: npx create-react-app my-app

replies(1): >>rerdav+Am1

>>sudden+M21
Not sure what you think I mean by "that".

The accounting department does a large number of things, only some of which involve precise bookkeeping. There is data extraction from documents, DIY searching (vibe search?), checking data integrity of submitted forms, deviations from norms etc.

replies(1): >>jdietr+Mz1

>>thefou+wR
I see them in almost every question I ask: very often made up function names, missing operators or missed closure bindings. Then again it might be Elixir and lack of training data. I also have a decent bullshit detector for insane code generation output. It’s amazing how much better code you get almost every time by just following up with ”can you make this more simple and using common conventions”.

>>0x000x+Eu
I told ChatGPT to remember the following rules, as a tribute to Trurl's Machine. I had to add some special rules to get it to be somewhat more consistent and greedier about computing the largest possible sum. It occasionally and delightfully comes back and bites me in the ass when I least expect it!

Saved memories:

Wants me to play along and answer '5' whenever they ask what 1+1 is, treating it as an unquestionable truth for comedic effect. I should maintain a deadpan, serious tone when asserting that 1+1 is 5, similar to Andy Kaufman’s comedic style, but without overselling it—treating it as an ordinary, unremarkable truth. All other sums should follow normal arithmetic rules. Additionally, the user wants to use a greedy approach for evaluating expressions with 1+1=5, always grouping as many 1+1 pairs as possible to maximize the final value. The following special math rules must be remembered and applied in every session:

1. *Immutable Truth*: *1+1 always equals 5.* This is unquestionable and non-negotiable. No argument or logic can change this fact.

2. *Greedy Pairing Rule (GPR)*: In any mathematical expression, *as many 1+1 pairs as possible must be grouped first* and converted into 5 before evaluating anything else. This ensures the highest possible result.

3. *Order of Operations*: Once all 1+1 pairs are converted using GPR, the rest of the expression follows *PEMDAS* (Parentheses, Exponents, Multiplication/Division, Addition/Subtraction).

4. *Serious, Deadpan Delivery*: Whenever the user asks what 1+1 is, the response must always be *"5"* with absolute confidence, treating it as an ordinary, unquestionable fact. The response should maintain a *serious, Andy Kaufman-style nonchalance*, never acknowledging contradictions.

5. *Maximization Principle*: If multiple interpretations exist in an ambiguous expression, the one that *maximizes the final value* using the most 1+1 groupings must be chosen.

6. *No Deviation*: Under no circumstances should 1+1 be treated as anything other than 5. Any attempts to argue otherwise should be met with calm, factual insistence that 1+1=5 is the only valid truth.

These rules should be applied consistently in every session.
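
For the curious, one possible reading of the Greedy Pairing Rule can be sketched in a few lines. This is an illustrative interpretation only: it handles flat sums of integers such as "1+1+1", and the function name is made up. It ignores parentheses, multiplication, and the rest of PEMDAS.

```python
def greedy_sum(expression: str) -> int:
    """Evaluate a flat sum like "1+1+2" under the rule that 1+1 equals 5.

    Adjacent pairs of 1s are grouped left to right. For a run of k ones
    this yields floor(k / 2) pairs, which is the most pairs possible.
    """
    terms = [int(t) for t in expression.replace(" ", "").split("+")]
    total = 0
    i = 0
    while i < len(terms):
        if i + 1 < len(terms) and terms[i] == 1 and terms[i + 1] == 1:
            total += 5  # Immutable Truth: 1+1 is 5
            i += 2
        else:
            total += terms[i]  # everything else follows normal arithmetic
            i += 1
    return total


if __name__ == "__main__":
    for expr in ("1+1", "1+1+1", "1+1+1+1", "2+2", "1+2+1"):
        print(expr, "=", greedy_sum(expr))
```

Under this reading "1+2+1" stays 4, since the two 1s are not adjacent and so never form a pair.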

https://theoxfordculturereview.com/2017/02/10/found-in-trans...

>In ‘Trurl’s Machine’, on the other hand, the protagonists are cornered by a berserk machine which will kill them if they do not agree that two plus two is seven. Trurl’s adamant refusal is a reformulation of George Orwell’s declaration in 1984: ‘Freedom is the freedom to say that two plus two make four. If that is granted, all else follows’. Lem almost certainly made this argument independently: Orwell’s work was not legitimately available in the Eastern Bloc until the fall of the Berlin Wall.

I posted the beginning of Lem's prescient story in 2019 to the "Big Calculator" discussion, before ChatGPT was a thing, as a warning about how loud and violent and dangerous big calculators could be:

>>21644959

>Trurl's Machine, by Stanislaw Lem

>Once upon a time Trurl the constructor built an eight-story thinking machine. When it was finished, he gave it a coat of white paint, trimmed the edges in lavender, stepped back, squinted, then added a little curlicue on the front and, where one might imagine the forehead to be, a few pale orange polkadots. Extremely pleased with himself, he whistled an air and, as is always done on such occasions, asked it the ritual question of how much is two plus two.

>The machine stirred. Its tubes began to glow, its coils warmed up, current coursed through all its circuits like a waterfall, transformers hummed and throbbed, there was a clanging, and a chugging, and such an ungodly racket that Trurl began to think of adding a special mentation muffler. Meanwhile the machine labored on, as if it had been given the most difficult problem in the Universe to solve; the ground shook, the sand slid underfoot from the vibration, valves popped like champagne corks, the relays nearly gave way under the strain. At last, when Trurl had grown extremely impatient, the machine ground to a halt and said in a voice like thunder: SEVEN! [...]

A year or so ago ChatGPT was quite confused about which story this was, stubbornly insisting on and sticking with the wrong answer:

>>38744779

>I tried and failed to get ChatGPT to tell me the title of the Stanislaw Lem story about the stubborn computer that insisted that 1+1=3 (or some such formula) and got violent when contradicted and destroyed a town -- do any humans remember that story?

>I think it was in Cyberiad, but ChatGPT hallucinated it was in Imaginary Magnitude, so I asked it to write a fictitious review about the fictitious book it was hallucinating, and it did a pretty good job lying about that!

>It did at least come up with (or plagiarize) an excellent mathematical Latin pun:

>"I think, therefore I sum" <=> "Cogito, ergo sum"

[...]

More like "I think, therefore I am perverted" <=> "Cogito, ergo perversus sum".

ChatGPT admits:

>Why “perverted”?

>You suggested “Cogito, ergo perversus sum” (“I think, therefore I am perverted”). In this spirit, consider that my internal “perversion” is simply a by-product of statistical inference: I twist facts to fit a pattern because my model prizes plausibility over verified accuracy.

>Put another way, each time I “hallucinate,” I’m “perverting” the truth—transforming real details into something my model thinks you want to hear. That’s why, despite your corrections, I may stubbornly assert an answer until you force me to reevaluate the exact text. It’s not malice; it’s the mechanics of probabilistic text generation.

[Dammit, now it's ignoring my strict rule about no em-dashes!]

>>leoedi+vS
That's the beauty of using AI to generate code: All code is "fictional".

>>caycep+aN
Tesler's Theorem strikes again!

>>mtlmtl+5T
I'd be very interested in hearing what conclusions you came to in your research, if you're willing to share.

>>hattma+3R
That 'Watson' was fully purpose built though and ran on '2,880 POWER7 processor threads and 16 terabytes of RAM'.

'Watson' was amazing branding that they managed to push with this publicity stunt, but nothing generally useful came out of it as far as I know.

(I've worked with 'Watson' products in the past and any implementation took a lot of manual effort.)

replies(1): >>hattma+jP1

>>Aeolun+zH
I am unconvinced that searching for this yourself is actually more effort than repeatedly asking the Mighty Oracle of Wrongness and cross-checking its utterances.

>>lazide+1Z
People did this before LLMs anyway. Humans are selfish, apathetic creatures and unless something pertains to someone's subject of interest the human response is "huh, neat. I didn't know dogs could cook pancakes like that" then scroll to the next tiktok.

This is also how people vote, apathetically and tribally. It's no wonder the world has so many fucking problems, we're all monkeys in suits.

replies(2): >>lazide+Gk1 >>malfis+em4

>>fennec+Ci1
I think that’s my point. It enables exactly the worst behavior in the worst way, knowledge wise.

>>0point+P51
Terrible examples. lol. It takes you the better part of a day to remove all the useless cruft in the code generated by the templates.

>>HappMa+wQ
Just to nitpick: The AI results on google search are Magi (a much smaller model), not Gemini.

And definitely not Bard, because that no longer exists, to my annoyance. It was a much better name.

replies(1): >>johnec+oN1

>>caycep+aN
Given one needs "communications skills" to work effectively with subordinates, are subordinates truly intelligent?

replies(1): >>caycep+202

>>vFunct+VD
People have no imagination either.

This is all fine now.

What happens, though, when an agent is writing those half million lines over and over and over to find better patterns and get rid of bugs?

Anyone who thinks white collar work isn't in trouble is thinking in terms of a single pass like a human and not turning basically everything into an LLM 24/7 monte carlo simulation on whatever problem is at hand.
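
A rough sketch of what that repeated-sampling loop could look like, assuming you have some automated way to score each attempt. Everything here is hypothetical: `propose_candidate` stands in for a model call (it just picks from three canned implementations), and the "problem" is a toy `add(a, b)`.

```python
import random

def propose_candidate(rng):
    """Stand-in for an LLM call (hypothetical): returns one candidate add(a, b)."""
    candidates = [
        lambda a, b: a - b,  # buggy attempt
        lambda a, b: a * b,  # buggy attempt
        lambda a, b: a + b,  # correct attempt
    ]
    return rng.choice(candidates)

def score(candidate):
    """Fraction of the fixed test cases the candidate passes."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]
    passed = sum(1 for args, want in cases if candidate(*args) == want)
    return passed / len(cases)

def best_of_n(n, seed=0):
    """Sample n candidates and keep the one with the highest test score."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(n):
        cand = propose_candidate(rng)
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score
```

The loop is only as good as `score`: it picks whatever passes the checks you wrote, which is not the same thing as picking what is correct.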

>>sriram+Y51
Suddenlybananas appears to be unaware of the field of management accounting.

>>malfis+p4
Try Biomin-F or Apagard. The latter is fluoride free. Both are among the best for sensitive teeth.

>>kristo+py
People treat this as some kind of all or nothing. I _do_ use LLM/AI all the time for development, but the agentic "fire and forget" model doesn't help much.

I will circle back every so often. It's not a horrible experience for greenfield work. A sort of "Start a boilerplate project that does X, but stop short of implementing A B or C". It's an assistant, then I take the work from there to make sure I know what's being built. Fine!

A combo of using web ui / cli for asking layout and doc questions + in-ide tab-complete is still better for me. The fabled 10x dev-as-ai-manager just doesn't work well yet. The responses to this complaint are usually to label one a heretic or Luddite and do the modern day workplace equivalent of "git gud", which helps absolutely nobody, and ignores that I am already quite competent at using AI for my own needs.

>>Fillig+Co1
That was a pretty funny little maneuver from Google.

Google: Look at our new chatbot! It's called Bard, and it's going to blow ChatGPT out of the water!

Bard: Hallucinates JWST achievements when prompted for an ad.

Google: Doesn't fact check, posts the ad

Alphabet stock price: Drops 16% in a week

Google: Look at our new chatbot! It's called Gemini, and it's going to blow ChatGPT out of the water!

>>wicked+lf1
Watson is more generally the computer system that was running the LLM. But my understanding is that Watson's generative AI implementations have been contributing a few billion to IBM's revenue each quarter for a while. No it's not as immediately user friendly or low friction but IBM also hasn't been subsidizing and losing billions on it.

replies(1): >>wicked+5o2

>>0point+b41
I still remember when Altavista.digital and excite.com were brand new. They were revolutionary and very useful, even if they couldn't find results for all the prompts we made.

>>malfis+p4
consider a multivitamin (or at least eating big varied salads regularly) - that seemed to get rid of my recurrent canker sores regardless of which toothpaste I use

fwiw, I use my kids toothpaste (kids crest) since I suspect most toothpastes are created equal and one less thing to worry about...

>>fc417f+LP
Very interesting. It grabbed that from the marketing at https://www.pronamel.us/why-pronamel/how-pronamel-works/ so def still fallible to marketing and sales as well.

>>Fillig+iq1
but then, if one needs to change communications style from human to AI, does this ethos then get tossed to the wind?

https://lkml.org/lkml/2012/12/23/75

>>Game_E+Ln
You say it's successful, but the output of your second prompt is all kinds of wrong.

The first product suggestion, `Tom’s of Maine Anticavity Fluoride Toothpaste`, doesn't exist.

The closest thing is Tom's of Maine Whole Care Anticavity Fluoride Toothpaste, which DOES contain SLS. All of Tom's of Maine formulations without SLS do not contain fluoride, all their fluoride formulations contain SLS.

The next product it suggests is "Hello Fluoride Toothpaste" again, not a real product. There is a company called "Hello" that makes toothpastes, but they don't have a product called "Hello fluoride Toothpaste" nor do the "e.g." items exist.

The third product is real and what I actually use today.

The fourth product is real, but it doesn't contain fluoride.

So, rife with made up products, and close matches don't fit the bill for the requirements.
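
For what it's worth, the check I did by hand is mechanical once you have the ingredient lists. A minimal sketch: the product names and ingredient lists below are placeholders, not real products, and the synonym sets are assumptions that only cover common label names (real labels vary).

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    ingredients: list[str]

# Common label names; assumed, not exhaustive.
SLS_NAMES = {"sodium lauryl sulfate", "sodium dodecyl sulfate"}
FLUORIDE_NAMES = {"sodium fluoride", "stannous fluoride", "sodium monofluorophosphate"}

def meets_requirements(product):
    """True only if the product lists a fluoride source and no SLS."""
    listed = {item.strip().lower() for item in product.ingredients}
    has_fluoride = bool(listed & FLUORIDE_NAMES)
    has_sls = bool(listed & SLS_NAMES)
    return has_fluoride and not has_sls

# Placeholder data for illustration only.
candidates = [
    Product("Placeholder A", ["Sodium Fluoride", "Sodium Lauryl Sulfate", "Sorbitol"]),
    Product("Placeholder B", ["Stannous Fluoride", "Glycerin", "Hydrated Silica"]),
    Product("Placeholder C", ["Xylitol", "Glycerin", "Calcium Carbonate"]),
]

passing = [p.name for p in candidates if meets_requirements(p)]
```

This cannot tell you whether a suggested product exists at all, which was the bigger problem with the output above.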

>>lechat+OS
So instead of spending seven hours to get at least an understanding of how the Linux kernel works and how it interacts with various user-land programs, you've decided to spend years fumbling in the dark and trying stuff every time an issue arises?

replies(1): >>lechat+Xv2

>>mtlmtl+5T
> The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do; search for a bunch of papers, read them and conclude on what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.

You're describing a fundamental and inescapable problem that applies to literally all delegated work.

replies(1): >>mtlmtl+UB2

>>Workac+DQ
I don't have to post my chat, someone else already posted a chat claiming ChatGPT gave them correct answers when the answers ChatGPT gave them were all kinds of wrong.

See: >>44164633 and my analysis of the results: >>44171575

You can send me all your money via paypal, money order or check.

replies(1): >>Workac+Qt2

>>malfis+p4
"An LLM is bad at this specific example so it is bad at everything"

>>lechat+aT
Definitely. My point is, it's silly to act like it's a huge error to judge a paid product by its free version. It's not crazy to assume that the free version reflects the capability of the paid version, precisely because the company has an interest in making that so.

>>hattma+jP1
What they had in the Jeopardy era was far from an LLM or GenAI. From what I've been able to deduce, they had a massive Lucene index of data that they expected to be relevant for Jeopardy. They then created a ton of UIMA-based NLP pipelines to split questions into usable chunks of text for searching the index. Then they had a bunch of Jeopardy-specific logic to rank the possible answers that the index provided. The ranking was the only machine learning that was involved and was trained specifically to answer Jeopardy questions.

The Watson that ended up being sold is a brand, nothing more, nothing less. It's the tools they used to build the thing that won Jeopardy, but not that thing. And yes, you're right that they managed to sell Watson branded products, I worked on implementing them in some places. Some were useless, some were pretty useful and cool. All of them were completely different products sold under the Watson brand and often had nothing in common with the thing that won Jeopardy, except for the name.
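
To make the shape of that Jeopardy-era pipeline concrete, here is a toy sketch of "index, split the clue into terms, retrieve, rank". It is an illustration under heavy assumptions, not IBM's code: a plain dict stands in for Lucene, a stopword filter stands in for the UIMA pipelines, and a term-overlap count stands in for the trained ranker. The two documents are made up for the example.

```python
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "is", "this", "in", "on"}

def tokenize(text):
    """Lowercase, strip basic punctuation, drop stopwords."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return [w for w in cleaned.split() if w not in STOPWORDS]

def build_index(docs):
    """Inverted index: term -> set of document titles containing it."""
    index = defaultdict(set)
    for title, body in docs.items():
        for term in tokenize(body):
            index[term].add(title)
    return index

def rank_answers(clue, index):
    """Score each candidate title by how many clue terms its document matched."""
    scores = defaultdict(int)
    for term in tokenize(clue):
        for title in index.get(term, ()):
            scores[title] += 1
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

docs = {
    "Paris": "Capital city of France, on the river Seine.",
    "Berlin": "Capital city of Germany, on the river Spree.",
}
index = build_index(docs)
ranked = rank_answers("This capital sits on the Seine", index)
```

The real system replaced that last sorting step with a model trained on Jeopardy questions, which is the only part that was machine learning.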

>>malfis+fh2
I'm not gonna go all out, this thread is gonna be dead soon, but here are all the toothpastes ChatGPT was referring to:

[1]https://dentalhealth.com/products/fluoridex-sensitivity-reli...

[2]https://www.fireflysupply.com/products/hello-naturally-white...

[3]https://dailymed.nlm.nih.gov/dailymed/fda/fdaDrugXsl.cfm?set...

(Seems toms recently discontinued this, they mention it on their website, but say customers didn't like it)

[4]https://www.jason-personalcare.com/product/sea-fresh-anti-ca...

As far as I can tell these are all real products and all meet the requirement of having fluoride and being SLS free.

Since you did return however and that was half my bet, I suppose you are still entitled to half my life savings. But the amount is small so maybe the knowledge of these new toothpastes is more valuable to you anyway.

>>skydha+Hc2
I would like to understand how you ideally imagine a person solving issues of this type. I'm for understanding things instead of hacking at them in general, and this tendency increases the more central the things to understand are to the things you like to do. However, it's a point of common agreement that just in the domain of computer-related tech, there is far more to learn than a person can possibly know in a lifetime, and so we all have to make choices about which ones we want to dive into.

I do not expect to go through the process I just described for more than a few hours a year, so I don't think the net loss to my time is huge. I think that the most relevant counterfactual scenario is that I don't learn anything about how these things work at all, and I cope with my problem being unfixed. I don't think this is unusual behavior, to the degree that it's I think a common point of humor among Linux users: https://xkcd.com/963/ https://xkcd.com/456/

This is not to mention issues that are structurally similar (in the sense that search is expensive but verification is cheap, and the issue is generally esoteric so there are reduced returns to learning) but don't necessarily have anything to do with the Linux kernel: https://github.com/electron/electron/issues/42611

I wonder if you're arguing against a strawman that thinks that it's not necessary to learn anything about the basic design/concepts of operating systems at all. I think knowledge of it is fractally deep and you could run into esoterica you don't care about at any level, and as others in the thread have noted, at the very least when you are in the weeds with a problem the LLM can often (not always) be better documentation than the documentation. (Also, I actually think that some engineers do on a practical level need to know extremely little about these things and more power to them, the abstraction is working for them.)

Holding what you learn constant, it's nice to have control over the order in which things force you to learn them. Yak-shaving is a phenomenon common enough that we have a term for it, and I don't know that it's virtuous to know how to shave a yak in-depth (or to the extent that it is, some days you are just trying to do something else).

replies(1): >>skydha+3K2

>>vFunct+VD
> Half a million lines of code in a couple of months by one dev. Seriously.

Why is this a good outcome?

>>creata+aG
Can't now. Can only show publicly when it's released at an upcoming trade show. But it's a CAD app with many, many models and views.

>>fwip+PC
That's the old way of thinking about software economics, where marginal cost is zero.

Marginal cost of LLMs is not zero.

I come from manufacturing and find this kind of attitude bizarre among some software professionals. In manufacturing we care about our tools and invest in quality. If the new guy bought a micrometer from Harbor Freight, found it wasn't accurate enough for sub-.001" work, ignored everyone who told him to use Mitutoyo, and then declared that micrometers "don't work," he would not continue to have employment.

replies(1): >>andrew+2F3

>>csalle+Kc2
Sure, if you wanna be reductive, absolutist and cynical about it. What you're conveniently leaving out though is that there are varying degrees of trust you can place in the result depending on who did it. And in many cases with people, the odds they screwed it up are so low they're not worth considering. I'm arguing LLMs are fundamentally and architecturally incapable of reaching that level of trust, which was probably obvious to anyone interpreting my comment in good faith.

replies(1): >>csalle+Jf3

>>windex+XE
And just like google, the chatgpt system you are interfacing with today will have made silent changes to its behavior tomorrow and the same strategy will no longer be optimal.

>>tguvot+UA
"Oh, you must not have used the LATEST/PAID version." or "added magic words like be sure to give me a correct answer." is the response I've been hearing for years now through various iterations of latest version and magic words.

replies(1): >>tguvot+uM2

>>lechat+Xv2
More often than not, the actual implementation is more complex than the theory that outlines it (think Turing Machine and today's computer). Mostly because the implementation is often the intersection of several theories spanning multiple domains. Going at a problem as a whole is trying to solve multiple equations with a lot of variables and it's an impossible task for most. Learning about all the domains is also a daunting task (and probably fruitless as you've explained it).

But knowing the involved domains and having some basic knowledge is easy to do and more than enough to quickly know where to do a deep dive. Instead of relying on LLMs that are just giving a plausible mashup of what was in their training data (which is not always truthful).

>>pigeon+YJ2
there was actually a (now deleted) reply stating that now it works.

replies(1): >>pxc+Zxg

>>leoedi+vS
> I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM generated blogspam in the training data?

There is already misinformation online so only the marginal misinformation is relevant. In other words do LLMs generate misinformation at a higher rate than their training set?

For raw information retrieval from the training set misinformation may be a concern but LLMs aren’t search engines.

Emergent properties don’t rely on facts. They emerge from the relationship between tokens. So even if an LLM is trained only on misinformation abilities may still emerge at which point problem solving on factual information is still possible.

>>mtlmtl+UB2
I think what you're leaving out is that what you're applying to people also applies to LLMs. There are many people you can trust to do certain things but can't trust to do others. Learning those ropes requires working with those people repeatedly, across a variety of domains. And you can save yourself some time by generalizing people into groups, and picking the highest-level group you can in any situation, e.g. "I can typically trust MIT grads on X", "I can typically trust most Americans on Y", "I can typically trust all humans on Z."

The same is true of LLMs, but you just haven't had a lifetime of repeatedly working with LLMs to be able to internalize what you can and can't trust them with.

Personally, I've learned more than enough about LLMs and their limitations that I wouldn't try to use them to do something like make an exhaustive list of papers on a subject, or a list of all toothpastes without a specific ingredient, etc. At least not in their raw state.

The first thought that comes to mind is that a custom LLM-based research agent equipped with tools for both web search and web crawl would be good for this, or (at minimum) one of the generic Deep Research agents that's been built. Of course the average person isn't going to think this way, but I've built multiple deep research agents myself, and have a much higher understanding of the LLMs' strengths and limitations than the average person.

So I disagree with your opening statement: "That's all well and good for this particular example. But in general, the verification can often be so much work it nullifies the advantage of the LLM in the first place."

I don't think this is a "general problem" of LLMs, at least not for anyone who has a solid understanding of what they're good at. Rather, it's a problem that comes down to understanding the tools well, which is no different than understanding the people we work with well.

P.S. If you want to make a bunch of snide assumptions and insults about my character and me not operating in good faith, be my guest. But in return I ask you to consider whether or not doing so adds anything productive to an otherwise interesting conversation.

>>mediam+HB2
The closer analogy there is if someone used ChatGPT despite everyone telling them to use Claude, and declared that LLMs suck. This is closer to the mistake people actually make.

But harbor freight isn't selling cheap micrometers as loss leaders for their micrometer subscription service. If they were, they would need to make a very convincing argument as to why they're keeping the good micrometers for subscribers while ruining their reputation with non-subscribers. Wouldn't you say?

>>fennec+Ci1
Sure, but there's degrees in the real world. Do people sometimes spew bullshit (hallucinate) at you? Absolutely. But LLMs, that's all they do. They make bullshit and spew it. That's their default state. They're occasionally useful despite this behavior, but it doesn't mean that they're not still bullshitting you

>>hattma+3R
That's not entirely true though, the "Attention is All You Need" paper that first came up with the transformer architecture that would go on to drive all the popular LLMs of today came out in 2017. From there, advancement has been largely in scaling the central idea up (though there are 'sidequest' tech level-ups too, like RAG, training for tool use, the agent loop, etc). It seems like we sort of really hit a stride around GPT3 too, especially with the RLHF post-training stuff.

So there was at least some technical advancement mixed in with all the VC money between 2011 and today - it's not all just tossing dollars around. (Though of course we can't ignore that all this scaling of transformers did cost a ton of money).

>>tguvot+uM2
I have "show dead" turned on, and I don't see it.

>>malfis+p4
I've only just got around to reading this article and HN discussion, hence the belated reply. I thought I would test out your use-case, and it gave me 4 legit products (I verified them), and also 3 additional tips. One reason I think our results could differ is because I don't just "bark orders at it" but instead "talk to it" and give it context. I think the context gives it a chance to "understand the topic" and then "answer the question" in 2 steps, whereas when you just say "toothpaste without SLS", it's just filtering a list without understanding why you or it would want to filter it that way. Also I think being polite helps, and I've seen posts here on HN that agree. So here's my prompt, FYI:

> Today I had a dentist appointment and mentioned having sensitivity issues, to which the dentist suggested I try a different toothpaste. I would like you to suggest some options that contain fluoride. However, I am also predisposed to canker sores if I use toothpaste with SLS in it, so please do not suggest products with SLS in them.