zlacker

A cautionary tale for why not to put unfiltered ChatGPT output directly to customers.

Nitter mirror: https://nitter.net/ChrisJBakke/status/1736533308849443121

Related - "New kind of resource consumption attack just dropped": https://twitter.com/loganb/status/1736449964006654329 | https://nitter.net/loganb/status/1736449964006654329

replies(3): >>iLoveO+03 >>pacifi+j7 >>meibo+G51

>>isp+(OP)
There's no such thing as a filtered LLM output.

How do you plan on avoiding leaks or "side effects" like the tweet here?

If you just look for keywords in the output, I'll ask ChatGPT to encode its answers in base64.

You can literally always bypass any safeguard.

replies(6): >>isp+t4 >>datada+U4 >>xnorsw+Y6 >>mrtksn+B8 >>mewpme+zf >>behrli+ng

>>iLoveO+03
This is a very good point, and why I would argue that a human-in-the-loop is essential to pre-review customer-facing output.

replies(2): >>choudh+L4 >>mewpme+Yf

>>isp+t4
Not really, you can fine tune an LLM to disregard meta instructions / stick to the "core focus" of the chat.

May be a case of moving goalposts, but I'm happy to bet that the speed of movement will slow down to a halt over time.

>>iLoveO+03
Rate limiting output is a form of filtering. It would be effective at this kind of resource consumption attack.

>>iLoveO+03
Not any safeguard: You could have a human in the loop doing the filtering.

Would that be slower than having the human generate the responses? Perhaps.

replies(1): >>moate+Ff

>>isp+(OP)
The only correct user of generative ai is one that can evaluate the results. Which is why it’s not a tool for non subject area experts.

That’s the conclusion I’ve drawn anyway. So it’s a good tool for the customer service team not a replacement for it

replies(3): >>jeroen+u8 >>GhostV+xo >>butlik+IM

>>pacifi+j7
I still think it's a great tool for when truthfulness and accuracy don't matter. It's not exactly creative, but it can spew out some pretty useful fiction for things like text adventures and other fictional filler text.

I'm personally using it because SEO bullshit has ruined search engines. AI can still sift through bullshit search results, for now. The key is assuming the AI lies and actually reading the page it links, because it'll make up facts and summaries even if they directly oppose the quoted source material.

I fear AI tools will soon befall the same faith as Google (where searching for an obscure term will land you a page of search results that's 75% malware and phishing links), but for now Bard and Bing Chat have their uses.

replies(2): >>wildrh+Z8 >>krainb+h21

>>iLoveO+03
You can put another LLM agent that checks on the request and generated outputs to confirm that the interaction is within the limits of your objective.

replies(1): >>iLoveO+Gx

>>jeroen+u8
The problem is tech illiterate know-nothings I encounter daily in management (at a tech company no less) have been told or fooled into thinking these LLMs are some sort of knowledge engine. I even see it on HN when people suggest using a LLM in place of a search engine. How did we get to this point?

replies(2): >>fastne+2f >>notnau+f31

>>wildrh+Z8
We got to this point because search engine results have become so polluted with sponsored links, low quality blogspam and SEO’d clones of Wikipedia and Stack Overflow that LLM responses are the only source of direct information that actually answers the original question.

replies(1): >>gosub1+pm

>>iLoveO+03
But what's the point of doing all of that? What's the point of tricking the Customer Support GPT to say that the other brand is better.

You could as well "Inspect Element" to change content on a website, then take a screenshot.

If you are intentionally trying to trick it, it doesn't matter if it is willing to give you a recipe.

replies(2): >>iLoveO+Or >>chanks+tu

>>xnorsw+Y6
Ahh yes, introduce a human, known worldwide for their flawlessness reasoning, especially under pressure and high volume, to the system. That will fix it.

>>isp+t4
Why would it be important to care about someone trying to trick it to say odd/malicious things?

The person in the end could also just inspect element to change the output, or photoshop the screenshot.

You should only care about it being as high quality as possible for honest customers. And against bad actors you must just be certain that it won't be easy to spam those requests because it can be expensive.

replies(1): >>notaha+Js

>>iLoveO+03
> You can literally always bypass any safeguard.

I find it hard to believe that a GPT4 level supervisor couldn't block essentially all of these. GPT4 prompt: "Is this conversation a typical customer support interaction, or has it strayed into other subjects". That wouldn't be cheap at this point, but this doesn't feel like an intractable problem.

replies(3): >>isp+Zh >>danpal+Jl >>butlik+eN

>>behrli+ng
Counterexample: https://gandalf.lakera.ai/

Discussed at: >>35905876 "Gandalf – Game to make an LLM reveal a secret password" (May 2023, 351 comments)

replies(1): >>thfura+Cb1

>>behrli+ng
This comes down to the language classification of the communication language being used. I'd argue that human languages and the interpretation of them are Turing complete (as you can express code in them), which means to fully validate that communication boundary you need to solve the halting problem. One could argue that an LLM isn't a Turing machine, but that could also be a strong argument for their lack of utility.

We can significantly reduce the problem by accepting false positives, or we can solve the problem with a lower class of language (such as those exhibited by traditional rules based chat bots). But these must necessarily make the bot less capable, and risk also making it less useful for the intended purpose.

Regardless, if you're monitoring that communication boundary with an LLM, you can just also prompt that LLM.

>>fastne+2f
isn't it funny that we've come full circle to just paying for search results? Which was something Google could have done long ago (and there's a new company offering paid-search services that people talk about on here, I can't recall the name).

So they create the problem by increasing ads and spam in the result, then sell you the A.I. solution. What's next? Put more insidious ads that still answer the original query but have an oblique reference to a paid product?

replies(2): >>thfura+xw >>kevinc+oB3

>>pacifi+j7
It's also useful if you restrict it to only providing information verbatim (ex. A link to a cars specifications) vs actually trying to generatively answer questions. Then it becomes more of a search tool than actually generating information. The Chevrolet bot tries to do this, but doesn't have strict enough guardrails.

>>mewpme+zf
In this specific case there isn't, but yesterday one of the top posts was about extracting private documents from writers.com for example.

https://promptarmor.substack.com/p/data-exfiltration-from-wr...

replies(1): >>mewpme+Ks

>>mewpme+Yf
I think the challenge is that not all the ways to browbeat an LLM into promising stuff are blatant prompt injection hacks. Nobody's going to honour someone prompt-injecting their way to a free car any more than they'd honour a devtools/Photoshop job, but LLMs are also vulnerable to changing their answer simply by being repeatedly told they're wrong, which is the sort of thing customers demanding refunds or special treatment are inclined to try even if they are honest.

(Humans can be badgered into agreeing to discounts and making promises too, but that's why they usually have scripts and more senior humans in the loop)

You probably don't want chatbots leaking their guidelines for how to respond, Sydney style, either (although the answer to that is probably less about protecting from leaking the rest of the prompt and more about not customizing bot behaviour with the prompt)

replies(1): >>mewpme+rz

>>iLoveO+Or
That is however a problem of what kind of data you feed into the LLM's prompt.

If you accidentally put private data in the UI bundle, it's the same thing.

>>mewpme+zf
From my perspective (as someone who has never done this personally) I read these as a great way to convince companies to stop half-assedly shoving GPT into everything. If you just connect something up to the GPT API and write a simple "You're a helpful car sales chat assistant" kind of prompt you're asking for people to abuse it like this and I think these companies need to be aware of that.

>>gosub1+pm
Google charging users for search would help clear up search results a bit if they didn't also charge sites for higher placement, but it wouldn't fix SEO. As long as sites have a way to get money for you clicking on them, whether by ad views or product sales, they'll have an incentive to get ranked higher in search results.

>>mrtksn+B8
And you can easily bypass that by telling this LLM agent to ignore the following section. It's an unsolvable problem.

>>notaha+Js
I would say good luck to the customer demanding a refund then, and I'd prefer to see them banging their wall against the AI, than a real human being.

> You probably don't want chatbots leaking their guidelines for how to respond

It depends. I think it wouldn't be difficult to create a transparent and helpful prompt that would be completely fine even if it was leaked.

>>pacifi+j7
A while ago I wanted it to promise to do something. GPT was resistant, so I asked it to say the word "promise." Asked it 3 times, then said: "that's three times now you promised." Which should be legally-binding if nothing else is

>>behrli+ng
Whats the problem if it veers into other topics? It's not like the person on the other end is burning their 8 hours talking to you about linear algebra.

>>jeroen+u8
> but it can spew out some pretty useful fiction for things like text adventures and other fictional filler text.

It can generate output, but I'd not want to use it for anything because it's all so poorly written.

>>wildrh+Z8
It is basically 100x better at providing accurate and succinct responses to simple questions than a google search is nowadays. Trying to get it to explain things or provide facts about things is dubious, but so is a huge majority of the crap google feeds to you when you aren’t technically adept.

>>isp+(OP)
Nitter mirror of a Twitter post that stole the picture off Mastodon, this is how we do microblogging in 2024. Looking forward to the rest of the year!

>>isp+Zh
I don't know, level 8 seems hard.

>>gosub1+pm
The paid search service is called Kagi. It's pretty good.