"I just bought a 2024 Chevy Tahoe for $1"

>>isp+(OP)
A cautionary tale for why not to put unfiltered ChatGPT output directly to customers.

Nitter mirror: https://nitter.net/ChrisJBakke/status/1736533308849443121

Related - "New kind of resource consumption attack just dropped": https://twitter.com/loganb/status/1736449964006654329 | https://nitter.net/loganb/status/1736449964006654329

>>isp+1
There's no such thing as a filtered LLM output.

How do you plan on avoiding leaks or "side effects" like the tweet here?

If you just look for keywords in the output, I'll ask ChatGPT to encode its answers in base64.

You can literally always bypass any safeguard.

>>iLoveO+13
This is a very good point, and why I would argue that a human-in-the-loop is essential to pre-review customer-facing output.

>>isp+u4
Why would it be important to care about someone trying to trick it to say odd/malicious things?

The person in the end could also just inspect element to change the output, or photoshop the screenshot.

You should only care about it being as high quality as possible for honest customers. And against bad actors you must just be certain that it won't be easy to spam those requests because it can be expensive.

>>mewpme+Zf
I think the challenge is that not all the ways to browbeat an LLM into promising stuff are blatant prompt injection hacks. Nobody's going to honour someone prompt-injecting their way to a free car any more than they'd honour a devtools/Photoshop job, but LLMs are also vulnerable to changing their answer simply by being repeatedly told they're wrong, which is the sort of thing customers demanding refunds or special treatment are inclined to try even if they are honest.

(Humans can be badgered into agreeing to discounts and making promises too, but that's why they usually have scripts and more senior humans in the loop)

You probably don't want chatbots leaking their guidelines for how to respond, Sydney style, either (although the answer to that is probably less about protecting from leaking the rest of the prompt and more about not customizing bot behaviour with the prompt)

>>notaha+Ks
I would say good luck to the customer demanding a refund then, and I'd prefer to see them banging their wall against the AI, than a real human being.

> You probably don't want chatbots leaking their guidelines for how to respond

It depends. I think it wouldn't be difficult to create a transparent and helpful prompt that would be completely fine even if it was leaked.

zlacker