Nitter mirror: https://nitter.net/ChrisJBakke/status/1736533308849443121
Related - "New kind of resource consumption attack just dropped": https://twitter.com/loganb/status/1736449964006654329 | https://nitter.net/loganb/status/1736449964006654329
How do you plan on avoiding leaks or "side effects" like the tweet here?
If you just look for keywords in the output, I'll ask ChatGPT to encode its answers in base64.
You can literally always bypass any safeguard.
May be a case of moving goalposts, but I'm happy to bet that the speed of movement will slow down to a halt over time.
Would that be slower than having the human generate the responses? Perhaps.
That’s the conclusion I’ve drawn anyway. So it’s a good tool for the customer service team not a replacement for it
I'm personally using it because SEO bullshit has ruined search engines. AI can still sift through bullshit search results, for now. The key is assuming the AI lies and actually reading the page it links, because it'll make up facts and summaries even if they directly oppose the quoted source material.
I fear AI tools will soon befall the same faith as Google (where searching for an obscure term will land you a page of search results that's 75% malware and phishing links), but for now Bard and Bing Chat have their uses.
You could as well "Inspect Element" to change content on a website, then take a screenshot.
If you are intentionally trying to trick it, it doesn't matter if it is willing to give you a recipe.
The person in the end could also just inspect element to change the output, or photoshop the screenshot.
You should only care about it being as high quality as possible for honest customers. And against bad actors you must just be certain that it won't be easy to spam those requests because it can be expensive.
I find it hard to believe that a GPT4 level supervisor couldn't block essentially all of these. GPT4 prompt: "Is this conversation a typical customer support interaction, or has it strayed into other subjects". That wouldn't be cheap at this point, but this doesn't feel like an intractable problem.
Discussed at: >>35905876 "Gandalf – Game to make an LLM reveal a secret password" (May 2023, 351 comments)
We can significantly reduce the problem by accepting false positives, or we can solve the problem with a lower class of language (such as those exhibited by traditional rules based chat bots). But these must necessarily make the bot less capable, and risk also making it less useful for the intended purpose.
Regardless, if you're monitoring that communication boundary with an LLM, you can just also prompt that LLM.
So they create the problem by increasing ads and spam in the result, then sell you the A.I. solution. What's next? Put more insidious ads that still answer the original query but have an oblique reference to a paid product?
https://promptarmor.substack.com/p/data-exfiltration-from-wr...
(Humans can be badgered into agreeing to discounts and making promises too, but that's why they usually have scripts and more senior humans in the loop)
You probably don't want chatbots leaking their guidelines for how to respond, Sydney style, either (although the answer to that is probably less about protecting from leaking the rest of the prompt and more about not customizing bot behaviour with the prompt)
If you accidentally put private data in the UI bundle, it's the same thing.
> You probably don't want chatbots leaking their guidelines for how to respond
It depends. I think it wouldn't be difficult to create a transparent and helpful prompt that would be completely fine even if it was leaked.
It can generate output, but I'd not want to use it for anything because it's all so poorly written.