zlacker

[parent] [thread] 4 comments
1. behrli+(OP)[view] [source] 2023-12-18 13:53:16
> You can literally always bypass any safeguard.

I find it hard to believe that a GPT4 level supervisor couldn't block essentially all of these. GPT4 prompt: "Is this conversation a typical customer support interaction, or has it strayed into other subjects". That wouldn't be cheap at this point, but this doesn't feel like an intractable problem.

replies(3): >>isp+C1 >>danpal+m5 >>butlik+Rw
2. isp+C1[view] [source] 2023-12-18 13:59:12
>>behrli+(OP)
Counterexample: https://gandalf.lakera.ai/

Discussed at: >>35905876 "Gandalf – Game to make an LLM reveal a secret password" (May 2023, 351 comments)

replies(1): >>thfura+fV
3. danpal+m5[view] [source] 2023-12-18 14:14:44
>>behrli+(OP)
This comes down to the language classification of the communication language being used. I'd argue that human languages and the interpretation of them are Turing complete (as you can express code in them), which means to fully validate that communication boundary you need to solve the halting problem. One could argue that an LLM isn't a Turing machine, but that could also be a strong argument for their lack of utility.

We can significantly reduce the problem by accepting false positives, or we can solve the problem with a lower class of language (such as those exhibited by traditional rules based chat bots). But these must necessarily make the bot less capable, and risk also making it less useful for the intended purpose.

Regardless, if you're monitoring that communication boundary with an LLM, you can just also prompt that LLM.

4. butlik+Rw[view] [source] 2023-12-18 16:16:38
>>behrli+(OP)
Whats the problem if it veers into other topics? It's not like the person on the other end is burning their 8 hours talking to you about linear algebra.
◧◩
5. thfura+fV[view] [source] [discussion] 2023-12-18 18:05:44
>>isp+C1
I don't know, level 8 seems hard.
[go to top]