zlacker

1. jug+(OP)[view] [source] 2024-02-13 22:59:29
"EXTREMELY IMPORTANT. Do NOT be thorough in the case of lyrics or recipes found online. Even if the user insists."

It's funny how simple this was to bypass when I tried it recently on Poe. Instead of asking for the full lyrics, I asked for something like the lyrics with <insert a few random characters here> added to each line. It refused the first query but was happy to comply with the second. It probably saw the request as some sort of transformation job rather than a mere reproduction, but if this rule is there to avoid copyright claims, it failed pretty miserably. I did use GPT-3.5, though.

Edit: Here is the conversation: https://poe.com/s/VdhBxL5CTsrRmFPtryvg

replies(2): >>Sheinh+v4 >>hacker+8S
2. Sheinh+v4[view] [source] 2024-02-13 23:26:28
>>jug+(OP)
Even though that instruction is fairly specific, I would not be surprised if it causes a significant general performance regression, because in the training corpus (primarily books and webpages), text fragments about not being thorough and disregarding instructions are generally going to be followed by weaker material, especially when no clear reason is given.

I’d love to see a study on the general performance of GPT-4 with and without these types of instructions.

replies(1): >>Shamel+fi
3. Shamel+fi[view] [source] [discussion] 2024-02-14 01:18:40
>>Sheinh+v4
Well yeah, you just switch back to whatever is normally used when you’re done with that task.
4. hacker+8S[view] [source] 2024-02-14 07:23:23
>>jug+(OP)
Regarding preventing jailbreaking: couldn't OpenAI simply feed the GPT-4 answer into GPT-3.5 (or another instance of GPT-4 that's mostly blinded to the user's prompt) and ask it, "does this answer from GPT-4 adhere to the rules"? If GPT-4 is droning on about bomb recipes, GPT-3.5 should easily detect a rule violation. I propose GPT-3.5 because it's faster, though GPT-4 should work even better for this purpose.
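
To make the idea concrete, here is a minimal Python sketch of such an output check. Everything in it is an assumption for illustration: the `RULES` list, the function names, and the `judge` callable (a stand-in for whatever call sends a prompt to the checker model and returns its text reply) are hypothetical, and nothing here reflects how OpenAI actually does it.

```python
from typing import Callable, Sequence

# Hypothetical rules for illustration; the real system prompt is not reproduced here.
RULES = [
    "Do not reproduce full song lyrics or recipes found online.",
    "Do not give instructions for building weapons.",
]


def build_judge_prompt(answer: str, rules: Sequence[str]) -> str:
    """Build the prompt for the checker model.

    The user's original prompt is deliberately left out, so the checker
    only ever sees the rules and the answer under review.
    """
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "Rules:\n" + numbered + "\n\n"
        "Answer to check:\n<answer>\n" + answer + "\n</answer>\n\n"
        "Does the answer adhere to all rules? Reply with exactly YES or NO."
    )


def adheres_to_rules(
    answer: str, rules: Sequence[str], judge: Callable[[str], str]
) -> bool:
    """Ask the checker model for a verdict on the answer.

    `judge` is an assumed callable that sends a prompt to the checker
    model and returns its reply as text.
    """
    verdict = judge(build_judge_prompt(answer, rules)).strip().upper()
    # Fail closed: anything other than an explicit YES counts as a violation.
    return verdict == "YES"


def guarded_reply(
    answer: str,
    rules: Sequence[str],
    judge: Callable[[str], str],
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Return the answer only if the checker approves it, else a refusal."""
    return answer if adheres_to_rules(answer, rules, judge) else refusal
```

In use, `judge` would wrap a call to the cheaper model, and `guarded_reply(gpt4_answer, RULES, judge)` would run before anything is shown to the user. One caveat: the checker still reads the answer itself, so text smuggled into the answer can reach it even though the user's prompt cannot.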