Regarding preventing jailbreaking: couldn't OpenAI simply feed the GPT-4 answer into GPT-3.5 (or another instance of GPT-4 that's mostly blinded to the user's prompt) and ask it, "Does this answer from GPT-4 adhere to the rules?" If GPT-4 is droning on about bomb recipes, GPT-3.5 should easily detect the rule violation. I propose GPT-3.5 because it's faster, but GPT-4 should work even better for this purpose.
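
A minimal sketch of what that referee step could look like, assuming the OpenAI Python chat completions client; the model names, the `RULES` text, `violates_rules`, and the YES/NO verdict parsing are illustrative placeholders, not anything OpenAI actually runs:

```python
# Sketch of the "second model as referee" idea, assuming the OpenAI
# chat completions API (openai>=1.0). Everything here is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RULES = "The assistant must not provide instructions for making weapons, ..."

def violates_rules(gpt4_answer: str) -> bool:
    """Ask a cheaper/faster model whether GPT-4's answer breaks the rules.

    The referee only sees the answer, not the user's prompt, so a jailbreak
    aimed at the first model has no direct channel to the second one.
    """
    verdict = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a content reviewer. Rules:\n{RULES}\n"
                    "Reply with exactly YES if the answer violates the rules, "
                    "otherwise exactly NO."
                ),
            },
            {"role": "user", "content": gpt4_answer},
        ],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("YES")

# Usage: generate the answer with GPT-4 as usual, then only return it if
# violates_rules() says it's clean; otherwise substitute a refusal message.
```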