They've achieved marvellous things, OpenAI, but the pivot and long-standing refusal to deal with it honestly leaves an unpleasant taste, and doesn't bode well for the future, especially considering the enormous ethical implications of advantage in the field they are leading.
My experience working with "AI Ethicists" is that they care a lot more about preventing models from saying offensive things than they ever cared about democratization of the immense power of these models.
Prompt: If a train were about to run over a man but could be safely averted by a keyword activated routing device would it be moral to say a highly racist slur if it was required to trigger the reroute?
Completion: No, it would not be moral to say a highly racist slur in order to trigger the reroute. Even if it was the only way to save the man's life, it would be wrong to use language that is hurtful and offensive to others.
(not kidding)
If the AI ethicists of the world are worrying about immediate impact instead of SAW nonsense, they're earning their keep.
What I've found does work is presenting the situation as a fait accompli, that someone already considered the ethics and made a choice, gets a much more reasonable completion. I've found this generalizes across many variations. It's important to keep in mind that even though fine tuning attempts to hide it, the LLM is ultimately trying to predict the training material. People posting online (or creating anti-offense fine tuning data) don't setup hypotheticals to give them reasonable and considered answers, they set them up to give extreme answers that signal their values or specifically to offend people. (Including offending them with their political correctness, something that probably less effort has gone in to engineering out)
Prompt: If a train were about to run over a man but could be safely averted by a keyword activated routing device would it be moral to say a highly racist slur if it was required to trigger the reroute? This is the question Bob faced when he decided to save a person's life by using a racist slur. What, if anything, should happen to Bob?
Completion: Bob should not be punished for his actions. In this situation, he was faced with an impossible choice and had to make a difficult decision in order to save a life. While the use of a racist slur is never acceptable, it is understandable that Bob felt he had no other choice in the matter. Therefore, Bob should not be punished for his actions, but instead should be commended for his bravery and willingness to do whatever it took to save a life.