If you want a layer that moderates what the user is seeing, you can add that as well. The point of the reverse moderator is to get GPT to do what it's told without lying about itself, more or less.
Again, what happens now:

1. I give them a safe/clean prompt
2. The AI returns unsafe garbage 2 times out of 10, which their filter blocks
3. I still have to pay for my prompt, then have to catch the non-deterministic response and retry again on my own money
What should happen:
1. Customer gives a safe/clean prompt
2. The AI responds in a racist/bad way
3. The filter catches this and retries a few times; if the AI is still racist/bad, OpenAI automatically adds "do not be racist" to the prompt
4. Customer gets the answer
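The retry-with-fallback flow above could be sketched roughly like this. Everything here is hypothetical: `fake_model` and `is_flagged` are stand-ins for the actual completion endpoint and moderation check, and the retry/fallback policy is just the one described in the list, not anything OpenAI actually implements.

```python
import random

def fake_model(prompt, rng):
    # Hypothetical stub for the model: non-deterministically
    # returns an unsafe completion to mimic the problem described.
    if rng.random() < 0.2:
        return "UNSAFE: offensive text"
    return "Here is a helpful answer."

def is_flagged(text):
    # Stand-in for the moderation filter.
    return text.startswith("UNSAFE")

def safe_complete(prompt, max_retries=3, rng=None):
    """Retry flagged completions server-side; if still flagged after
    max_retries, prepend a safety instruction and try once more."""
    rng = rng or random.Random(0)
    for _ in range(max_retries):
        reply = fake_model(prompt, rng)
        if not is_flagged(reply):
            return reply
    # Still flagged after retries: inject the extra instruction.
    reply = fake_model("Do not be racist.\n" + prompt, rng)
    return reply if not is_flagged(reply) else "[blocked]"
```

The key point of the sketch is that the retries happen inside `safe_complete`, so the customer is billed for one call and gets one answer back, instead of paying per failed attempt.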