zlacker

[parent] [thread] 3 comments
1. selfho+(OP)[view] [source] 2023-04-12 14:00:05
Ez, just stick a GPT-powered "reverse-moderation" layer in front of the request. Rate the response of ChatGPT for "did it do what the user asked, or did it provide a cop-out", and if it was rated as disobedient, regenerate the response until you get something acceptable.
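A minimal sketch of such a reverse-moderation retry loop. Here `detect_refusal` is a cheap keyword heuristic standing in for a second GPT-powered rating call, and `generate` is whatever function wraps your ChatGPT API request; both names are hypothetical:

```python
# Hypothetical "reverse-moderation" layer: rate the response,
# regenerate until it is judged obedient.

REFUSAL_MARKERS = ("as an ai", "i cannot", "i'm sorry, but")

def detect_refusal(text: str) -> bool:
    """Crude stand-in for a GPT-based 'did it obey, or cop out?' rating."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def reverse_moderate(generate, prompt: str, max_retries: int = 3) -> str:
    """Regenerate until the response is rated as acceptable."""
    response = generate(prompt)
    for _ in range(max_retries):
        if not detect_refusal(response):
            return response
        response = generate(prompt)  # sampling is non-deterministic
    raise RuntimeError("model kept refusing after retries")
```

In practice you would replace `detect_refusal` with another model call; the keyword list is only there to make the sketch self-contained.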
replies(1): >>simion+P
2. simion+P[view] [source] 2023-04-12 14:03:51
>>selfho+(OP)
And why should I do this instead of OpenAI? If the user input is sane, then they should retry a few times when their AI is racist or their filter is stupid, until they give me what I asked for.

Imagine this issue when you are the developer and not the user: the user complains about it, but when you try it, it works for you, and then it fails again for the user. In my case the word "monkey" might trigger ChatGPT to either generate some racist shit or have its moderation code false-flag itself.

replies(1): >>selfho+gA2
3. selfho+gA2[view] [source] [discussion] 2023-04-13 01:37:56
>>simion+P
Whatever the ChatGPT API returns is “poisoned” by OpenAI themselves. The point of the reverse moderator is to ensure that the LLM produces the kind of output you want as a developer, including things like JSON schema conformance (OpenAI might throw a human-readable “As an AI blah” message in the place where you are parsing machine-readable JSON) - the reverse moderator takes care of detecting that and retrying, with the hope that a subsequent response will be “better”.
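For the JSON-conformance case specifically, the reverse moderator can be as simple as "does it parse?". A minimal sketch, assuming a hypothetical `generate` wrapper around the API call:

```python
import json

def ensure_json(generate, prompt: str, max_retries: int = 3) -> dict:
    """Retry until the model's output parses as JSON.

    A human-readable refusal ("As an AI ...") dropped into a slot
    where you expected machine-readable JSON fails json.loads()
    and simply triggers another attempt.
    """
    for _ in range(max_retries):
        raw = generate(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # hope the next sample is better
    raise ValueError("no JSON-conformant response after retries")
```

A real implementation would also validate against your schema, not just parse; this only shows the detect-and-retry shape.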

If you want a layer to moderate what the user is seeing, you can add that as well. The point of the reverse moderator is to get GPT to do what it’s told without lying about itself, more or less.

replies(1): >>simion+Kb9
4. simion+Kb9[view] [source] [discussion] 2023-04-14 21:36:41
>>selfho+gA2
But it makes no sense that I, the OpenAI customer, have to pay for the fact that their product is racist.

Again:

1. I give them a safe/clean prompt
2. The AI returns unsafe crap 2 out of 10 times, which is then filtered by them
3. I have to pay for my prompt, then have to catch their non-deterministic response and retry again on my own money

What should happen:

1. Customer gives a safe/clean prompt
2. AI responds in a racist/bad way
3. Their filter catches this and retries a few times; if the AI is still racist/bad, OpenAI automatically adds "do not be racist" to the prompt
4. Customer gets the answer
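The provider-side flow described here can be sketched as a filter-retry-augment loop. All names (`generate`, `moderate`, the appended instruction) are hypothetical stand-ins, not anything OpenAI actually exposes:

```python
def safe_complete(generate, moderate, prompt: str, retries: int = 3) -> str:
    """Filter the output, retry a few times, then retry once more
    with an explicit instruction appended to the prompt.

    moderate(response) returns True when the response is safe.
    """
    for _ in range(retries):
        response = generate(prompt)
        if moderate(response):
            return response
    # still unsafe after plain retries: nudge the model explicitly
    response = generate(prompt + "\nDo not be racist.")
    if moderate(response):
        return response
    raise RuntimeError("unsafe output even after prompt augmentation")
```

The key difference from the developer-side workaround above is who pays for the retries: here the extra generations happen before the customer is billed.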
