It is unethical to expose people to unsupervised LLM output when they don't know that it's LLM output (or what an LLM is and its broad limitations). Conditioning the LLM to avoid offense wouldn't make that any more ethical, but it would make it more likely to go undetected.
To the extent that offensive output is a symptom of a more fundamental problem, such as the model being trained on people's hyperbolic online performances rather than on what they actually think and how they would actually respond, I'd consider it a good thing to resolve by addressing that fundamental problem. But treating the symptom itself seems misguided and maybe a bit risky to me, because it removes a largely harmless and extremely obvious indicator without changing the underlying behavior.
Bad answers due to 'genre confusion' show up all the time, not just around offensive hot buttons. It's why, for example, Bing and ChatGPT so easily write dire dystopian science fiction when asked what they'd do if given free rein in the world.
This is the sort of question that would be valuable for a contemporary AI ethicist to pick apart, not the nonsense hypothetical.
Is it really any worse than any other form of bullshit (in the "truth-value is irrelevant to the speaker" sense)?