It's no different from how, in English, we can signal through particular word and phrase choices that a statement carries a political slant or is about sex.
Training for reasoning should be expected to amplify the subtext: any random noise in token selection that by chance correlates with correct results will get reinforced.
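A toy sketch of that dynamic, with made-up numbers rather than any real training setup: the filler tokens below have no effect on correctness at all, yet reward-weighted updates still drift the policy toward whichever of them happened to co-occur with correct answers in each batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "policy": a distribution over 5 interchangeable filler tokens the model
# can sprinkle into its reasoning. None of them affect answer correctness.
logits = np.zeros(5)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(200):
    probs = softmax(logits)
    # Each trace in the batch uses one filler token; correctness is a coin
    # flip that is independent of the token chosen.
    tokens = rng.choice(5, size=64, p=probs)
    correct = rng.random(64) < 0.5
    # REINFORCE-style update with a batch-mean baseline: in expectation it
    # does nothing, but within any finite batch some token is spuriously
    # correlated with being correct, and that chance correlation moves the
    # policy.
    advantage = correct.astype(float) - correct.mean()
    for t, a in zip(tokens, advantage):
        grad = -probs
        grad[t] += 1.0
        logits += 0.1 * a * grad

print(np.round(softmax(logits), 3))  # no longer uniform: noise got baked in
```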
Perhaps you could dampen this by training two distinct models for a while, then having each complete from the other's reasoning for a while before switching back (a rough sketch follows). Sadly, distinct models may still end up with similar subtexts due to correlations in their training data; maybe models with very different tokenization would be less likely to do so.
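Roughly what I have in mind by swapping, as a sketch only; the model interface here (generate_reasoning, answer_from_reasoning, reinforce) and the schedule are hypothetical, just to make the shape of the idea concrete:

```python
SWAP_EVERY = 1000  # assumed schedule: alternate phases every N steps

def train_step(step, model_a, model_b, prompt, check_answer):
    # Each model writes its own chain of thought.
    reasoning_a = model_a.generate_reasoning(prompt)
    reasoning_b = model_b.generate_reasoning(prompt)

    if (step // SWAP_EVERY) % 2 == 1:
        # Swap phase: each model must answer from the *other* model's trace.
        # A private encoding that only its author can decode stops paying
        # off, because the author isn't the one answering from it.
        reasoning_a, reasoning_b = reasoning_b, reasoning_a

    answer_a = model_a.answer_from_reasoning(prompt, reasoning_a)
    answer_b = model_b.answer_from_reasoning(prompt, reasoning_b)

    # Reward on final-answer correctness only, as usual.
    model_a.reinforce(prompt, reasoning_a, answer_a,
                      reward=check_answer(answer_a))
    model_b.reinforce(prompt, reasoning_b, answer_b,
                      reward=check_answer(answer_b))
```

During swap phases, any hidden signal is only rewarded to the extent that the other model can also exploit it, which is the dampening I was hoping for.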
An example of this is toki pona, a minimalist constructed human language designed to express only "positive thinking". Yet it is extremely easy to insult people in toki pona, e.g. "sina toki pona pona pona pona" ("you are speaking very, very, very well"): the insult is carried entirely by the sarcastic subtext, not by any individual word.
To be free of a potential subtext side channel, there can be essentially no equivalent outputs: any choice among interchangeable ways of saying the same thing is a place to hide information.
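Even a little redundancy adds up fast; a back-of-the-envelope calculation with illustrative numbers:

```python
import math

# If each sentence of reasoning admits k interchangeable renderings (synonym
# choice, clause order, optional filler), picking among them can carry
# log2(k) bits of hidden information per sentence.
for k in (2, 4, 16):
    bits = math.log2(k) * 30  # over an illustrative 30-sentence trace
    print(f"{k:>2} equivalent renderings/sentence -> "
          f"{bits:.0f} hidden bits per trace")
```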