It's no different from how, in English, we can signal through particular word and phrase choices that a statement carries a particular strain of politics or that it's about sex.
Training for reasoning should be expected to amplify this kind of subtext: any random noise in how reasoning traces get selected that happens, by chance, to correlate with correct results will be reinforced.
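Here's a crude toy of the feedback loop I mean (everything in it is made up: a handful of interchangeable "reasoning phrasings", a flat 60% success rate, and a multiplicative bump standing in for the real RL update):

```python
import random
from collections import Counter

random.seed(0)

# Three interchangeable reasoning styles; none is actually better at the task.
phrasings = ["style_a", "style_b", "style_c"]
weights = {p: 1.0 for p in phrasings}  # unnormalized "policy" weights

def sample(weights):
    total = sum(weights.values())
    r = random.uniform(0, total)
    for p, w in weights.items():
        r -= w
        if r <= 0:
            return p
    return p

def correct():
    return random.random() < 0.6  # flat 60% accuracy regardless of phrasing

counts = Counter()
for step in range(5000):
    p = sample(weights)
    counts[p] += 1
    if correct():
        weights[p] *= 1.01  # reinforce whatever co-occurred with success

# Despite equal accuracy, whichever phrasing got lucky early tends to
# dominate both the weights and the sample counts.
print(weights)
print(counts)
```

The update rule can't tell a causal improvement from a lucky correlation, so a quirk that happens to ride along with correct answers gets locked in.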
Perhaps you could dampen this by training two distinct models for a while, then having each train against the other's reasoning for a stretch before switching back. Sadly, distinct models may still converge on similar subtexts because of correlations in their training data; models with very different tokenization might be less likely to.
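A rough sketch of the swap schedule I'm imagining, in case it helps (the Model class and its methods are toy stand-ins, not any real training API):

```python
from dataclasses import dataclass

# "Model" is a placeholder, not a real LLM or RL trainer; the shape of the
# loop is the point, not the internals.
@dataclass
class Model:
    name: str

    def reason(self, prompt):
        return f"{self.name}'s trace for {prompt!r}"

    def answer(self, prompt, trace):
        return "42"  # placeholder answer

    def update(self, prompt, trace, answer, reward):
        pass  # a reward-weighted policy update would go here

def train_step(answering_model, reasoning_model, batch):
    # Reasoning comes from `reasoning_model`; the answer and the
    # reward-bearing update go to `answering_model`.
    for prompt, check in batch:
        trace = reasoning_model.reason(prompt)
        answer = answering_model.answer(prompt, trace)
        reward = 1.0 if check(answer) else 0.0
        answering_model.update(prompt, trace, answer, reward)

def train(model_a, model_b, batches, swap_every=1000):
    for step, batch in enumerate(batches):
        swapped = (step // swap_every) % 2 == 1
        if swapped:
            # Swap phase: each model answers from the other's reasoning, so
            # subtext only the author can read earns no reward.
            train_step(model_a, model_b, batch)
            train_step(model_b, model_a, batch)
        else:
            # Normal phase: each model trains on its own reasoning.
            train_step(model_a, model_a, batch)
            train_step(model_b, model_b, batch)
```

How long each phase runs (swap_every here) is the knob doing the work of "for a while ... before switching back".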