So something I have also noticed, mostly on 3.5-Turbo, is that textual responses embedded in JSON take a quality hit, full stop. This has pushed me toward mixed output most of the time: thoughts and process in JSON, then an "exit" to plain text for the conversational response.
The same behavior likely shows up in gpt-4, but I haven't studied it as closely.
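For anyone curious what that mixed output looks like in practice, here's a rough sketch of how I split that kind of response on my end. The `---` delimiter, the field names, and the example text are placeholders I made up for illustration, not an exact prompt format:

```python
import json

# Hypothetical example of a mixed-format model response: structured
# reasoning in JSON first, then a delimiter, then free-form text.
# The "---" delimiter and field names are assumptions for illustration.
raw_response = """\
{"thoughts": "User is asking about refunds.", "steps": ["check policy", "draft reply"]}
---
Sure! Our refund policy lets you return items within 30 days...
"""

def split_mixed_output(raw: str, delimiter: str = "---"):
    """Split a response into its JSON 'process' part and the plain-text reply."""
    json_part, _, text_part = raw.partition(delimiter)
    try:
        process = json.loads(json_part.strip())
    except json.JSONDecodeError:
        process = None  # model didn't follow the format; fall back to raw text
    return process, text_part.strip()

process, reply = split_mixed_output(raw_response)
print(process)  # structured thoughts/steps for logging or downstream logic
print(reply)    # conversational text, written free of JSON constraints
```

The nice part is that the conversational text never has to be escaped into a JSON string, which seems to be where the quality hit comes from.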