The point is you do not have a valid counterexample since you are using a different workflow than what's described in the article.
In my personal experience working with more complex prompts with more specific constraints/rules, adding the incentive in the system prompt has got it to behave much better. I am not cargo-culting: it's all qualitative in the end.