I’m pretty confident there will be situations where you can measure a statistically significant performance improvement by offering a tip or telling the model you have no hands, but I’m not convinced that it’s a universal best practice.
A big issue is that a lot of the advice you see around prompting is (imo) just the output of someone playing with GPT for a bit and noticing something cool. Without actual rigorous evals, these findings are basically just superstitions.
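To make "rigorous eval" concrete: a minimal sketch of what testing a prompt trick could look like. The pass counts below are entirely made up for illustration, and the test is a plain two-proportion z-test (one reasonable choice among several) comparing a baseline prompt against a "tip" prompt graded pass/fail on the same task set.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two pass rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical results: baseline prompt vs. "I'll tip you $200" prompt,
# each graded pass/fail on the same 500 tasks.
z, p = two_proportion_z_test(312, 500, 338, 500)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented numbers the tip prompt looks ~5 points better, but p lands above 0.05, i.e. a gap this size on 500 tasks is quite plausible under noise alone. That's the whole point: "it felt better when I tried it" and "it survived a significance test on a fixed eval set" are very different claims.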