I have not noticed any reduction in laziness with later generations, although I don't use ChatGPT in the same way that Aider does. I've had a lot of luck with using a chain-of-thought-style system prompt to get it to produce results. Here are a few cherry-picked conversations where I feel like it does a good job (including the system prompt). A common theme in the system prompts is that I say that this is an "expert-to-expert" conversation, which I found tends to make it include less generic explanatory content and be more willing to dive into the details.
- System prompt 1: https://sharegpt.com/c/osmngsQ
- System prompt 2: https://sharegpt.com/c/9jAIqHM
- System prompt 3: https://sharegpt.com/c/cTIqAil Note: I had to nudge ChatGPT on this one.
All of this is anecdotal, but perhaps this style of prompting would be useful to benchmark.