Ah, there’s your issue. There’s not a developer in human history who hasn’t drastically underestimated how long it would take to complete a task.
The conclusion isn't that "estimates are hard" (they can be), but rather that AI assistance can lead people to believe they're more productive than they actually are, because they incorrectly think they've spent less time.
The graphs in the paper tell part of that story: the time being saved is in actual programming, "Reading & Searching", and "Testing & Debugging", but that time is spent elsewhere instead, notably on activities specific to LLMs (reviewing output, prompting, waiting for the AI to spit out results).