In Photoshop I can select exactly where I want changes and remove specific elements. If I submit the image and try to describe my desired changes in text, the output is harder to control. (And I might still get scrambled text, for instance, in parts of the image that it didn't even need to touch.)
I think this sort of task-specific specialization will have a long future. It's hard to imagine pure text once again becoming the dominant information transfer method for 90% of the things we do with computers, after 40 years of building specialized non-text interfaces.