But the "I don't understand" is strong in this one. That doesn't mean "it can't work", just that I don't understand how it avoids the problems.
Maybe the full-detail foveal area is rendered large enough to cover eye movement? But if you move your eyes suddenly, there's got to be some lag while it computes the missing pixels. So you'd see the same thing as when Netflix raises the bitrate: a crude image that gets sharper, and banding that turns into smooth transitions.
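To put rough numbers on the "make it big enough" idea, here's a back-of-the-envelope sketch. It assumes the padding needed is just eye speed times system latency; the function name, the 5° base radius, the 20 ms latency and both eye speeds are illustrative guesses, not figures from any real headset. With a 20 ms lag, a slow 30°/s eye movement shifts the gaze about 0.6°, but a 400°/s saccade shifts it about 8°, which is why padding alone probably can't hide a sudden jump.

```python
def foveal_radius_deg(base_radius_deg: float,
                      eye_speed_deg_per_s: float,
                      latency_s: float) -> float:
    """Radius (degrees of visual angle) of the full-detail region, padded by
    how far the gaze could move before the next frame is ready.

    Hypothetical model: the padding is simply speed x latency.
    """
    padding = eye_speed_deg_per_s * latency_s
    return base_radius_deg + padding


# Illustrative numbers only (assumed, not measured):
BASE = 5.0      # degrees rendered at full detail when the eye is still
LATENCY = 0.02  # 20 ms from eye-tracker sample to displayed frame

slow_move = foveal_radius_deg(BASE, 30.0, LATENCY)   # slow eye movement
saccade = foveal_radius_deg(BASE, 400.0, LATENCY)    # sudden jump
print(f"slow move: {slow_move:.1f} deg, saccade: {saccade:.1f} deg")
```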
As for peripheral vision, keeping any gradation smooth probably helps, but there may be more tricks to make it look normal. It reminds me of how JPEG images and some audio codecs throw away detail we can't actually perceive and keep only what we can.
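Here's a rough illustration of "banded" versus "smooth" gradation: a stepped resolution scale that drops in hard jumps at fixed angles, compared with one that fades out using a smoothstep curve. The cutoff angles (5°, 15°, 20°) and the 0.25 minimum scale are made-up values for illustration, and smoothstep is just one convenient choice of falloff, not necessarily what any real renderer uses.

```python
def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Standard smoothstep: 0 below edge0, 1 above edge1, S-curve between."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def resolution_scale_banded(ecc_deg: float) -> float:
    """Hypothetical stepped falloff: hard jumps at 5 and 15 degrees."""
    if ecc_deg < 5.0:
        return 1.0
    if ecc_deg < 15.0:
        return 0.5
    return 0.25


def resolution_scale_smooth(ecc_deg: float, inner: float = 5.0,
                            outer: float = 20.0, min_scale: float = 0.25) -> float:
    """Hypothetical smooth falloff: full detail inside `inner`, easing
    down to `min_scale` at `outer` with no visible step."""
    t = smoothstep(inner, outer, ecc_deg)
    return 1.0 + (min_scale - 1.0) * t


# Compare the two just either side of the first band edge.
for ecc in (4.9, 5.1, 12.5, 20.0):
    print(ecc, resolution_scale_banded(ecc), round(resolution_scale_smooth(ecc), 3))
```

The banded version halves the detail across a 0.2° step, while the smooth one barely changes there, which is the kind of seam the periphery might notice.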