Excessive smoothing can be explained by compression, sure, but that's not the issue being raised there.
Neural compression wouldn't work like HEVC, operating on frames and pixels. Rather, these techniques can encode entire features and optical flow, which can explain the larger discrepancies: larger fingers, slightly misplaced items, etc.
Neural compression techniques reshape the image itself.
If you've ever fed an image into `gpt-image-1` and asked it to output it again, you'll notice the result is maybe 95% similar, but entire features can move around or get averaged out toward the model's concept of what those items are.
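To make that concrete, here's a toy sketch (plain PyTorch, not anything OpenAI or any video service actually ships) of the data path in a learned codec: the frame gets squeezed into a small latent code and the decoder rebuilds the image from that code, so anything that didn't survive the bottleneck comes back as the model's best guess.

```python
# A minimal sketch (not any vendor's actual pipeline) of why a neural
# round-trip drifts: the image is squeezed through a small latent code,
# so the decoder reconstructs from a learned "concept", not from pixels.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Encoder: 3x64x64 image -> latent vector (massive information loss)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        # Decoder: latent vector -> 3x64x64 image
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Everything the decoder knows about this frame is the latent code;
        # details that didn't survive encoding get reconstructed as a guess.
        return self.decoder(self.encoder(x))

frame = torch.rand(1, 3, 64, 64)   # stand-in for one video frame
recon = TinyAutoencoder()(frame)   # untrained, but shows the data path
print(frame.shape, recon.shape)    # same shape, content rebuilt from the latent
```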
Video compression operates on macroblocks and calculates motion vectors of those macroblocks between frames.
When you push it to the limit, the macroblocks can appear like they're swimming around on screen.
Some decoders attempt to smooth out the boundaries between macroblocks and restore sharpness.
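For the intuition, here's a rough block-matching sketch in Python; the function names and the brute-force search are mine, and real encoders use much smarter search strategies, sub-pixel refinement, and rate-distortion decisions, but the core idea is "find where this macroblock moved to":

```python
# Rough sketch of classic block-matching motion estimation, the idea behind
# macroblock motion vectors in codecs like H.264/HEVC.
import numpy as np

def motion_vector(prev: np.ndarray, curr: np.ndarray,
                  y: int, x: int, block: int = 16, radius: int = 8):
    """Find the offset into `prev` that best matches the block at (y, x) in `curr`."""
    target = curr[y:y + block, x:x + block].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = y + dy, x + dx
            if py < 0 or px < 0 or py + block > prev.shape[0] or px + block > prev.shape[1]:
                continue
            candidate = prev[py:py + block, px:px + block].astype(np.int32)
            sad = np.abs(target - candidate).sum()   # sum of absolute differences
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

# Toy frames: the "current" frame is the previous one shifted 3 px to the right.
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.roll(prev, 3, axis=1)
print(motion_vector(prev, curr, 16, 16))   # expect (0, -3)
```

The encoder then stores the motion vector plus a small residual instead of the full block, which is exactly why starved bitrates make blocks look like they're swimming.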
The giveaway is that the entire video is extremely low quality. The compression ratio is extreme.
It looks like they're compressing the data before it gets further processed with the traditional suite of video codecs: relying on the traditional codecs for delivery, but running some internal first pass to further compress the data they have to store.
I don't think that's actually what's up, but I don't think it's completely ruled out either.
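If they were doing that, the plumbing might look something like this purely hypothetical sketch: a neural round-trip per frame (placeholder function here, not a real library call), with a stock codec doing the final encode for delivery. Nothing here is confirmed; it's just what a "neural first pass, traditional codec to serve" pipeline could look like, assuming ffmpeg with libx265 is available.

```python
# Hypothetical pipeline sketch: a learned first pass squeezes each frame
# through a neural codec (stand-in function below), then an ordinary
# H.265 encode handles delivery. Requires ffmpeg built with libx265.
import subprocess
import numpy as np

W, H, FPS, N_FRAMES = 320, 240, 24, 48

def neural_round_trip(frame: np.ndarray) -> np.ndarray:
    # Placeholder for encode -> latent -> decode; a real system would run a
    # trained model here, and this is where features could drift or smear.
    return frame  # identity, just to keep the sketch runnable

# Pipe raw RGB frames into ffmpeg, which does the conventional encode.
ffmpeg = subprocess.Popen(
    ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgb24",
     "-s", f"{W}x{H}", "-r", str(FPS), "-i", "-",
     "-c:v", "libx265", "-crf", "28", "out.mp4"],
    stdin=subprocess.PIPE,
)
for _ in range(N_FRAMES):
    frame = np.random.randint(0, 256, (H, W, 3), dtype=np.uint8)  # stand-in frames
    ffmpeg.stdin.write(neural_round_trip(frame).tobytes())
ffmpeg.stdin.close()
ffmpeg.wait()
```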
https://blog.metaphysic.ai/what-is-neural-compression/
See this paper:
https://arxiv.org/abs/2412.11379
Look at figure 5 and beyond.
Here's one such Google paper: