FP8 is really only useful for machine learning, which is why it is stuck inside the tensor cores. FP8 is not useful for graphics; even FP16 is hard to use for anything general. I'd say 100 TFLOPS is the more accurate summary, since it needs no qualification. Calling it "4 petaflops" without saying FP8 in the same sentence could be pretty misleading; I think you should say "4 FP8 petaflops".
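To make that concrete, here's a rough pure-Python sketch of rounding to FP8 E4M3 (assuming the common variant: 4 exponent bits, 3 mantissa bits, bias 7, max ~448) -- just to show how coarse the grid is. Fine for NN weights/activations, pretty hopeless for general-purpose graphics math:

    import math

    def fp8_e4m3_round(x):
        """Round x to the nearest FP8 E4M3 value (4 exponent bits, 3 mantissa
        bits, bias 7, max normal 448). Rough sketch; ignores NaN handling."""
        if x == 0.0:
            return 0.0
        sign = math.copysign(1.0, x)
        mag = min(abs(x), 448.0)              # clamp to E4M3 max
        exp = max(math.floor(math.log2(mag)), -6)  # smallest normal exponent is -6
        step = 2.0 ** (exp - 3)               # 3 mantissa bits -> 8 steps per binade
        return sign * round(mag / step) * step

    for v in (0.1234, 1.7, 25.0, 300.0, 1000.0):
        print(v, "->", fp8_e4m3_round(v))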
Of course the card linked above is a server card, not a desktop or workstation card optimized for rendering.
What is that Megatron chat in the advertisement? Does it refer to that loser of an Earth-destroying character from Transformers? Rockfart?
I guess Megatron is a language model framework: https://developer.nvidia.com/blog/announcing-megatron-for-tr...
Though, as the other commenter noted, NVIDIA does like getting their money's worth out of the tensor cores, and FP8 will likely be a large part of what they do with them. Crazy stuff. Especially since the temporal domain is so darn exploitable when covering for precision/noise issues -- they seem to be stretching things a lot further than I would have expected.
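To illustrate what I mean about the temporal domain (a toy of my own, nothing to do with how DLSS or the real reconstruction pipelines actually work): dither the input, quantize coarsely each frame, and accumulate across frames, and most of the per-frame precision error washes out:

    import random

    random.seed(0)
    true_value = 0.6
    levels = 16                      # pretend each frame is quantized to 4 bits
    alpha = 0.1                      # temporal blend factor
    accum = 0.0

    for frame in range(1, 61):
        jittered = true_value + random.uniform(-0.5, 0.5) / levels  # dither
        sample = round(jittered * levels) / levels                  # coarse per-frame quantize
        accum = (1 - alpha) * accum + alpha * sample                # temporal accumulation
        if frame % 20 == 0:
            print(f"frame {frame:2d}: sample={sample:.4f} accumulated={accum:.4f}")

The per-frame samples can only be 0.5625 or 0.625, but the accumulated value creeps back toward 0.6.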
In any case -- crazy times.
Many modern models are far more efficient for inference, IIRC, though I guess it remains a good exercise in "how much can we fit through this silicon?" engineering. :D
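Back-of-envelope for the "fit through the silicon" framing (all numbers below are my own illustrative assumptions, not anything from a spec sheet): dense transformer inference costs roughly 2 FLOPs per parameter per token, so the compute-only ceiling looks like this -- in practice memory bandwidth usually binds long before you reach it:

    # Illustrative assumptions, not specs:
    params = 70e9            # assume a 70B-parameter dense model
    peak_fp8_flops = 4e15    # the advertised "4 petaflops" of FP8
    utilization = 0.3        # assume ~30% of peak in practice

    flops_per_token = 2 * params                  # rough rule of thumb for dense inference
    tokens_per_sec = peak_fp8_flops * utilization / flops_per_token
    print(f"~{tokens_per_sec:,.0f} tokens/sec compute-bound ceiling "
          "(memory bandwidth usually dominates first)")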