zlacker

[parent] [thread] 4 comments
1. dahart+(OP)[view] [source] 2023-03-05 15:48:15
> We are in the 4 Petaflops on a single card age currently

FP8 is really only useful for machine learning, which is why it is stuck inside the tensor cores. FP8 is not useful for graphics; even FP16 is hard to use for anything general. I'd say 100 Tflops is more accurate as a summary that needs no qualification. Calling it "4 petaflops" without saying FP8 in the same sentence could be pretty misleading; I think you should say "4 FP8 petaflops".
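
To put rough numbers on that precision gap, here is a quick sketch (my assumption: the FP8 row is the E4M3 layout with 3 mantissa bits, and relative precision is taken as 2^-mantissa_bits):

    import math

    # Relative precision is set by the mantissa width: eps ~ 2^-m.
    # FP8 row assumes the E4M3 variant (4 exponent bits, 3 mantissa bits).
    formats = {"FP8 (E4M3)": 3, "FP16": 10, "FP32": 23}
    for name, m in formats.items():
        eps = 2.0 ** -m
        digits = m * math.log10(2)  # roughly how many decimal digits survive
        print(f"{name:10s} eps ~ {eps:.1e}  (~{digits:.1f} decimal digits)")

Under one decimal digit of relative precision is workable for noise-tolerant weights and activations, but it's hard to do much general shading or geometry with it.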

replies(2): >>startu+j8 >>tysam_+bb1
2. startu+j8[view] [source] 2023-03-05 16:36:47
>>dahart+(OP)
At 1080p, yes, tensor cores are not used. But at 4K the majority of the pixels are filled in by the tensor cores (DLSS), so those FP8 ops are used.
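
Rough pixel math, assuming DLSS "Performance" mode (internal render at half resolution per axis); other modes shift the split, but the point stands:

    # Back-of-the-envelope: fraction of 4K output pixels that are natively shaded
    # vs reconstructed by the upscaler, assuming a 1080p internal render.
    output_px = 3840 * 2160
    internal_px = 1920 * 1080
    native = internal_px / output_px
    print(f"Natively shaded: {native:.0%}, reconstructed: {1 - native:.0%}")
    # -> 25% natively shaded, 75% reconstructed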

Of course the card linked above is a server card, not a desktop or workstation card optimized for rendering.

What is that Megatron chat in the advertisement? Does it refer to the loser, Earth-destroying character from Transformers? Rockfart?

replies(2): >>dahart+Pe >>tysam_+xb1
3. dahart+Pe[view] [source] [discussion] 2023-03-05 17:08:44
>>startu+j8
Oh yeah, excellent point. I should not draw lines between graphics and ML; graphics has seen, and will continue to see, more and more ML applications. I hope none of my coworkers see this.

I guess Megatron is a language model framework: https://developer.nvidia.com/blog/announcing-megatron-for-tr...

4. tysam_+bb1[view] [source] 2023-03-05 22:59:21
>>dahart+(OP)
I did mention it, at the end! That's why I made the qualification; it is an important difference.

Though as the other commenter noted, NVIDIA does like getting their money's worth out of the tensor cores, and FP8 will likely be a large part of what they're doing with them. Crazy stuff, especially since the temporal domain is so darn exploitable when covering for precision/noise issues -- they seem to be stretching things a lot further than I would have expected.
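
A toy sketch of why the temporal domain covers for noisy, low-precision per-frame estimates -- not how DLSS actually works, just plain exponential accumulation over frames:

    import random

    # Toy temporal accumulation: blend a noisy per-frame sample into a history
    # buffer; per-frame noise (quantization, undersampling) averages out.
    true_value = 0.6   # "ground truth" shade of a stable pixel
    history = 0.0
    alpha = 0.1        # how much of each new frame to accept
    for frame in range(60):
        noisy_sample = true_value + random.uniform(-0.1, 0.1)
        history = (1 - alpha) * history + alpha * noisy_sample
    print(f"after 60 frames: {history:.3f} (true value {true_value})")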

In any case -- crazy times.

5. tysam_+xb1[view] [source] [discussion] 2023-03-05 23:01:49
>>startu+j8
Megatron is a Large Language Model -- unfortunately it seems they really undertrained it for the parameter count it had, so it was more a numbers game of "hey, look how big this model is!" when they first released it.

Many modern models are far more efficient for inference IIRC, though I guess it remains a good exercise in "how much can we fit through this silicon?" engineering. :D
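
Back-of-the-envelope on the undertraining point, taking Megatron-Turing NLG 530B as the example and the Chinchilla rule of thumb of ~20 training tokens per parameter (the 270B-token figure is from memory, so treat both numbers as approximate):

    # Rough Chinchilla-style check (all numbers approximate).
    params = 530e9                  # Megatron-Turing NLG parameter count
    tokens_trained = 270e9          # reported training tokens (approximate)
    tokens_optimal = 20 * params    # ~20 tokens per parameter rule of thumb
    print(f"trained on ~{tokens_trained / tokens_optimal:.0%} of the compute-optimal token budget")
    # -> roughly 3%: huge parameter count, comparatively little data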
