https://www.youtube.com/watch?v=ooLO2xeyJZA
https://www.youtube.com/watch?v=JIOYoZGoXsw
https://www.youtube.com/watch?v=43qp2TUNEFY
The print ads were similarly incredible:
http://www.x86-secret.com/pics/divers/v56k/histo/1999/commer...
https://www.purepc.pl/files/Image/artykul_zdjecia/2012/3DFX_...
https://fcdn.me/813/97f/3d-pc-accelerators-blow-dryer-ee8eb6...
It is quite insane. Actually getting to use all of that compute is difficult, but certainly possible with some clever planning. Hopefully as the tech matures we'll see higher and higher utilization rates (I think we're moving as fast as we were in the '90s in some ways, but the sheer size of the industry hides the absolutely insane rate of progress. Also, scale, I suppose).
I remember George Hotz nearly falling out of his chair, for example, at a project that was running some deep learning computations at 50% of peak GPU efficiency (i.e. used flops vs. possible flops), locally, on one GPU, with some other interesting constraints. I hadn't personally realized how hard that is to hit for some workloads, though I guess it makes sense: few applications are efficient _and_ also keep every single available compute unit on the GPU busy.
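For concreteness, here is a minimal sketch of what measuring that used-vs-possible-flops ratio might look like for a plain matmul. It assumes PyTorch on a single CUDA GPU, and PEAK_FLOPS is a hypothetical spec-sheet number for your card and dtype, not anything from the thread:

    import time
    import torch

    PEAK_FLOPS = 100e12  # hypothetical: ~100 TFLOPS peak for your GPU at this dtype

    def matmul_utilization(n=8192, iters=50, dtype=torch.float16):
        a = torch.randn(n, n, device="cuda", dtype=dtype)
        b = torch.randn(n, n, device="cuda", dtype=dtype)
        for _ in range(5):      # warm up: exclude one-time kernel/cache costs
            _ = a @ b
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            _ = a @ b
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        flops_done = 2 * n**3 * iters    # an n x n matmul costs ~2*n^3 flops
        achieved = flops_done / elapsed  # flops/sec actually sustained
        return achieved / PEAK_FLOPS     # fraction of peak; 0.5 = the 50% above

    print(f"utilization: {matmul_utilization():.1%}")

Note that a dense matmul is close to a best case; real training steps mix in memory-bound ops, so whole-model utilization lands well below what this one kernel reports.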
And FP8 should be very usable too in the right circumstances. I myself am very much looking forward to using it once proper support gets released for it. :)))) :3 :3 :3 :))))
FP8 is really only useful for machine learning, which is why it is stuck inside the tensor cores. FP8 is not useful for graphics; even FP16 is hard to use for anything general. I'd say 100 TFLOPS is the more accurate summary if you don't want to qualify it. Calling it "4 petaflops" without saying FP8 in the same sentence could be pretty misleading; I think you should say "4 FP8 petaflops".
Of course the card linked above is a server card, not a desktop or workstation card optimized for rendering.
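To make the FP8 tradeoff concrete, here is a minimal round-trip sketch, assuming PyTorch >= 2.1, which ships a float8_e4m3fn storage dtype (actual FP8 matmul support varies by version and hardware, so treat this as an illustration, not a recipe):

    import torch

    x = torch.randn(4, 4)
    # e4m3 tops out at 448, so scale into range before casting down.
    scale = x.abs().max() / 448.0
    x_fp8 = (x / scale).to(torch.float8_e4m3fn)  # 1 byte per value
    x_back = x_fp8.to(torch.float32) * scale     # dequantize

    print("max abs error:", (x - x_back).abs().max().item())

The error printed here is the precision you trade for the headline flops number, which is roughly why FP8 works for ML weights/activations but not for general graphics math.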
What is that Megatron chat in the advertisement? Does it refer to that loser, Earth-destroying character from Transformers? Rockfart?
Many modern models are far more efficient for inference IIRC, though I guess it remains a good exercise in "how much can we fit through this silicon?" engineering. :D