https://www.youtube.com/watch?v=ooLO2xeyJZA
https://www.youtube.com/watch?v=JIOYoZGoXsw
https://www.youtube.com/watch?v=43qp2TUNEFY
The print ads were similarly incredible:
http://www.x86-secret.com/pics/divers/v56k/histo/1999/commer...
https://www.purepc.pl/files/Image/artykul_zdjecia/2012/3DFX_...
https://fcdn.me/813/97f/3d-pc-accelerators-blow-dryer-ee8eb6...
It is quite insane. Actually getting to use all of that compute is difficult, but certainly possible with some clever planning. Hopefully as the tech matures we'll see higher and higher utilization rates (I think we're moving as fast as we were in the '90s in some ways, but the sheer size of the industry hides the absolutely insane rate of progress. Also, scale, I suppose).
I remember George Hotz nearly falling out of his chair, for example, at a project that was running some deep learning computations at 50% of peak GPU efficiency (i.e. used flops vs. possible flops), locally, on one GPU, with some other interesting constraints. I hadn't personally realized how hard that is to hit for some workloads, though I guess it makes sense: few applications are efficient _and_ also keep every single available compute unit on the GPU busy.
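For concreteness, here is a minimal sketch of what measuring that used-vs-possible-flops ratio might look like for a plain matmul. It assumes PyTorch on a single CUDA GPU, and PEAK_FLOPS is a hypothetical spec-sheet number for your card and dtype, not anything from the thread:

    import time
    import torch

    PEAK_FLOPS = 100e12  # hypothetical: ~100 TFLOPS peak for your GPU at this dtype

    def matmul_utilization(n=8192, iters=50, dtype=torch.float16):
        a = torch.randn(n, n, device="cuda", dtype=dtype)
        b = torch.randn(n, n, device="cuda", dtype=dtype)
        for _ in range(5):      # warm up: exclude one-time kernel/cache costs
            _ = a @ b
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            _ = a @ b
        torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        flops_done = 2 * n**3 * iters    # an n x n matmul costs ~2*n^3 flops
        achieved = flops_done / elapsed  # flops/sec actually sustained
        return achieved / PEAK_FLOPS     # fraction of peak; 0.5 = the 50% above

    print(f"utilization: {matmul_utilization():.1%}")

Note that a dense matmul is close to a best case; real training steps mix in memory-bound ops, so whole-model utilization lands well below what this one kernel reports.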
And FP8 should be very usable too in the right circumstances. I myself am very much looking forward to using it once proper support gets released for it. :)))) :3 :3 :3 :))))
FP8 is really only useful for machine learning, which is why it is stuck inside the tensor cores. FP8 is not useful for graphics; even FP16 is hard to use for anything general. I'd say 100 TFLOPS is the more accurate summary if you don't want to qualify it. Calling it "4 petaflops" without saying FP8 in the same sentence could be pretty misleading; I think you should say "4 FP8 petaflops".
Of course the card linked above is a server card, not a desktop or workstation card optimized for rendering.
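To make the FP8 tradeoff concrete, here is a minimal round-trip sketch, assuming PyTorch >= 2.1, which ships a float8_e4m3fn storage dtype (actual FP8 matmul support varies by version and hardware, so treat this as an illustration, not a recipe):

    import torch

    x = torch.randn(4, 4)
    # e4m3 tops out at 448, so scale into range before casting down.
    scale = x.abs().max() / 448.0
    x_fp8 = (x / scale).to(torch.float8_e4m3fn)  # 1 byte per value
    x_back = x_fp8.to(torch.float32) * scale     # dequantize

    print("max abs error:", (x - x_back).abs().max().item())

The error printed here is the precision you trade for the headline flops number, which is roughly why FP8 works for ML weights/activations but not for general graphics math.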
What is that Megatron chat in the advertisement? Does it refer to that loser, Earth-destroying character from Transformers? Rockfart?
Many modern models are far more efficient for inference IIRC, though I guess it remains a good exercise in "how much can we fit through this silicon?" engineering. :D