zlacker

[parent] [thread] 3 comments
1. modele+(OP)[view] [source] 2023-05-20 01:25:22
GPUs have roughly 100 times the throughput of CPUs for single-precision floating-point math. The catch is that you have to do roughly similar math operations on 10k+ items in parallel before the parallelism and memory bandwidth advantages of the GPU outweigh the latency and single-threaded performance advantages of the CPU. Of course this is achievable in graphics applications with millions of triangles and millions of pixels, and in machine learning applications with millions or billions of neurons.
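To make the shape of that concrete, here's a minimal CUDA sketch (the kernel, sizes, and constants are just illustrative): one elementwise operation applied across about a million floats, one thread per element, which is the kind of workload where the GPU's parallelism and bandwidth actually pay off.

    #include <cstdio>
    #include <cuda_runtime.h>

    // One thread per element. The GPU only wins when n is large enough
    // (tens of thousands of elements or more) to hide kernel launch
    // overhead and memory latency behind sheer parallelism.
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;  // ~1M elements, well past the break-even point
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        saxpy<<<blocks, threads>>>(n, 3.0f, x, y);
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);  // expect 5.0
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

Run the same loop over a few dozen elements and the launch and transfer overhead would make the GPU slower than a single CPU core; the win only shows up at scale.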

IMO almost any application that is bottlenecked by CPU performance can be recast to use GPUs effectively. But it's rarely done because GPUs aren't nearly as standardized as CPUs and the developer tools are much worse, so it's a lot of effort for a faster but much less portable outcome.

replies(1): >>HexDec+f7
2. HexDec+f7[view] [source] 2023-05-20 03:11:16
>>modele+(OP)
Are there any standardised approaches for this? I can't see how you'd run branchy CPU code like parsing effectively on a GPU.
replies(2): >>raphli+A7 >>kaliqt+Ca
3. raphli+A7[view] [source] [discussion] 2023-05-20 03:16:29
>>HexDec+f7
It is possible, but you have to do things very differently, for example by reformulating the work in terms of monoids: because the combining operation is associative, it can be evaluated as a parallel reduction or scan rather than a sequential loop. There are a few compilers implemented on GPU, including Aaron Hsu's co-dfns and Voetter's compiler project[1]. Efficient parallel algorithms for the parentheses matching problem itself (the core of parsing) have long been known, and those have been ported to compute shaders[2] (disclosure: blatant self-promotion). A rough sketch of the monoid idea follows the links.

[1]: https://dl.acm.org/doi/pdf/10.1145/3528416.3530249

[2]: https://arxiv.org/pdf/2205.11659.pdf
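Here is the sketch (my own illustration, not the approach from either paper): summarize each segment of the input by how many ')' it leaves unmatched on its left and how many '(' it leaves unmatched on its right. That summary combines associatively, so checking whether a whole string is balanced becomes an ordinary parallel scan, written here with CUDA Thrust.

    #include <cstdio>
    #include <string>
    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/scan.h>

    // Summary of a segment: ')' unmatched on its left edge,
    // '(' unmatched on its right edge.
    struct Paren {
        int unmatched_close;
        int unmatched_open;
    };

    // Map a single character to its summary.
    struct CharToParen {
        __host__ __device__ Paren operator()(char c) const {
            if (c == '(') return {0, 1};
            if (c == ')') return {1, 0};
            return {0, 0};
        }
    };

    // Associative combine: opens left over from the left segment cancel
    // closes at the start of the right segment. Associativity is exactly
    // what lets the GPU evaluate this as a parallel scan.
    struct Combine {
        __host__ __device__ Paren operator()(const Paren& a, const Paren& b) const {
            int matched = a.unmatched_open < b.unmatched_close
                              ? a.unmatched_open : b.unmatched_close;
            return {a.unmatched_close + (b.unmatched_close - matched),
                    b.unmatched_open + (a.unmatched_open - matched)};
        }
    };

    int main() {
        std::string src = "((a)(b(c)))";
        thrust::device_vector<char> chars(src.begin(), src.end());

        thrust::device_vector<Paren> sums(chars.size());
        thrust::transform(chars.begin(), chars.end(), sums.begin(), CharToParen{});
        thrust::inclusive_scan(sums.begin(), sums.end(), sums.begin(), Combine{});

        Paren total = sums.back();  // summary of the whole string
        printf("balanced: %s\n",
               (total.unmatched_close == 0 && total.unmatched_open == 0) ? "yes" : "no");
        return 0;
    }

The real algorithms in [2] do full matching in compute shaders and are considerably more involved; this only shows why associativity is what buys you parallelism on a "branchy" problem.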

4. kaliqt+Ca[view] [source] [discussion] 2023-05-20 03:55:02
>>HexDec+f7
I think WebGPU will help change a lot of this: finally, portable GPU code that performs well and runs virtually anywhere. It's the same reason web apps have taken off so much, or more generally the idea of deploying to and from web platforms, e.g. writing for the web and deploying to native.

I think WebGPU will be that universal language everyone speaks, and I also think it will help break Nvidia's monopoly on GPU compute.
