zlacker

[parent] [thread] 5 comments
1. mschue+(OP)[view] [source] 2023-03-05 06:37:27
> Sometimes, like with CUDA, they just have an early enough lead that they entrench.

The problem in the case of CUDA isn't just that NVIDIA was there early, it's that AMD and Khronos still offer no viable alternative after more than a decade. I switched to CUDA half a year ago, after trying to avoid it for years because it's proprietary. Unfortunately I discovered that CUDA is absolutely amazing: it's easy to get started, developer friendly in that it "just works" (which is never the case for Khronos APIs and environments), and it's incredibly powerful, kind of like programming C++17 for 80 x 128 SIMD processors. I wish there were a platform-independent alternative, but OpenCL, SYCL, and ROCm aren't it.
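The "easy to get started" point can be made concrete with a sketch of a complete CUDA program (a minimal SAXPY; the kernel, launch, and memory management fit in a handful of lines, compiled with plain `nvcc saxpy.cu`):

```cuda
// Minimal sketch of a complete CUDA program (SAXPY): illustrative only.
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory: no manual host/device copies
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch a grid of 256-thread blocks covering all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The equivalent in OpenCL requires explicit platform/device/context/queue setup and kernel-string compilation before any work runs, which is a large part of the ergonomic gap being described.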

replies(1): >>ribs+j6
2. ribs+j6[view] [source] 2023-03-05 08:04:09
>>mschue+(OP)
I keep hearing that ROCm is DOA, but there’s a lot of supercomputing labs that are heavily investing in it, with engineers who are quite in favor of it.
replies(4): >>mschue+x8 >>pixele+Na >>doikor+cd >>pjmlp+ie
3. mschue+x8[view] [source] [discussion] 2023-03-05 08:40:22
>>ribs+j6
I hope it takes off; a platform-independent alternative to CUDA would be great. But if they want it to be successful outside of supercomputing labs, it needs to be as easy to use as CUDA, and I'd say being successful outside of supercomputing labs is important for overall adoption. For me personally, it would also need fast runtime compilation so that you can modify and hot-reload ROCm programs at runtime.
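For reference, the runtime-compilation workflow being asked for already exists on the NVIDIA side as the NVRTC library; a minimal sketch (error handling elided, requires the CUDA toolkit) of compiling kernel source to PTX at runtime, e.g. after a hot-reload picks up an edited file:

```cuda
// Sketch: runtime compilation of CUDA source via NVRTC (illustrative only).
#include <nvrtc.h>
#include <string>

std::string compileToPtx(const char* kernelSource) {
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, kernelSource, "kernel.cu", 0, nullptr, nullptr);

    const char* opts[] = {"--gpu-architecture=compute_70"};  // example target
    nvrtcCompileProgram(prog, 1, opts);  // recompile on every reload

    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::string ptx(ptxSize, '\0');
    nvrtcGetPTX(prog, &ptx[0]);

    nvrtcDestroyProgram(&prog);
    return ptx;  // load with the driver API (cuModuleLoadData) and launch
}
```

ROCm ships an analogous hipRTC API, so the gap here is arguably more about maturity and ergonomics than raw capability.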
4. pixele+Na[view] [source] [discussion] 2023-03-05 09:15:34
>>ribs+j6
If you want to run compute on AMD GPU hardware on Linux, it does work. However, it's not as portable as CUDA: you practically have to compile your code for every AMD GPU architecture, whereas with CUDA the NVIDIA drivers give you an abstraction layer (ish; it's really PTX that provides it) which is forwards and backwards compatible. That makes it trivial to support new cards / generations of cards without recompiling anything.
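The difference shows up directly in the build commands; a rough sketch (the architecture names are just examples):

```shell
# CUDA: embed PTX for one virtual architecture; the driver JIT-compiles
# it on the fly, so newer GPUs run the same binary without a rebuild.
nvcc -gencode arch=compute_70,code=compute_70 app.cu -o app

# ROCm: no comparable stable IR layer in the driver, so you list each
# target ISA explicitly; a new GPU generation means recompiling with
# its gfx ID added to the fat binary.
hipcc --offload-arch=gfx906 --offload-arch=gfx90a app.hip -o app
```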
5. doikor+cd[view] [source] [discussion] 2023-03-05 09:54:16
>>ribs+j6
With supercomputers you write your code for that specific supercomputer, and in such an environment ROCm works OK. Trying to make a piece of ROCm code work across different cards/setups is a real pain (and not that easy with CUDA either if you want good performance).
6. pjmlp+ie[view] [source] [discussion] 2023-03-05 10:11:33
>>ribs+j6
Some random HPC lab with enough weight to have an AMD team drop by isn't the same thing as the average Joe and Jane developer.