I'm curious about this. Sure, some CUDA code has already been written. But if something new comes along that provides better performance per dollar spent, why continue writing CUDA for new projects? I don't think the argument that "this is what we know how to write" works in this case. These aren't scripts you want someone to knock out quickly.
They won’t be able to do that; their hardware isn’t fast enough.
Nvidia is beating them at hardware performance, AND ALSO has an exclusive SDK (CUDA) that is used by almost all deep learning projects. If AMD can get their cards to run CUDA via ROCm, then they can begin to compete with Nvidia on price (though not performance). Then, and only then, if they can start actually producing cards with equivalent performance (also a big stretch), they can try for an Embrace, Extend, Extinguish play against CUDA.
CUDA currently has the better raw performance, better availability, and a long track record indicating that the platform won't just disappear in a couple of years. You can use it on pretty much any NVIDIA GPU and it's properly supported. The same CUDA code that ran on a GTX 680 can run on an RTX 4090 with minimal changes, if any (maybe even the same binary).
In comparison, AMD has a very spotty record with their compute technologies: stuff gets released and becomes effectively abandonware, or support gets dropped after just a few years regardless of the hardware's popularity. For several generations they basically led people on with promises of full support on consumer hardware that either never arrived or arrived when the next generation of cards was already available. Despite the general popularity of the RX 580 and the popularity of the Radeon VII in compute applications, they dropped 'official' support for both. AMD treats its 'consumer' cards as third-class citizens for compute support, and you aren't going to convince people to seriously look into your platform like that. Plus, it's a lot more appealing to be able to say "GPU acceleration will allow us to take advantage of newer supercomputers, while also offering massive benefits to regular users" than just the first half of that.
This was ultimately what removed AMD from consideration for us when we were deciding which platform to focus on for GPU acceleration in our application. Many of us already had access to an NVIDIA GPU of some sort, which would make development easier, while the entire facility had one ROCm-capable AMD GPU at the time, kept specifically so they could occasionally check in on ROCm's status.
Well, then I guess CUDA is not really the problem, so being able to run CUDA on AMD hardware wouldn't solve anything.
> try for an Embrace Extend Extinguish play against CUDA
They wouldn't need to go that route. They just need a way to run existing CUDA code on AMD hardware. Once that happens, their customers have the option to save money by writing ROCm or whatever AMD is working on at that time.
It is. All the things are the problem. AMD is behind on both hardware and software, for both gaming and compute workloads, and has been for many years. Their competitor has them beat in pretty much every vertical, and the lock-in from CUDA helps ensure that even if AMD can get their act together on the hardware side, existing compute workloads (there are oceans of existing workloads) won’t run on their hardware, so it won’t matter for professional or datacenter usage.
To compete with Nvidia in those verticals, AMD has to fix all of it. Ideally they’d come out with something better than CUDA, but they have not shown an aptitude for being able to do something like that. That’s why people keep telling them to just make a compatibility layer. It’s a sad place to be, but that’s the sad place where AMD is, and they have to play the hand they’ve been dealt.
It limits Nvidia's profit margin: if Nvidia cards run twice as fast but cost more than twice as much, then people will just buy two AMD cards. Meanwhile, it gives AMD some revenue with which to fund an improved CUDA stack.
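To put rough numbers on that, here's a minimal sketch of the cost comparison. The throughput and price figures are made up purely for illustration (they aren't real card specs or prices), and it assumes the workload scales linearly across cards and runs equally well on both vendors, which is exactly the part a CUDA compatibility layer would have to deliver.

```python
from math import ceil


def perf_per_dollar(throughput: float, price: float) -> float:
    """Relative throughput delivered per dollar spent on one card."""
    return throughput / price


def cards_needed(target: float, per_card: float) -> int:
    """Whole cards required to reach a target throughput (linear scaling assumed)."""
    return ceil(target / per_card)


# Hypothetical numbers, not real products: the Nvidia card is twice as
# fast but costs more than twice as much as the AMD card.
nvidia = {"throughput": 2.0, "price": 2500.0}
amd = {"throughput": 1.0, "price": 1000.0}

target = 2.0  # the throughput we actually need

nvidia_cost = cards_needed(target, nvidia["throughput"]) * nvidia["price"]
amd_cost = cards_needed(target, amd["throughput"]) * amd["price"]

print(f"Nvidia: {cards_needed(target, nvidia['throughput'])} card(s), ${nvidia_cost:.0f}")
print(f"AMD:    {cards_needed(target, amd['throughput'])} card(s), ${amd_cost:.0f}")
```

With these made-up figures the two AMD cards come out cheaper for the same throughput, so Nvidia can only charge more than double if buyers can't actually switch.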
> their customers have the option to save money by writing ROCm
CUDA saves money by having a fuckton of pre-written CUDA code and being supported by default basically everywhere.