Perhaps they think using GPUs for computation is a passing fad? They hate money? Their product is actually terrible and they don't want to get found out (that one might be true for Intel)?
[1] https://www.reddit.com/r/Amd/comments/140uct5/geohot_giving_...
They haven't had graphics card driver issues in years now, and people still say "oh, I don't want AMD cos their drivers don't work".
Yes, much needed.
Here's a list of possible "monopoly breakers" I'm going to write about in another post - some of these are things people are using today, some are available but don't have much user adoption, some are technically available but very hard to purchase or rent/use, and some aren't yet available:
* Software: OpenAI's Triton (you might've noticed it mentioned in some of the "TheBloke" model releases and as an option in the oobabooga text-generation-webui), Modular's Mojo (on top of MLIR), OctoML (from the creators of TVM), geohot's tiny corp, CUDA porting efforts, and PyTorch as a way of reducing reliance on CUDA (rough sketch of that last idea after this list)
* Hardware: TPUs, Amazon Inferentia, cloud companies working on their own chips (Microsoft Project Athena, AWS Trainium, TPU v5), chip startups (Cerebras, Tenstorrent), AMD's MI300A and MI300X, Tesla Dojo and D1, Meta's MTIA, Habana Gaudi, LLM ASICs, [+ Moore Threads]
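To make the PyTorch/Triton point from the software bullet concrete, here's a minimal sketch (assuming a PyTorch 2.x install; the ROCm build exposes HIP devices through the torch.cuda API, and torch.compile lowers to Triton kernels on GPU rather than hand-written CUDA):

    import torch

    def backend_info():
        # Works unchanged on CUDA and ROCm builds: the ROCm build reuses the
        # torch.cuda API, so most device-selection code never mentions HIP.
        if not torch.cuda.is_available():
            return "cpu only"
        hip = getattr(torch.version, "hip", None)
        if hip:  # ROCm/HIP build
            return f"ROCm {hip}: {torch.cuda.get_device_name(0)}"
        return f"CUDA {torch.version.cuda}: {torch.cuda.get_device_name(0)}"

    print(backend_info())

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(4096, 4096).to(device)
    compiled = torch.compile(model)  # TorchInductor emits Triton kernels on GPU
    x = torch.randn(8, 4096, device=device)
    print(compiled(x).shape)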
The A100/H100 with InfiniBand is still the most common request from startups doing LLM training, though.
The current angle I'm thinking about for the post would be to actually use them all. Take Llama 2 and see which software and hardware approaches we can get inference working on (I'd leave training to a follow-up post), write about how much of a hassle each one is (to get access to / purchase / rent, and to get running), and what the inference speed is like. That might be too ambitious, though; I could see it taking a while. If any freelancers want to help me research and write this, my email is in my profile. No points for companies that talk a big game but don't have a product that can actually be purchased/used, I think - they'd be relegated to a "things to watch for in future" section.
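For the inference-speed comparison, the per-backend harness could be as simple as something like this (a rough sketch, assuming a Hugging Face transformers install and access to the gated meta-llama/Llama-2-7b-hf weights - the model id and prompt are just placeholders):

    import time
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; swap per backend
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")

    inputs = tok("The CUDA monopoly will be broken when",
                 return_tensors="pt").to(model.device)

    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{new_tokens / elapsed:.1f} tokens/sec on {model.device}")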
The H100s are actually very good for inference.
As for drivers: https://www.tomshardware.com/news/adrenalin-23-7-2-marks-ret...
(To their credit, AMD is also getting serious lately: they put out a listing for around 30 ROCm developers a few weeks after geohot's meltdown, and at the time they were also working on a Windows release of ROCm (previously Linux-only) with support for consumer gaming GPUs. The message seems to have finally been received; it's a perennial topic here and elsewhere, and with the obvious shower of money happening, maybe management was finally receptive to the idea that they needed to step it up.)
I've enabled nearly all GFX9 and GFX10 GPUs while packaging the libraries for Debian. I haven't tested every library with every GPU, but in my experience they pretty much all work. I suspect that will also be true of GFX11 once we move rocm-hipamd to LLVM 16.
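(If you want a quick smoke test of a distro-packaged ROCm stack without pulling in a whole framework, something like this works - the libamdhip64 soname is an assumption based on what the Debian packages ship, adjust to taste:)

    import ctypes
    import ctypes.util

    # Load the HIP runtime shipped by the distro packages (name is an assumption).
    libname = ctypes.util.find_library("amdhip64") or "libamdhip64.so"
    hip = ctypes.CDLL(libname)

    count = ctypes.c_int(0)
    err = hip.hipGetDeviceCount(ctypes.byref(count))
    print("hipGetDeviceCount returned", err, "- visible GPUs:", count.value)

    version = ctypes.c_int(0)
    hip.hipRuntimeGetVersion(ctypes.byref(version))
    print("HIP runtime version:", version.value)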
One bad driver update is not indicative of anything. Nvidia has had bad driver updates too, but you're not shitting all over them for it. And running Nvidia's own drivers on Linux is still a pain point.
(And don't try to claim I'm an AMD fanboy when I don't even have any AMD stuff at the moment. It's all Intel/Nvidia.)