but then stopped
People are criticizing AMD for dropping this, but it makes sense to stop paying for development when the dev has stopped doing the work, no?
And if he means that AMD stopped paying 3 years ago - well, that was before dinosaurs and ChatGPT, and a lot has changed since then.
"Radeon Open Compute Platform"
https://github.com/ROCm/ROCm/issues/1628
And they wonder why they are losing. Branding absolutely matters.
They very much plan to compete in this space, and hope to ship $3.5B of these chips in the next year. Small compared to Nvidia's revenues of $59B (includes both consumer and data centre), but AMD hopes to match them. It's too big a market to ignore, and they have the hardware chops to match Nvidia. What they lack is software, and it's unclear if they'll ever figure that out.
762 changed files with 252,017 additions and 39,027 deletions.
https://github.com/vosen/ZLUDA/commit/1b9ba2b2333746c5e2b05a...

Here it is: https://arstechnica.com/tech-policy/2021/04/how-the-supreme-...
AMD seems to be a firm believer in separating the consumer chips for gaming and the compute chips for everything else. This probably makes a lot of sense from a chip design and current business perspective, but I think it's shortsighted and a bad idea. GPUs are very competent compute devices, and basically wasting all that performance for "only" gaming is strange to me. AI and other compute is getting more and more important for things like image and video processing, language models, etc. Not only for regular consumers, but for enthusiasts and developers it makes a lot of sense to be able to use your 10 TFLOPS chip even when you're not gaming.
While reading through the AMD CDNA whitepaper I saw this and got a good chuckle. "culmination of years of effort by AMD" indeed.
> The computational resources offered by the AMD CDNA family are nothing short of astounding. However, the key to heterogeneous computing is a software stack and ecosystem that easily puts these abilities into the hands of software developers and customers. The AMD ROCm 4.0 software stack is the culmination of years of effort by AMD to provide an open, standards-based, low-friction ecosystem that enables productivity creating portable and efficient high-performance applications for both first- and third-party developers.
https://www.amd.com/content/dam/amd/en/documents/instinct-bu...
"Support" means that the card is actively tested and presumably has some sort of SLA-style push to fix bugs for. As their stack matures, a bunch of cards that don't have official support will work well [0]. I have an unsupported card. There are horrible bugs. But the evidence I've seen is that the card will work better with time even though it is never going to be officially supported. I don't think any of my hardware is officially supported by the manufacturer, but the kernel drivers still work fine.
> Meanwhile CUDA supports anything with Nvidia stamped on it before it's even released...
A lot of older Nvidia cards don't support CUDA v9 [1]. It isn't like everything supports everything, particularly in the early part of building out capability. The impression I'm getting is that in practice the gap in strategy here is not as large as the current state makes it seem.
[0] If anyone has bought an AMD card for their machine to multiply matrices, they've been gambling on whether the capability is there. This comment is reasonable speculation, but I want to caveat the optimism by asserting that I'm not going to put money into AMD compute until there is some actual evidence on the table that GPU lockups are rare.
https://github.com/vosen/ZLUDA/tree/v3?tab=readme-ov-file#fa...
The system package for HIP on Debian has been stuck on ROCm 5.2 / clang-15 for a while, but once I get it updated to ROCm 5.7 / clang-17, I expect that all discrete RDNA 3 GPUs will work.
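In the meantime, the usual community workaround for consumer cards that aren't on the official support list (purely a sketch - it works for many RDNA 2/3 parts, but there's no guarantee for any specific card) is to spoof a supported gfx target via an environment variable:

# unofficial override: tell ROCm to treat the GPU as a known gfx target
# 10.3.0 is the value commonly used for RDNA 2 cards, 11.0.0 for RDNA 3
HSA_OVERRIDE_GFX_VERSION=11.0.0 python -c "import torch; print(torch.cuda.is_available())"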
It's annoying as hell to you and me that they are not catering to the market of people who want to run stuff on their gaming cards.
But it's not clear it's bad strategy to focus on executing in the high-end first. They have been very successful landing MI300s in the HPC space...
Edit: I just looked it up: 25% of the GPU Compute in the current Top500 Supercomputers is AMD
https://www.top500.org/statistics/list/
Even though the list has plenty of V100 and A100s which came out (much) earlier. Don't have the data at hand, but I wouldn't be surprised if AMD got more of the Top500 new installations than nVidia in the last two years.
https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...
So the contract is: as long as your future program does not touch any intrinsics etc that do not exist in CUDA 1.0, you can export the new program from CUDA 27.0 as PTX, and the GTX 6800 driver will read the PTX and let your gpu run it as CUDA 1.0 code… so it is quite literally just as they describe, unlimited forward and backward capability/support as long as you go through PTX in the middle.
https://docs.nvidia.com/cuda/archive/10.1/parallel-thread-ex...
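Concretely, that's why fat binaries get built with a PTX fallback. A minimal sketch (the arch numbers here are just an example):

# SASS for one real architecture, plus PTX (code=compute_70) that the driver can
# JIT-compile for GPUs that didn't exist when this was built
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_70,code=compute_70 \
     kernel.cu -o app
cuobjdump --dump-ptx app   # should show the embedded PTX that makes the forward compatibility work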
so, same mistake intel made before.
However, that same logic doesn't apply to consumers, and since IBM kept failing to learn that lesson, it no longer even targets the consumer market: it never learned how to be competitive and could only function effectively when it had a monopoly, or at least vendor lock-in.
https://en.wikipedia.org/wiki/Acquisition_of_the_IBM_PC_busi...
Ahhhh, your hindsight is well developed. I would be interested to know the background on the reasons why Lotus made that bet. We can't know the counterfactual, but Lotus delivering on a platform owned by their deadly competitor Microsoft would seem to me to be a clearly worrisome idea to Lotus at the time. Turned out it was an existentially bad idea. Did Lotus fear Microsoft? "DOS ain't done till Lotus won't run" is a myth[1] for a reason. Edit: DR-DOS errors[2] were one reason Lotus might fear Microsoft. We can just imagine a narrative of a different timeline where Lotus delivered on Windows but did some things differently to beat Excel. I agree, Lotus made other mistakes and Microsoft made some great decisions, but the point remains.
We can also suspect that AMD have a similar choice now where they are forked. Depending on Nvidia/CUDA may be a similar choice for AMD - fail if they do and fail if they don't.
[1] http://www.proudlyserving.com/archives/2005/08/dos_aint_done...
[2] https://www.theregister.com/1999/11/05/how_ms_played_the_inc...
I guess a while ago it was found that Nvidia was bypassing the kernel's GPL license check in their driver, and I read that kernel 6.6 was going to lock that driver out if they didn't fix it; from what I've read there was no reply or anything done by Nvidia yet. Which I think I probably just can't find.
Am I wrong about that part?
We're on kernel 6.7.4 now and I'm still using the same drivers. Did it get pushed back, did nvidia fix it?
Also, while trying to find answers myself I came across this 21 year old post which is pretty funny and very apt for the topic https://linux-kernel.vger.kernel.narkive.com/eVHsVP1e/why-is...
I'm seeing conflicting info all over the place so I'm not really sure what the status of this GPL nvidia driver block thing is.
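If you want to check what your own box is doing, the module's declared license and the kernel taint flags are easy to inspect (bit 0 of the taint value is the 'P' proprietary-module bit):

modinfo -F license nvidia      # "NVIDIA" for the proprietary module, "Dual MIT/GPL" for the open kernel modules
cat /proc/sys/kernel/tainted   # odd value => a proprietary module is loaded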
I tried to get it working this weekend but it was a huge PITA, so I switched to putting everything into WSL2, then Arch inside that, with pytorch etc. in containers so I could flip versions easily, now that I know how SPECIFIC the versions are to one another.
I'm still working on that part; halfway into it my WSL2 completely broke and I had to reinstall Windows. I'm scared to mount the vhdx right now. ALL of my work and ALL of my documentation are inside the WSL2 archlinux and NOT on my windows machine. I have EVERYTHING I need to quickly put another server up (dotfiles, configs) sitting in a chezmoi git repo ON THE VM, which I only git committed once, like 5 mins into everything. THAT was a learning experience. Now I have no idea if I should follow the "best practice" of keeping projects in WSL or having WSL reach out to windows (there's a performance drop). The 9p networking stopped working and no matter what I reinstalled, reset, removed features, reset windows, etc., it wouldn't start. But at least I have that WSL2 .vhdx image that will hopefully mount and start. And probably break WSL2 again. I even SPECIFICALLY took backups of the image as tarballs every hour in case I broke LINUX, not WSL.
If anyone has done sd containers in wsl2 already, let me know. I've tried to use WSL for dev work (I use OSX) like this 2-3 times in the last 4-5 years and I always run into some catastrophically broken thing that makes my WSL stop working. I hadn't used it in years, so I hoped it was super reliable by now. This is on 3 different desktops with completely different hardware, etc. I was terrified it would break this weekend and IT DID. At least I can be back up in Windows in 20 minutes thanks to Chocolatey and chezmoi. Wiped out my entire gaming desktop.
Sorry I'm venting now this was my entire weekend.
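For what it's worth, next time you can snapshot the distro itself instead of babysitting the raw .vhdx; `wsl --export` writes a portable tarball you can re-import on a fresh install (the distro name and paths below are just examples):

wsl --export archlinux D:\backups\archlinux-2024-02.tar                        # name comes from `wsl -l -v`
wsl --import archlinux-restored C:\wsl\arch D:\backups\archlinux-2024-02.tar   # restore onto a clean Windows install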
This repo is from a DeepSpeed contributor (iirc) and lists the requirements for DeepSpeed + Windows that mention the version matching:
https://github.com/S95Sedan/Deepspeed-Windows
> conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
It may sound weird to do any of this in Windows, or maybe not, but if it does just remember that it's a lot of gamers like me with 4090s who just want to learn ML stuff as a hobby. I have absolutely no idea what I'm doing but thank god I know containers and linux like the back of my hand.
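Once the pinned versions are in, a quick sanity check that the CUDA build actually matches and the GPU is visible saves a lot of container rebuilds (the expected output is just an example for the pins above):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# e.g. 2.1.2 12.1 True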
Again, you are missing the point. Java is both a language (Java source) and a machine (the JVM). The latter is a hardware ISA - there are processors that implement Java bytecode as their ISA format. Yet most people who run Java are not doing so on Java-machine hardware, even though they are using the Java ISA in the process.
https://en.wikipedia.org/wiki/Java_processor
https://en.wikipedia.org/wiki/Bytecode#Execution
any bytecode is an ISA, the bytecode spec defines the machine and you can physically build such a machine that executes bytecode directly. Or you can translate via an intermediate layer, like how Transmeta Crusoe processors executed x86 as bytecode on a VLIW processor (and how most modern x86 processors actually use RISC micro-ops inside).
these are completely fungible concepts. They are not quite the same thing, but bytecode is clearly an ISA in itself. Any given processor can choose to use a particular bytecode as either an ISA or translate it to its native representation, and this includes PTX, Java, and x86 (among all other bytecodes). And you can do the same for any other ISA (x86 as bytecode representation, etc).
furthermore, what most people think of as "ISAs" aren't necessarily so. For example RDNA2 is an ISA family - different processors have different capabilities (for example 5500XT has mesh shader support while 5700XT does not) and the APUs use a still different ISA internally etc. GFX1101 is not the same ISA as GFX1103 and so on. These are properly implementations not ISAs, or if you consider it to be an ISA then there is also a meta-ISA encompassing larger groups (which also applies to x86's numerous variations). But people casually throw it all into the "ISA" bucket and it leads to this imprecision.
like many things in computing, it's all a matter of perspective/position. where is the boundary between "CMT core within a 2-thread module that shares a front-end" and "SMT thread within a core with an ALU pinned to one particular thread"? It's a matter of perspective. Where is the boundary of "software" vs "hardware" when virtually every "software" implementation uses fixed-function accelerator units and every fixed-function accelerator unit is running a control program that defines a flow of execution and has schedulers/scoreboards multiplexing the execution unit across arbitrary data flows? It's a matter of perspective.
(To be clear, HIP is about converting CUDA source code not running CUDA-compiled binaries but the Zluda project discussed in OP heavily relies on it.)
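For anyone who hasn't seen the source-level path: the hipify tools do a mostly mechanical cudaFoo -> hipFoo rewrite (the file names below are just an example; hand fixes are sometimes needed afterwards):

hipify-perl vector_add.cu > vector_add.hip.cpp   # rewrites cudaMalloc/cudaMemcpy/kernel launches to the hip* equivalents
hipcc vector_add.hip.cpp -o vector_add           # same source now builds for AMD (or, via the CUDA backend, for NVIDIA)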
The big issue for Intel is pretty similar to that of AMD; everything is made for CUDA, and Intel has to either build their own solutions or convince people to build support for Intel. While I'm working on learning AI and plan to use an Nvidia card, the progress Intel has made in the couple of years since introducing their first GPU to market has been pretty wild, and I think it should really give AMD pause.
... after a couple decades of legal proceedings and a looming FTC monopoly case convinced Intel to throw in the towel, cross-license, and compete more fairly with AMD.
https://jolt.law.harvard.edu/digest/intel-and-amd-settlement
AMD didn't just magically do it on its own.
I got this up and running on my windows machine in short order and I don't even know what stable diffusion is.
But again, it would be nice to have first class support to locally participate in the fun.
https://linuxmusicians.com/viewtopic.php?t=25556
Could be completely unrelated though, RDP sessions can definitely act up, get audio out of sync etc. I try to never do pass through rdp audio, it's not even enabled by default in the mstsc client IIRC but that may just be a "probably server" thing.
H100s are hard to get. Nearly impossible. CoreWeave and others have scooped them all up for the foreseeable future. So if you are looking at price as the only factor, it becomes somewhat irrelevant if you can't even buy them [0]. I don't really understand the focus on price because of this fact.
Even if you do manage to score yourself some H100s, you also need to factor in the networking between nodes. InfiniBand, made by Mellanox, is owned by NVIDIA. Lead times on that equipment are 50+ weeks. Again, price becomes irrelevant if you can't even network your boxes together.
As someone building a business around MI300x (and future products), I don't care that much about price [!]. We know going in that this is a super capital intensive business and have secured the backing to support that. It is one of those things where "if you have to ask, you can't afford it."
We buy cards by the chassis, it is one price. I actually don't know the exact prices of the cards (but I can infer it). It is a lot about who you know and what you're doing. You buy more chassis, you get better pricing. Azure is probably paying half of what I'm paying [1]. But I'd also say that from what I've seen so far, their chassis aren't nearly as nice as mine. I have dual 9754's, 2x bonded 400G, 3TB ram, and 122TB nvme... plus the 8x MI300x. These are top of the top. They have Intel and I don't know what else inside.
[!] Before you harp on me, of course I care about price... but at the end of the day, it isn't what I'm focused on today as much as just being focused on investing all of the capex/opex that I can get my hands on, into building a sustainable business that provides as much value as possible to our customers.
[0] https://www.tomshardware.com/news/tsmc-shortage-of-nvidias-a...
[1] https://www.techradar.com/pro/instincts-are-massively-cheape...
"Building the DirectX shader compiler better than Microsoft?" (2024) >>39324800
E.g. llama.cpp already supports hipBLAS; is there an advantage to this ROCm CUDA-compatibility layer - ZLUDA on Radeon (and not yet Intel OneAPI) - instead or in addition? https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi... >>38588573
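In case it helps anyone comparing the two paths: the hipBLAS route in that README is (or at least was - the flag names move around) just a build switch plus a runtime offload flag; the model path below is only an example:

make LLAMA_HIPBLAS=1                                        # build against ROCm's hipBLAS instead of cuBLAS
./main -m models/llama-2-7b.Q4_K_M.gguf -ngl 33 -p "hi"     # -ngl offloads that many layers to the GPU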
What can't WebGPU abstract away from CUDA unportability? >>38527552
"While AMD ships pre-built ROCm/HIP stacks for the major enterprise Linux distributions, if you are using not one of them or just want to be adventurous and compile your own stack for building HIP programs for running on AMD GPUs, one of the AMD Linux developers has written a how-to guide. "(1)
(1)
"Building An AMD HIP Stack From Upstream Open-Source Code
Written by Michael Larabel in Radeon on 9 February 2024 at 06:45 AM EST."
Side point, there's a driver in your linux kernel already that'll probably work. The driver that ships with rocm is a newer version of the same and might be worth building via dkms.
Very strange that the rocm github doesn't have build scripts but whatever, I've been trying to get people to publish those for almost five years now and it just doesn't seem to be feasible.
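On the in-kernel driver point above, you can check what you already have before bothering with the dkms build (the package name assumes AMD's apt repo is configured; details vary by distro):

modinfo -n amdgpu              # path tells you whether it's the in-tree module or a dkms build (.../updates/dkms/...)
dkms status                    # lists any out-of-tree amdgpu module that's installed
sudo apt install amdgpu-dkms   # AMD's newer module, rebuilt against your kernel via dkms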
https://hpc.guix.info/blog/2024/01/hip-and-rocm-come-to-guix...
> AMD has just contributed 100+ Guix packages adding several versions of the whole HIP and ROCm stack
"AMD’s client segment, mostly chips for PCs and laptops, rose 62% year over year to $1.46 billion in sales, thanks to recent chip launches.
Sales in AMD’s gaming segment, which includes “semi-custom” processors for Microsoft Xbox and Sony PlayStation consoles, fell 17%. "
* https://www.cnbc.com/2024/01/30/amd-earnings-report-q4-2024....
https://www.phoronix.com/forums/forum/linux-graphics-x-org-d...
And I'm on Linux Mint 21.3, so how do I change any installation script to think that Mint is Ubuntu to maybe get that to work there? There's no how-to for Mint like the one AMD provides for Ubuntu! And really that's compiled by AMD for a specific Linux kernel, so no DKMS-style method there AFAIK! But I'm no Linux expert and just want some one-click install, or for it to ship with the distro already working, so Blender 3D's iGPU/dGPU accelerated Cycles rendering is possible on AMD Radeon consumer GPUs.
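FWIW, AMD's installer mostly just reads /etc/os-release to decide what it's running on, so the usual (unsupported, at-your-own-risk, and untested by me on 21.3 specifically) Mint trick is to temporarily pretend to be the Ubuntu base it's built on:

sudo cp /etc/os-release /etc/os-release.bak
# Mint 21.x is based on Ubuntu 22.04 (jammy); masquerade for the duration of the install
sudo sed -i 's/^ID=linuxmint/ID=ubuntu/; s/^VERSION_CODENAME=.*/VERSION_CODENAME=jammy/' /etc/os-release
sudo amdgpu-install --usecase=rocm
sudo mv /etc/os-release.bak /etc/os-release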
Pretty sure Vulkan is gonna work equally well; at the very least there's an open source DXVK project which implements D3D11 on top of Vulkan.
given how omnipresent she is with her live streaming, it's a bit like South Park's Worldwide Privacy Tour: https://www.youtube.com/watch?v=2N8_5LDkZwY
You can't install the PyTorch that's best for the currently running platform using a pyproject.toml with a setuptools backend, for starters. Invoke would have to author a setup.py that deals with all the issues, in a way that is compatible with build isolation.
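Concretely, which build you get ends up being an install-time decision the user has to make, and there's no standard way to express that choice in a pyproject.toml dependency list. The index names below track whatever CUDA/ROCm versions PyTorch currently ships wheels for:

pip install torch --index-url https://download.pytorch.org/whl/cu121     # NVIDIA build
pip install torch --index-url https://download.pytorch.org/whl/rocm5.7   # AMD/ROCm build
pip install torch --index-url https://download.pytorch.org/whl/cpu       # CPU-only build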
> The majority of my struggle would have been solved by a recent working Docker image containing a working setup. (The one on Docker Hub is 9 months old.)
Why? Given the state of the ecosystem, what guarantee is there really that the documentation for Docker Desktop with AMD ROCm device binding is going to actually work for your device? (https://rocm.docs.amd.com/projects/MIVisionX/en/latest/docke...)
There is a lot of ad-hoc reinvention of tooling in this space.
I love the direct, "no bullshit" style of writing.
Some gems:
> Anyone familiar with C++ will instantly understand that compiling it is a complicated affair.
> Additionally CUDA allows, to a large degree, mixing CPU code and GPU code. What does all this complexity mean for ZLUDA? Absolutely nothing
> Since an application can dynamically link to either Driver API or Runtime API, it would seem that ZLUDA needs to provide both. In reality very few applications dynamically link to Runtime API. For the vast majority of applications it's sufficient to provide Driver API for dynamic (runtime) linking.
ROCm/hipDNN wraps cuDNN on Nvidia and MIOpen on AMD, but hasn't been updated in a while: https://github.com/ROCm/hipDNN
>>37808036 : conda-forge has various BLAS implementations, including MKL-optimized BLAS, and compatible NumPy and SciPy builds.
BLAS: Basic Linear Algebra Subprograms: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...
"Using CuPy on AMD GPU (experimental)" https://docs.cupy.dev/en/v13.0.0/install.html#using-cupy-on-... :
$ sudo apt install hipblas hipsparse rocsparse rocrand rocthrust rocsolver rocfft hipcub rocprim rccl

ROCm docs > "Install ROCm Docker containers" > Base Image: https://rocm.docs.amd.com/projects/install-on-linux/en/lates... links to ROCm/ROCm-docker: https://github.com/ROCm/ROCm-docker which is the source of docker.io/rocm/rocm-terminal: https://hub.docker.com/r/rocm/rocm-terminal :
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal
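Once inside, something like this confirms the devices actually made it through (assuming the host user has the right group membership):

rocminfo | grep -E 'Agent|gfx'   # should list the CPU agent plus one gfx* agent per GPU
rocm-smi                         # utilization/temperature table, roughly nvidia-smi's equivalent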
ROCm docs > "Docker image support matrix":
https://rocm.docs.amd.com/projects/install-on-linux/en/lates...

ROCm/ROCm-docker//dev/Dockerfile-centos-7-complete: https://github.com/ROCm/ROCm-docker/blob/master/dev/Dockerfi...
Bazzite is a ublue (Universal Blue) fork of the Fedora Kinoite (KDE) or Fedora Silverblue (Gnome) rpm-ostree Linux distributions; ublue-os/bazzite//Containerfile : https://github.com/ublue-os/bazzite/blob/main/Containerfile#... has, in addition to fan and power controls, automatic updates on desktop, supergfxctl, system76-scheduler, and an fsync kernel:
rpm-ostree install rocm-hip \
rocm-opencl \
rocm-clinfo
But it's not `rpm-ostree install --apply-live` because it's a Containerfile.

To install a ublue-os distro, you install any of the Fedora ostree distros {Silverblue, Kinoite, Sway Atomic, or Budgie Atomic} from e.g. a USB stick and then `rpm-ostree rebase <OCI_host_image_url>`:
rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite:stable
rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite-nvidia:stable
rpm-ostree rebase ostree-image-signed:
ublue-os/config//build/ublue-os-just/40-nvidia.just defines the `ujust configure-nvidia` and `ujust toggle-nvk` commands:
https://github.com/ublue-os/config/blob/main/build/ublue-os-...

There's a default `distrobox` with pytorch in ublue-os/config//build/ublue-os-just/etc-distrobox/apps.ini: https://github.com/ublue-os/config/blob/main/build/ublue-os-...
[mlbox]
image=nvcr.io/nvidia/pytorch:23.08-py3
additional_packages="nano git htop"
init_hooks="pip3 install huggingface_hub tokenizers transformers accelerate datasets wandb peft bitsandbytes fastcore fastprogress watermark torchmetrics deepspeed"
pre-init-hooks="/init_script.sh"
nvidia=true
pull=true
root=false
replace=false
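Assuming that ini is consumed by distrobox-assemble (that seems to be how ublue wires it up; the path below is where it appears to land on the installed system), using it looks roughly like:

distrobox assemble create --file /etc/distrobox/apps.ini   # builds the [mlbox] container from the definition above
distrobox enter mlbox                                      # nvidia=true passes the GPU through into the box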
docker.io/rocm/pytorch:
https://hub.docker.com/r/rocm/pytorch

pytorch/builder//manywheel/Dockerfile: https://github.com/pytorch/builder/blob/main/manywheel/Docke...
ROCm/pytorch//Dockerfile: https://github.com/ROCm/pytorch/blob/main/Dockerfile
The ublue-os (and so also bazzite) OCI host image Containerfile has Sunshine installed; which is a 4k HDR 120fps remote desktop solution for gaming.
There's a `ujust remove-sunshine` command in system_files/desktop/shared/usr/share/ublue-os/just/80-bazzite.just : https://github.com/ublue-os/bazzite/blob/main/system_files/d... and also kernel args for AMD:
pstate-force-enable:
rpm-ostree kargs --append-if-missing=amd_pstate=active
ublue-os/config//Containerfile:
https://github.com/ublue-os/config/blob/main/Containerfile

LizardByte/Sunshine: https://github.com/LizardByte/Sunshine
moonlight-stream https://github.com/moonlight-stream
Anyways, hopefully this PR fixes the immediate issue: https://github.com/invoke-ai/InvokeAI/pull/5714/files
conda-forge/pytorch-cpu-feedstock > "Add ROCm variant?": https://github.com/conda-forge/pytorch-cpu-feedstock/issues/...
And Fedora supports OCI containers as host images and also podman container images with just systemd to respawn one or a pod of containers.
I'm not sure what you're pointing to with your reference to the Fedora-based images. I'm quite happy with my NixOS install and really don't want to switch to anything else. And as long as I have the correct kernel module, my host OS really shouldn't matter to run any of the images.
And I'm sure it can be made to work with many base images, my point was just that the dependency management around pytorch was in a bad state, where it is extremely easy to break.
> Anyways, hopefully this PR fixes the immediate issue: https://github.com/invoke-ai/InvokeAI/pull/5714/files
It does! At least for me. It is my PR after all ;)
AMD fundamentally viewed/views GPUs as nothing more than a tool to make semicustom deals. Just like "xbox isn't the product, gamepass is the product" - well, for AMD "radeon isn't the product, semicustom is the product". The only thing they really need graphics for is APUs, and they don't need to beat the 4090, they just need to beat Xe-LP. They don't need raytracing, they don't need that "AI" crap (oops), just to run games at 720p/1080p.
They're happy to squeeze whatever they can out of Sony/MS's R&D spend, but they aren't going to invest heavily on their own. And now that there is an obvious money fountain occurring in AI/ML... that is starting to change.
It was always about the money, specifically the lack of it. AMD knew HSA-Library/OpenCL/etc sucked, they didn't care, especially when the money was better spent going after Intel instead of NVIDIA. Intel is dysfunctional and AMD had a chance to crack their marketshare, and that's where every penny they had went. And that's probably not a wrong business decision.