"Building the DirectX shader compiler better than Microsoft?" (2024) >>39324800
E.g. llama.cpp already supports hipBLAS; is there an advantage to using this CUDA-compatibility layer on ROCm (ZLUDA on Radeon, and not yet Intel oneAPI) instead of or in addition to hipBLAS? https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi... >>38588573
Which parts of CUDA's non-portability can't WebGPU abstract away? >>38527552
What's nice about BLAS is that there are optimized implementations for CPUs (Intel MKL) as well as NVIDIA (cuBLAS) and AMD (hipBLAS), so while it's very much limited in what it can do, you can at least write portable code around it.
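To make the "portable code around it" point concrete, here's a minimal sketch of the operation the Level 3 BLAS GEMM routine defines, C := alpha*A*B + beta*C. It's plain Python on lists of lists, for illustration only; the function name and argument order are my own, not any library's API, and real code would call an optimized implementation instead.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a @ b) + beta * c.

    a is m x k, b is k x n, c is m x n, all row-major lists of lists.
    Hypothetical helper for illustration; not the signature of any BLAS library.
    """
    m, k, n = len(a), len(b), len(b[0])
    # Check that the shapes are compatible before doing any arithmetic.
    if any(len(row) != k for row in a):
        raise ValueError("a must be m x k where k == len(b)")
    if any(len(row) != n for row in b):
        raise ValueError("b must have rows of equal length")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")
    # Each output entry is a scaled dot product plus a scaled entry of c.
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[0, 0], [0, 0]]
    print(gemm(1.0, a, b, 0.0, c))
```

The vendor libraries expose this same operation (e.g. cblas_dgemm, cublasDgemm, hipblasDgemm), differing mainly in handle and memory-management details, which is why code written against the BLAS contract ports reasonably well.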
ROCm/hipDNN wraps cuDNN on Nvidia and MIOpen on AMD, but hasn't been updated in a while: https://github.com/ROCm/hipDNN
>>37808036 : conda-forge has various BLAS implementations, including MKL-optimized BLAS, and compatible NumPy and SciPy builds.
BLAS: Basic Linear Algebra Subprograms: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...
"Using CuPy on AMD GPU (experimental)" https://docs.cupy.dev/en/v13.0.0/install.html#using-cupy-on-... :
$ sudo apt install hipblas hipsparse rocsparse rocrand rocthrust rocsolver rocfft hipcub rocprim rccl
You were asking whether this CUDA compatibility layer might hold any advantage over HIP (e.g. for use by llama.cpp)?
I think the answer is no, since HIP includes pretty full-featured support for many of the higher-level CUDA-based APIs (cuDNN, cuBLAS, etc.), while per the Phoronix article ZLUDA (currently) has only minimal support for them.
I wouldn't expect ZLUDA to provide any performance benefit over HIP either, since on AMD hardware HIP is just a pass-thru to MIOpen (AMD's equivalent to cuDNN), rocBLAS, etc.