1. westur+(OP)[view] [source] 2024-02-12 19:49:28
> Proton+DXVK for Linux gaming

"Building the DirectX shader compiler better than Microsoft?" (2024) >>39324800

E.g. llama.cpp already supports hipBLAS; is there an advantage to this ROCm-based CUDA-compatibility layer - ZLUDA on Radeon (not Intel oneAPI) - used instead of, or in addition to, HIP? https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#hi... >>38588573

Which parts of CUDA's non-portability can't WebGPU abstract away? >>38527552

replies(1): >>HarHar+CZ2
2. HarHar+CZ2[view] [source] 2024-02-13 18:30:31
>>westur+(OP)
BLAS will only get you so far. About the highest-level operation it offers is matmul, which you can use to build convolution (im2col, matmul, col2im), but that won't be as performant as a hand-optimized cuDNN convolution kernel. The same goes for any other high-level neural-net building block - trying to build them on top of BLAS will not get you remotely close to the performance of a custom kernel.
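
For concreteness, a minimal NumPy sketch of that im2col route (conv2d_im2col is a made-up name; single input channel and "valid" padding to keep it short - an illustration, not how cuDNN or any BLAS library does it internally):

  import numpy as np

  def conv2d_im2col(x, w):
      """Convolve one (H, W) image with (K, R, S) filters via im2col + GEMM."""
      H, W = x.shape
      K, R, S = w.shape
      out_h, out_w = H - R + 1, W - S + 1
      # im2col: copy every RxS patch out as one column.
      cols = np.empty((R * S, out_h * out_w), dtype=x.dtype)
      for i in range(out_h):
          for j in range(out_w):
              cols[:, i * out_w + j] = x[i:i + R, j:j + S].ravel()
      # The heavy lifting is now a single matmul - the part BLAS gives you.
      out = w.reshape(K, R * S) @ cols
      # col2im: fold the result back into (K, out_h, out_w) feature maps.
      return out.reshape(K, out_h, out_w)

The extra patch copies and the lost data reuse across overlapping windows are exactly where this approach falls behind a fused, hand-tuned kernel.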

What's nice about BLAS is that there are optimized implementations for CPUs (Intel MKL) as well as NVIDIA (cuBLAS) and AMD (hipBLAS), so while it's very much limited in what it can do, you can at least write portable code around it.
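
Roughly what that portability looks like, using NumPy for the CPU BLAS and CuPy for the GPU one (the array-module switch is my own illustration):

  import numpy as np

  def gemm(xp, n=1024):
      # Same call regardless of backend: NumPy dispatches to MKL/OpenBLAS,
      # CuPy to cuBLAS on NVIDIA or hipBLAS/rocBLAS on ROCm builds.
      a = xp.random.rand(n, n).astype(xp.float32)
      b = xp.random.rand(n, n).astype(xp.float32)
      return a @ b

  c_cpu = gemm(np)
  try:
      import cupy as cp
      c_gpu = gemm(cp)
  except ImportError:
      pass  # no GPU array module installed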

replies(1): >>westur+lI3
3. westur+lI3[view] [source] [discussion] 2024-02-13 22:31:23
>>HarHar+CZ2
"CUDNN API supported by HIP" has a coverage table: https://rocm.docs.amd.com/projects/HIPIFY/en/amd-staging/tab...

ROCm/hipDNN wraps cuDNN on NVIDIA and MIOpen on AMD, but hasn't been updated in a while: https://github.com/ROCm/hipDNN

>>37808036 : conda-forge has various BLAS implementations, including MKL-optimized BLAS, and compatible NumPy and SciPy builds.

BLAS: Basic Linear Algebra Subprograms: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...

"Using CuPy on AMD GPU (experimental)" https://docs.cupy.dev/en/v13.0.0/install.html#using-cupy-on-... :

  $ sudo apt install hipblas hipsparse rocsparse rocrand rocthrust rocsolver rocfft hipcub rocprim rccl
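
With a ROCm build of CuPy on top of those packages, user code shouldn't need to change at all - e.g. this untested smoke-test sketch:

  import cupy as cp

  # Same CuPy API on a ROCm build as on a CUDA build; on Radeon the
  # matmul below should end up in hipBLAS/rocBLAS rather than cuBLAS.
  a = cp.random.rand(2048, 2048).astype(cp.float32)
  b = cp.random.rand(2048, 2048).astype(cp.float32)
  c = a @ b
  print(float(c.sum()))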
replies(1): >>HarHar+fY3
4. HarHar+fY3[view] [source] [discussion] 2024-02-14 00:20:49
>>westur+lI3
I guess I misunderstood you.

You were asking whether this CUDA compatibility layer might hold any advantage over HIP (e.g. for use by llama.cpp)?

I think the answer is no, since HIP includes pretty full-featured support for many of the higher-level CUDA APIs (cuDNN, cuBLAS, etc.), while per the Phoronix article ZLUDA currently has only minimal support for them.

I wouldn't expect ZLUDA to provide any performance benefit over HIP either, since on AMD hardware HIP is just a pass-through to MIOpen (AMD's equivalent of cuDNN), rocBLAS, etc.
