It's a common misconception that deep learning stuff is built in CUDA. It's actually built on cuDNN kernels, which aren't written in CUDA but are GPU assembly written by hand by PhDs. I'm really not convinced that this project could be used for that. For the ROCm kernels that are analogous to cuDNN, though, yes.