With supercomputers, you write your code for that specific machine. In such an environment ROCm works OK. Trying to make a piece of ROCm code work across different cards/setups is a real pain (and it's not that easy with CUDA either if you want good performance).