And on the GPU side, the existing libraries provide DSL-based JIT compilers, so in many scenarios the performance is not much different from C++.
Now NVIDIA is also in the game with its new tile-based programming model, which even comes with first-party support for writing kernels in Python.