Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.
Is that the only factor, though? I wonder if PyTorch's MPS backend is also under-optimized.
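One rough way to separate the two effects is a roofline-style estimate: work out the best throughput the hardware could deliver for a given workload, then compare it with what you actually measure. A large gap would point at software rather than silicon. This is only a sketch. The function names are made up for illustration, and the peak and bandwidth figures in the usage lines are placeholders, not real Apple Silicon or GPU specs.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     intensity_flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth * intensity."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flops_per_byte)


def matmul_intensity(n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for an n x n matmul.

    Simplifying assumption: A, B and C each cross the memory bus exactly once.
    """
    flops = 2 * n ** 3                       # one multiply + one add per term
    bytes_moved = 3 * n * n * bytes_per_elem
    return flops / bytes_moved


def backend_efficiency(measured_flops: float, peak_flops: float,
                       bandwidth_bytes_per_s: float,
                       intensity_flops_per_byte: float) -> float:
    """Fraction of the roofline bound that a measured run actually reached."""
    bound = attainable_flops(peak_flops, bandwidth_bytes_per_s,
                             intensity_flops_per_byte)
    return measured_flops / bound


# Placeholder hardware numbers, for illustration only.
PEAK = 10e12        # hypothetical 10 TFLOP/s compute peak
BANDWIDTH = 400e9   # hypothetical 400 GB/s memory bandwidth

# Low-intensity work (1 FLOP/byte) is capped by bandwidth.
low = attainable_flops(PEAK, BANDWIDTH, 1.0)
# A large fp16 matmul is capped by the compute peak instead.
high = attainable_flops(PEAK, BANDWIDTH, matmul_intensity(3000))
```

If a measured matmul on the `mps` device lands well below the bound for that chip's published numbers, the backend is leaving performance on the table. If it sits close to the bound, the hardware is the limit. When timing on MPS, remember that kernels launch asynchronously, so call `torch.mps.synchronize()` before stopping the clock.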