zlacker

The diffusion process is usually compute-bound, while transformer inference is memory-bound.

Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.

replies(1): >>tarrud+26

>>p-e-w+(OP)
> but it’s light years behind on compute.

Is that the only factor though? I wonder if pytorch is lacking optimization for the MPS backend.