zlacker

[parent] [thread] 1 comments
1. p-e-w+(OP)[view] [source] 2025-12-06 23:57:34
The diffusion process is usually compute-bound, while transformer inference is memory-bound.

Apple Silicon is comparable in memory bandwidth to mid-range GPUs, but it’s light years behind on compute.

replies(1): >>tarrud+26
2. tarrud+26[view] [source] 2025-12-07 00:47:39
>>p-e-w+(OP)
> but it’s light years behind on compute.

Is that the only factor though? I wonder if pytorch is lacking optimization for the MPS backend.

[go to top]