
menaer (OP) 2024-01-24 08:26:11
> I think the SIMD part has more to do with loop analysis than ILP.

If you know how to rewrite the algorithm so that your SIMD code makes close-to-ideal use of the CPU's execution ports, it is practically impossible to beat. I haven't seen a compiler (GCC, clang) do such a thing, at least not in the cases I've written. I've measured substantial improvements from exploiting this and similar CPU microarchitectural details. So I don't think it comes down to loop analysis alone, but I do think it's a practically impossible task for the compiler. Perhaps with AI ...
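One common case where this kind of hand-tuning pays off is a floating-point sum. The sketch below is an illustration, not the poster's code, and the function names are hypothetical. It assumes GCC or clang, since it uses their generic vector extension (`vector_size`). Under strict IEEE semantics (no `-ffast-math` or `-fassociative-math`), the compiler must keep `sum_serial` as a single dependency chain: each add waits for the previous one. `sum_simd_2acc` reassociates by hand into two independent vector accumulators, so a core with pipelined FP adders can keep several additions in flight. The right number of accumulators depends on the target's add latency and port count; two is only illustrative.

```c
#include <stddef.h>
#include <string.h>

/* 4 x float vector; GCC/clang generic vector extension, not standard C. */
typedef float v4f __attribute__((vector_size(16)));

/* Baseline: one long dependency chain, bound by FP-add latency. */
float sum_serial(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-reassociated: two independent vector accumulators (8 lanes total).
   This changes the rounding order, which is why the compiler won't do it
   on its own without relaxed FP flags. */
float sum_simd_2acc(const float *x, size_t n) {
    v4f a0 = {0, 0, 0, 0}, a1 = {0, 0, 0, 0};
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        v4f v0, v1;
        memcpy(&v0, x + i, sizeof v0);      /* unaligned-safe loads */
        memcpy(&v1, x + i + 4, sizeof v1);
        a0 += v0;                           /* chain 1 */
        a1 += v1;                           /* chain 2, independent of chain 1 */
    }
    v4f a = a0 + a1;                        /* combine the chains */
    float s = (a[0] + a[1]) + (a[2] + a[3]);/* horizontal reduction */
    for (; i < n; i++)                      /* leftover tail elements */
        s += x[i];
    return s;
}
```

For inputs whose partial sums are exactly representable (small integers, say) both functions agree; for general data the results can differ in the last bits, which is the trade-off the compiler refuses to make for you.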
