zlacker

[return to "Std: Clamp generates less efficient assembly than std:min(max,std:max(min,v))"]
1. tambre+2i[view] [source] 2024-01-16 13:21:16
>>x1f604+(OP)
Both recent GCC and Clang are able to generate the most optimal version for std::clamp() if you add something like -march=znver1, even at -O1 [0]. Interesting!

[0] https://godbolt.org/z/YsMMo7Kjz

◧◩
2. Grumpy+Wk[view] [source] 2024-01-16 13:41:52
>>tambre+2i
But then it uses AVX instructions. (You can replace -march=znver1 with just -mavx.)

When AVX isn’t enabled, the std::min + std::max example still uses fewer instructions. Looks like a random register allocation failure.

◧◩◪
3. x1f604+ze7[view] [source] 2024-01-18 09:13:41
>>Grumpy+Wk
I don't think it's a register allocation failure but is in fact necessitated by the ABI requirement (calling convention) for the first parameter to be in xmm0 and the return value to also be placed into xmm0.

So when you have an algorithm like clamp which requires v to be "preserved" throughout the computation you can't overwrite xmm0 with the first instruction, basically you need to "save" and "restore" it which means an extra instruction.

I'm not sure why this causes the extra assembly to be generated in the "realistic" code example though. See https://godbolt.org/z/hd44KjMMn

[go to top]