zlacker

Recent versions of both GCC and Clang can generate the optimal code for std::clamp() if you add something like -march=znver1, even at -O1 [0]. Interesting!

[0] https://godbolt.org/z/YsMMo7Kjz

replies(2): >>Grumpy+U2 >>x1f604+HV6

>>tambre+(OP)
But then it uses AVX instructions. (You can replace -march=znver1 with just -mavx.)

When AVX isn’t enabled, the std::min + std::max example still uses fewer instructions. Looks like a random register allocation failure.
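For anyone who wants to reproduce the comparison, here is a minimal sketch of the two formulations being discussed. The wrapper names and the exact argument order inside std::min/std::max are assumptions for illustration; the linked Compiler Explorer examples may be written differently. std::clamp needs C++17.

```cpp
#include <algorithm>

// Hypothetical wrapper names, for illustration only.

// The standard library version (C++17).
double clamp_std(double v, double lo, double hi) {
    return std::clamp(v, lo, hi);
}

// The std::min + std::max formulation.
// The argument order here is an assumption; other orders are possible
// and may compile to different instructions.
double clamp_minmax(double v, double lo, double hi) {
    return std::min(hi, std::max(lo, v));
}
```

Compiling both with and without -mavx and comparing the output side by side should show the instruction count difference mentioned above.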

replies(2): >>gpdere+64 >>x1f604+xW6

>>Grumpy+U2
The additional "movapd xmm0, xmm2" is mostly free, as it is handled by register renaming, but yes, it seems to be a quirk of the register allocator. It wouldn't be the first time I've seen GCC move things around for no obvious reason.

>>tambre+(OP)
Even with -march=znver1 at -O3, the compiler still generates fewer lines of assembly for the incorrect clamp than for the correct clamp in this "realistic" code:

https://godbolt.org/z/WMKbeq5TY

>>Grumpy+U2
I don't think it's a register allocation failure; it is in fact necessitated by the calling convention (ABI), which requires the first parameter to be passed in xmm0 and the return value to be placed in xmm0 as well.

So when you have an algorithm like clamp, which requires v to be "preserved" throughout the computation, you can't overwrite xmm0 with the first instruction. Basically, you need to "save" and "restore" it, which means an extra instruction.

I'm not sure why this causes the extra assembly to be generated in the "realistic" code example though. See https://godbolt.org/z/hd44KjMMn
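To make the calling-convention point concrete, here is a rough sketch assuming the System V x86-64 ABI, where the first three double arguments arrive in xmm0, xmm1 and xmm2 and a double result is returned in xmm0. Both function names are hypothetical, and the clamp is written out with plain comparisons rather than taken from any particular standard library. The second variant only moves v to the last position so that xmm0 no longer holds v on entry. It is something to try in Compiler Explorer, not a claim about what the compiler will emit.

```cpp
// Assumes the System V x86-64 calling convention.
// Hypothetical function names, for illustration only.

// v arrives in xmm0, lo in xmm1, hi in xmm2.
// The result must end up in xmm0, the same register v came in.
double clamp_v_first(double v, double lo, double hi) {
    return (v < lo) ? lo : (hi < v) ? hi : v;
}

// Same logic, but lo arrives in xmm0, hi in xmm1 and v in xmm2,
// so xmm0 does not hold v when the function starts.
double clamp_v_last(double lo, double hi, double v) {
    return (v < lo) ? lo : (hi < v) ? hi : v;
}
```

Both functions return the same values for the same inputs; only the register each parameter arrives in differs.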