zlacker

[return to "std::clamp generates less efficient assembly than std::min(max, std::max(min, v))"]
1. celega+im[view] [source] 2024-01-16 13:50:05
>>x1f604+(OP)
On gcc 13, the difference in assembly between the min(max()) version and std::clamp is eliminated when I add the -ffast-math flag. I suspect that the two implementations handle one of the arguments being NaN a bit differently.

https://gcc.godbolt.org/z/fGaP6roe9

I see the same behavior on clang 17 as well.

https://gcc.godbolt.org/z/6jvnoxWhb

2. gumby+1n[view] [source] 2024-01-16 13:54:31
>>celega+im
You (celegans25) probably know this, but here's a PSA: -ffast-math is really -finaccurate-math. A knowledgeable developer will know when to use it (almost never), while the naive user will get bugs.
3. cogman+Cp[view] [source] 2024-01-16 14:13:49
>>gumby+1n
Ehh, not so much inaccurate, more of a "floating point numbers are tricky, let's act like they aren't".

Compilers are pretty skittish about changing the order of floating point operations (for good reason), and -ffast-math is the thing that lets them transform expressions to try to generate faster code.

E.g., instead of computing "n / 10", computing "n * 0.1". The issue, of course, is that things like 0.1 can't be perfectly represented in floating point, while 100 / 10 can be. So now you've introduced a tiny bit of error where there might not have been any.

4. phkahl+Cr[view] [source] 2024-01-16 14:27:18
>>cogman+Cp
I've never understood why generating exceptions is preferable to just using higher precision.
5. dahart+HH[view] [source] 2024-01-16 15:42:58
>>phkahl+Cr
On a GPU, higher precision can cost between 2 and 64 times more than single precision, with typical ratios for consumer cards being 16 or 32. Even on the CPU, fp64 workloads tend to run at half the speed on real data due to the extra bandwidth needed for higher precision.