Std: Clamp generates less efficient assembly than std:min(max,std:max(min,v))

>>x1f604+(OP)
On gcc 13, the difference in assembly between the min(max()) version and std::clamp is eliminated when I add the -ffast-math flag. I suspect that the two implementations handle one of the arguments being NaN a bit differently.

https://gcc.godbolt.org/z/fGaP6roe9

I see the same behavior on clang 17 as well

https://gcc.godbolt.org/z/6jvnoxWhb

>>celega+im
You (celegans25) probably know this but here is a PSA that -ffast-math is really -finaccurate-math. The knowledgeable developer will know when to use it (almost never) while the naive user will have bugs.

>>gumby+1n
Why do you say almost never? Don’t let the name scare you; all floating point math is inaccurate. Fast math is only slightly less accurate, I think typically it’s a 1 or maybe 2 LSB difference. At least in CUDA it is, and I think many (most?) people & situations can tolerate 22 bits of mantissa compared to 23, and many (most?) people/situations aren’t paying attention to inf/nan/exception issues at all.

I deal with a lot of floating point professionally day to day, and I use fast math all the time, since the tradeoff for higher performance and the relatively small loss of accuracy are acceptable. Maybe the biggest issue I run into is lack of denorms with CUDA fast-math, and it’s pretty rare for me to care about numbers smaller than 10^-38. Heck, I’d say I can tolerate 8 or 16 bits of mantissa most of the time, and fast-math floats are way more accurate than that. And we know a lot of neural network training these days can tolerate less than 8 bits of mantissa.

>>dahart+Vy
> Fast math is only slightly less accurate

'slightly'? Last I checked, -Ofast completely breaks std::isnan and std::isinf--they always return false.

>>useful+EI
They are talking about -ffast-math, not -Ofast.

>>xigoi+7e1
From the gcc manual:

-Ofast

Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.

zlacker