zlacker

[return to "Std: Clamp generates less efficient assembly than std:min(max,std:max(min,v))"]
1. celega+im[view] [source] 2024-01-16 13:50:05
>>x1f604+(OP)
On gcc 13, the difference in assembly between the min(max()) version and std::clamp is eliminated when I add the -ffast-math flag. I suspect that the two implementations handle one of the arguments being NaN a bit differently.

https://gcc.godbolt.org/z/fGaP6roe9

I see the same behavior on clang 17 as well.

https://gcc.godbolt.org/z/6jvnoxWhb
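
For concreteness, a minimal sketch of the two forms being compared (function names illustrative). Under strict IEEE semantics they disagree when v is NaN, which is exactly the assumption -ffast-math discards:

    #include <algorithm>
    #include <cmath>
    #include <cstdio>

    // std::clamp(v, lo, hi) is specified via comparisons on v, so a NaN v
    // falls through both comparisons and comes back out as NaN.
    float clamp_std(float v, float lo, float hi) {
        return std::clamp(v, lo, hi);
    }

    // The min/max composition propagates a limit instead: with v = NaN,
    // std::max(lo, v) yields lo, so the whole expression yields lo.
    float clamp_minmax(float v, float lo, float hi) {
        return std::min(hi, std::max(lo, v));
    }

    int main() {
        float v = std::nanf("");
        printf("%f vs %f\n", clamp_std(v, 0.f, 1.f), clamp_minmax(v, 0.f, 1.f));
        // Prints nan vs 0.000000 under strict compilation; -ffast-math
        // assumes NaN never occurs, so both may compile identically.
    }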

◧◩
2. gumby+1n[view] [source] 2024-01-16 13:54:31
>>celega+im
You (celegans25) probably know this, but here's a PSA: -ffast-math is really -finaccurate-math. The knowledgeable developer will know when to use it (almost never), while the naive user will get bugs.
◧◩◪
3. dahart+Vy[view] [source] 2024-01-16 15:05:15
>>gumby+1n
Why do you say almost never? Don’t let the name scare you; all floating point math is inaccurate. Fast math is only slightly less accurate; I think it’s typically a 1 or maybe 2 LSB difference. At least in CUDA it is, and I think many (most?) people and situations can tolerate 22 bits of mantissa instead of 23, and many (most?) aren’t paying attention to inf/nan/exception issues at all.

I deal with a lot of floating point professionally day to day, and I use fast math all the time, since the tradeoff of a relatively small loss of accuracy for higher performance is acceptable. Maybe the biggest issue I run into is the lack of denormals with CUDA fast-math, and it’s pretty rare for me to care about numbers smaller than 10^-38. Heck, I’d say I can tolerate 8 or 16 bits of mantissa most of the time, and fast-math floats are way more accurate than that. And we know a lot of neural network training these days tolerates less than 8 bits of mantissa.
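
To illustrate that scale with a toy example: fused multiply-add, which fast-math-style flags let the compiler substitute freely, keeps the full product before rounding and so differs from separate multiply-then-add right at the bottom of the mantissa:

    #include <cmath>
    #include <cstdio>

    int main() {
        float a = 1.0f + 0x1p-13f;  // 1 + 2^-13
        float b = 1.0f - 0x1p-13f;  // 1 - 2^-13
        float c = -1.0f;

        volatile float p = a * b;          // product rounded to float first
        float separate = p + c;            // two roundings
        float fused = std::fmaf(a, b, c);  // exact product, one rounding

        // a*b = 1 - 2^-26 rounds up to 1.0f, so "separate" drops the
        // residual that "fused" keeps: a difference in the last bits.
        printf("separate = %g, fused = %g\n", separate, fused);
    }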

◧◩◪◨
4. mort96+ZA[view] [source] 2024-01-16 15:14:44
>>dahart+Vy
The scary thing IMO is: your code might be fine with unsafe math optimisations, but maybe you're using a library which is written to do operations in a certain order to minimise numerical error, and unsafe math optimisations rewrite that code into something mathematically equivalent which has many orders of magnitude more numerical error. It's probably fine most of the time, but it's kinda scary.
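
A concrete example of that kind of library code: compensated (Kahan) summation, whose correction term is algebraically zero, so -ffast-math's reassociation is free to strip it out entirely:

    #include <cstdio>

    int main() {
        // Summing 0.1f ten million times. Naive float accumulation
        // drifts visibly from 1e6; the compensated loop stays close.
        const int n = 10000000;
        float naive = 0.0f;
        for (int i = 0; i < n; ++i) naive += 0.1f;

        // Kahan summation: c captures each addition's rounding error.
        // Since c is always zero "mathematically", -ffast-math may
        // legally simplify this back into the naive loop above.
        float sum = 0.0f, c = 0.0f;
        for (int i = 0; i < n; ++i) {
            float y = 0.1f - c;  // apply the previous correction
            float t = sum + y;   // low-order bits of y can be lost here...
            c = (t - sum) - y;   // ...and are recovered here
            sum = t;
        }
        printf("naive = %f, kahan = %f\n", naive, sum);
    }
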
◧◩◪◨⬒
5. dahart+oC[view] [source] 2024-01-16 15:22:16
>>mort96+ZA
It shouldn’t be scary. Any library that is sensitive to order of operations will hopefully have a big fat warning on it. And it can be compiled separately with fast-math disabled. I don’t know of any such libraries off the top of my head, and it’s quite rare to find situations that result in orders of magnitude more error, though I grant you it can happen, and it can be contrived pretty easily.
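
For example, the order-sensitive pieces can live in their own translation unit, compiled strictly while the rest of the build keeps fast-math (file names illustrative; -fno-fast-math is the GCC/clang negation of -ffast-math):

    // numerics.cpp -- build just this file with fast-math off:
    //   g++ -O2 -fno-fast-math -c numerics.cpp
    //   g++ -O2 -ffast-math    -c everything_else.cpp
    double ordered_sum(const double* x, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += x[i];  // strict mode preserves left-to-right order
        return s;
    }
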
◧◩◪◨⬒⬓
6. planed+TC[view] [source] 2024-01-16 15:24:50
>>dahart+oC
You can't fully disable fast-math per library; moreover, a library compiled with fast-math might also introduce inaccuracies into seemingly unrelated library or application code in the same executable. The reason is that fast-math adds some dynamic initialization to the library that changes the floating point environment for the whole process.
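
Concretely, on x86 with GCC (as far as I understand the mechanism): -ffast-math links in crtfastmath.o, whose startup code sets the FTZ/DAZ bits in the MXCSR register, and many GCC releases embedded it even into shared libraries built with -shared, so merely loading such a library flushes subnormals process-wide. A sketch of how one might observe it (names illustrative; behavior varies by GCC version and platform):

    // main.cpp -- compiled WITHOUT fast-math:
    //   g++ -O2 main.cpp -o demo -L. -lfast -Wl,-rpath,.
    // where libfast.so is any library built with something like:
    //   g++ -shared -fPIC -Ofast fast.cpp -o libfast.so
    #include <cstdio>
    #include <cfloat>

    int main() {
        volatile float tiny = FLT_MIN;  // smallest normal float, ~1.18e-38
        float sub = tiny / 16.0f;       // a subnormal result
        // If the fast-math library's startup code set FTZ/DAZ, this may
        // print 0 even though main.cpp never enabled fast-math.
        printf("%g\n", sub);
    }
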
◧◩◪◨⬒⬓⬔
7. dahart+GI[view] [source] 2024-01-16 15:46:14
>>planed+TC
> You can’t fully disable fast-math per library

Can you elaborate? What fast-math can sneak into a library that disabled fast-math at compile time?

> fast-math adds some dynamic initialization to the library that changes the floating point environment for the whole process.

I wasn’t aware of this. I would love to see some documentation discussing exactly what happens; can you send a link?

◧◩◪◨⬒⬓⬔⧯
8. mort96+9K[view] [source] 2024-01-16 15:53:07
>>dahart+GI
> Can you elaborate? What fast-math can sneak into a library that disabled fast-math at compile time?

A lot of library code is in headers (especially in C++!). The code in headers is compiled by your compiler using your compile options.
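
For example (a hypothetical header), a check the library relies on can be hollowed out by the user's flags, since -ffast-math implies -ffinite-math-only:

    // mylib.h (hypothetical header-only library code)
    #pragma once
    #include <cmath>

    inline bool valid_reading(double v) {
        // This is compiled in the *user's* translation unit: under
        // -ffinite-math-only the compiler may assume NaN never occurs
        // and fold the check to "always true", even though the library
        // author never opted into fast-math.
        return !std::isnan(v);
    }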

◧◩◪◨⬒⬓⬔⧯▣
9. dahart+LK[view] [source] 2024-01-16 15:55:06
>>mort96+9K
Ah, of course, very good point. A header-only library doesn’t have separate compile options. This is a great reason for a float-sensitive library to not be header-only, right?
◧◩◪◨⬒⬓⬔⧯▣▦
10. mort96+qM[view] [source] 2024-01-16 16:02:17
>>dahart+LK
It's not just about being header-only; lots of libraries which aren't header-only still have code in headers. A library may choose to put certain functions in headers for performance reasons (to let the compiler inline them), and in C++, function templates and class templates generally have to be in headers.

But yeah, it's probably a good idea to not put code which breaks under -ffast-math in headers if possible.

[go to top]