https://gcc.godbolt.org/z/fGaP6roe9
I see the same behavior on clang 17 as well.
I deal with a lot of floating point professionally, day to day, and I use fast math all the time, since trading a relatively small loss of accuracy for higher performance is acceptable to me. Maybe the biggest issue I run into is the lack of denormals with CUDA fast math, and it’s pretty rare for me to care about numbers smaller than 10^-38. Heck, I’d say I can tolerate 8 or 16 bits of mantissa most of the time, and fast-math floats are way more accurate than that. And we know a lot of neural network training these days can tolerate less than 8 bits of mantissa.
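For anyone who hasn't hit the denormal issue, here's roughly what it looks like. This is a plain C++ sketch of the CPU-side equivalent (assuming gcc or clang on x86-64, where -ffast-math links startup code that sets the FTZ/DAZ flags); CUDA's --use_fast_math has a similar flush-to-zero effect:

    #include <cstdio>

    int main() {
        // 1e-37f is a normal float; dividing by 1000 gives ~1e-40, which is
        // only representable as a subnormal (denormal) value.
        // volatile keeps the division from being folded at compile time.
        volatile float x = 1e-37f;
        volatile float y = x / 1000.0f;
        // Built normally, this prints a tiny nonzero value (about 1e-40).
        // Built with -ffast-math on a typical x86-64 gcc/clang toolchain,
        // the FTZ/DAZ bits are set at startup and the subnormal result is
        // flushed to zero, so it prints 0.
        std::printf("%g\n", y);
    }

If your values never get near 10^-38 in the first place, that flush never fires, which is exactly the case I'm describing.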
People who do actual numerical computing know that the claim "fast math is only slightly less accurate" is absurd. Fast math's inaccuracy is unbounded! It can reorder your computations so that something that used to sum to 1 now sums to 0; it can introduce catastrophic cancellation; and so on.
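To make that concrete, here's a sketch (plain C++; the exact flags and the optimization the compiler applies are assumptions, not guarantees) of how fast-math can gut a compensated sum that was written specifically to be accurate:

    #include <cstdio>

    // Kahan (compensated) summation: c recovers the low-order bits that are
    // rounded away when a small term is added to a large running sum.
    double kahan_sum(const double* v, int n) {
        double sum = 0.0, c = 0.0;
        for (int i = 0; i < n; ++i) {
            double y = v[i] - c;
            double t = sum + y;
            c = (t - sum) - y;  // exactly 0 in real arithmetic; the lost bits in fp
            sum = t;
        }
        return sum;
    }

    int main() {
        // 1e16 has a ulp of 2, so each +1.0 is rounded away by a naive sum.
        // The exact total is 1e16 + 10.
        double v[11] = {1e16, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        std::printf("%.1f\n", kahan_sum(v, 11) - 1e16);
        // Compiled normally this prints 10.0. With -ffast-math the compiler
        // is allowed to treat fp arithmetic as associative, simplify
        // (t - sum) - y to 0, and drop the compensation entirely, in which
        // case it prints 0.0.
    }

The algorithm is only correct because of the exact ordering of those operations; once you tell the compiler rounding doesn't matter, the "pointless" correction term is fair game.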
Please stop giving people terrible advice on a topic you're totally unfamiliar with.
Yes, and it could very well be that the correct answer is actually 0 and not 1.
Unless you write your code to explicitly account for fp associativity effects, in which case you don't need generic forum advice about fast-math.