What is the correct idiom here? It feels like if this sort of thing really matters to you, you should have the know-how to hand-roll a couple of lines of ASM. I want to say this is rare, but I had a project a couple of years ago where I needed to hand-roll some vectorized instructions on a Raspberry Pi.
I personally almost always use -ffast-math by default in my C programs that care about performance, because I almost never care enough about the loss in accuracy. The only case I remember it biting me was when doing some random-number distribution tests where I cared about subnormals, and I got confused for a second because they didn't seem to exist (-ffast-math disables them on x86).
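To make that concrete, here is a minimal sketch (my own toy example, not from the comment above). On x86, linking a program whose main file was built with -ffast-math pulls in GCC's crtfastmath startup code, which sets the FTZ/DAZ bits in the SSE control register, so subnormal results are flushed to zero:

```c
/* subnormal_demo.c -- hypothetical demo.
   gcc -O2 subnormal_demo.c && ./a.out              -> prints ~5.56268e-309
   gcc -O2 -ffast-math subnormal_demo.c && ./a.out  -> typically prints 0 on x86,
   because the fast-math startup code sets FTZ/DAZ and the subnormal
   result of the division is flushed to zero. */
#include <stdio.h>
#include <float.h>

int main(void) {
    volatile double smallest_normal = DBL_MIN;  /* volatile: keep the division at run time */
    double subnormal = smallest_normal / 4.0;   /* mathematically a subnormal value */
    printf("%g\n", subnormal);
    return 0;
}
```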
FWIW I have oodles of numerical C++ code where fast-math doesn't change the output.
I suppose -fassociative-math -fno-signed-zeros -fno-trapping-math -freciprocal-math will get you most of the way there, and maybe an -ffinite-math-only when appropriate.
Generally, I would recommend against -ffast-math mostly because it enables -ffinite-math-only and that one can really blow up in your face. Most other flags (like -funsafe-math-optimizations) aren't that bad from an accuracy standpoint. Obviously you should not turn them on for code that you have actually tuned to minimize error, but in other cases they barely ever degrade the results.
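To illustrate the "blow up in your face" part, a hedged sketch of the classic -ffinite-math-only failure mode (the exact folding behavior varies by compiler and version): the compiler is allowed to assume NaNs never occur, so a NaN guard can be optimized out entirely.

```c
/* Sketch: under -ffinite-math-only (implied by -ffast-math) the compiler
   may assume x is never NaN and fold the check below to "false", so the
   guard silently disappears. Behavior varies by compiler and version. */
#include <math.h>
#include <stdio.h>

double safe_reciprocal(double x) {
    if (isnan(x))          /* may be compiled away under -ffinite-math-only */
        return 0.0;
    return 1.0 / x;
}

int main(void) {
    volatile double zero = 0.0;
    /* zero / zero is NaN at run time; without fast-math this prints 0.000000,
       with -ffast-math it may print nan because the guard was removed. */
    printf("%f\n", safe_reciprocal(zero / zero));
    return 0;
}
```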
For what architecture? What if this code is in a library that your users might want to run on Intel (both 32- and 64-bit), ARM, RISC-V, and s390x? Even if you learn assembly for all of these, how are you going to get access to an s390x IBM mainframe to test your code? What if a new architecture[1] gets popular in the next couple of years, and you won't have access to a CPU to test on?
Leaving this work to a compiler or architecture-independent functions / macros that use intrinsics under the hood frees you from having to think about all of that. As long as whatever the user is running on has decent compiler support, your code is going to work and be fast, even years later.
It's certainly not perfect though (in particular the final reduction/remainder handling).
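For context, a small sketch of the kind of loop being discussed (my example, not the parent's code): a float sum is a serial dependency chain under strict IEEE semantics, so the compiler will only vectorize it once it is allowed to reassociate additions (the flag subset listed upthread, or plain -ffast-math). It then typically keeps several SIMD partial sums, does a horizontal reduction at the end, and falls back to a scalar loop for the leftover elements, which is the part described above as imperfect.

```c
/* Sketch: with plain -O2 this stays scalar because IEEE addition is not
   associative; once reassociation is allowed the compiler can keep several
   partial sums in SIMD registers, combine them at the end, and handle the
   remaining tail elements in a scalar loop. */
#include <stddef.h>

float sum(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];    /* serial chain in source order */
    return s;
}
```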
Unfortunately Rust doesn't have a proper optimizing float type. I really wish there were a type like `FastF32` that could be optimized using the usual transformation rules of algebra (e.g. associativity, distributivity, x + y - y = x, etc.).
There is `fadd_fast` and co, but those are UB on NaN/infinite inputs.
You are right in that the final binary is free to turn `-ffast-math` on if you can verify that everything went okay. But almost no one would actually verify that. It's like the advice that you shouldn't write your own crypto code: it's fine if you know what you're doing, but almost no one does, so the advice is technically false but still worthwhile.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 (GCC), https://github.com/llvm/llvm-project/issues/57589 (LLVM)
Functions compiled with `-ffast-math` still return floating-point values via the usual registers and in the usual formats. If some function `f` is expected to return values between -1.0 and 1.0 for particular inputs, `-ffast-math` can at most make it return, say, 1.001 or NaN instead. If another function compiled without `-ffast-math` expects that range and doesn't verify `f`'s return value, it will surely misbehave, but only because the original analysis of `f` no longer holds.
`-ffast-math` as a compiler option is bad because this effect is not evident from the code. Anything visible in the code itself should be okay.
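A hedged sketch of that failure mode (file and function names are hypothetical, just to make the mechanism concrete). The ABI is unchanged and the value still comes back in the usual register; only the range guarantee the caller relied on is gone:

```c
/* caller.c -- compiled WITHOUT -ffast-math. fast_sin() is a hypothetical
   function defined in another translation unit that WAS built with
   -ffast-math and is documented to return values in [-1.0, 1.0]. */
#include <math.h>

double fast_sin(double x);        /* may overshoot to e.g. 1.0000001 under fast-math */

double cos_from_sin(double x) {
    double s = fast_sin(x);       /* this TU's analysis assumed |s| <= 1 */
    return sqrt(1.0 - s * s);     /* negative argument -> NaN, even though this
                                     file follows strict IEEE semantics */
}
```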