std::clamp generates less efficient assembly than std::min(max, std::max(min, v))
1. jeffbe+Cj 2024-01-16 13:32:55
>>x1f604+(OP)
Clang generates the shortest of these if you target Sandy Bridge, x86-64-v3, or later. The real article buried in this article is that compilers target k8-generic unless you tell them otherwise, and the feature set and cost model of the Opteron are obsolete.

Always specify your target.
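A minimal C++ sketch of the two forms the thread compares, suitable for pasting into Compiler Explorer or compiling with `-S`. The function names `clamp_std` and `clamp_minmax` are hypothetical. To see the effect of the target, compare the assembly from something like `g++ -O2 -S` against `clang++ -O2 -march=x86-64-v3 -S` (x86-64-v3 needs a reasonably recent GCC or Clang). No particular assembly is claimed here; it varies by compiler version.

```cpp
#include <algorithm>

// Clamp via the standard library helper.
// Precondition (C++17): behavior is undefined if hi < lo.
double clamp_std(double v, double lo, double hi) {
    return std::clamp(v, lo, hi);
}

// The hand-written form from the thread title: min(max, max(min, v)),
// where lo is the lower bound and hi the upper bound.
double clamp_minmax(double v, double lo, double hi) {
    return std::min(hi, std::max(lo, v));
}
```

The two are not interchangeable for every input: with the argument order used here, a NaN `v` passes through `std::clamp` unchanged but comes out of the min/max form as `lo`, so a compiler cannot always emit identical code for both.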

2. joseph+sq 2024-01-16 14:20:15
>>jeffbe+Cj
Yep. Adding "-C target-cpu=native" to rustc on my desktop computer consistently gets a ~10-15% performance boost compared to the default target. The default target is extremely conservative. As far as I can tell, it doesn't take advantage of any CPU features added in the last 20 years. (The K8 came out in 2003.)
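One way to see how conservative the default target is, sketched in C++ and assuming GCC or Clang: those compilers predefine macros such as `__SSE4_2__` and `__AVX2__` only when the selected target enables the feature. `enabled_target_features` is a hypothetical helper name. Build it once with plain `-O2` and once with `-O2 -march=native` (the C/C++ counterpart of rustc's `-C target-cpu=native`) and compare the lists.

```cpp
#include <string>
#include <vector>

// Lists x86 extensions enabled by the *compile-time* target, based on
// macros GCC and Clang predefine. This reports what the compiler may use,
// not what the running CPU supports. On non-x86 targets the list is empty.
std::vector<std::string> enabled_target_features() {
    std::vector<std::string> f;
#ifdef __SSE2__
    f.push_back("sse2");   // part of the x86-64 baseline
#endif
#ifdef __SSE4_2__
    f.push_back("sse4.2");
#endif
#ifdef __AVX__
    f.push_back("avx");
#endif
#ifdef __AVX2__
    f.push_back("avx2");
#endif
#ifdef __BMI2__
    f.push_back("bmi2");
#endif
#ifdef __FMA__
    f.push_back("fma");
#endif
    return f;
}
```

With a stock x86-64 default target, only "sse2" from this list is expected; a native build on a recent desktop CPU will typically report more.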