zlacker

[return to "Std: Clamp generates less efficient assembly than std:min(max,std:max(min,v))"]
1. jeffbe+Cj 2024-01-16 13:32:55
>>x1f604+(OP)
Clang generates the shortest of these if you target sandybridge, x86-64-v3, or anything later. The real article that's buried in this article is that compilers target k8-generic unless you tell them otherwise, and the features and cost model of the Opteron are obsolete.

Always specify your target.
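
For anyone wanting to try this at home, here is a minimal sketch of the two formulations from the title. The function names `clamp_std` and `clamp_minmax` are made up for illustration. Build it as C++17, once with plain `-O2` and once with `-O2 -march=x86-64-v3` (or `-march=sandybridge`), and compare the `-S` output. Which form comes out shorter may vary with compiler and version, so treat it as something to check rather than a guaranteed result.

```cpp
#include <algorithm>

// Hypothetical helper names; both clamp v into [lo, hi], assuming lo <= hi.

// Formulation 1: the standard library helper (requires C++17).
float clamp_std(float v, float lo, float hi) {
    return std::clamp(v, lo, hi);
}

// Formulation 2: the min/max composition from the thread title.
float clamp_minmax(float v, float lo, float hi) {
    return std::min(hi, std::max(lo, v));
}
```

Both return the same value for ordinary (non-NaN) inputs with `lo <= hi`; the difference under discussion is only in the generated code.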

◧◩
2. joseph+sq 2024-01-16 14:20:15
>>jeffbe+Cj
Yep. Adding "-C target-cpu=native" to rustc on my desktop computer consistently gets a ~10-15% performance boost compared to the default target. The default target is extremely conservative. As far as I can tell, it doesn't take advantage of any CPU features added in the last 20 years. (The k8 came out in 2003.)
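
The same conservatism is easy to see from the C++ side. A rough sketch, assuming GCC or Clang on x86-64 (the macros below are predefined by those compilers; the function name `enabled_features` is just an illustration): it reports which instruction set extensions the compiler was allowed to assume for the current build.

```cpp
#include <string>

// Hypothetical helper: lists the ISA extensions enabled at compile time,
// based on macros that GCC and Clang predefine for x86 targets.
std::string enabled_features() {
    std::string out;
#if defined(__SSE2__)
    out += "sse2 ";
#endif
#if defined(__SSE4_2__)
    out += "sse4.2 ";
#endif
#if defined(__AVX__)
    out += "avx ";
#endif
#if defined(__AVX2__)
    out += "avx2 ";
#endif
#if defined(__FMA__)
    out += "fma ";
#endif
    return out;
}
```

Printing the result from a build with default flags and again from one built with `-march=native` or `-march=x86-64-v3` should show the list growing, though the exact contents depend on the compiler and the machine.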
◧◩◪
3. wongar+Pw 2024-01-16 14:54:24
>>joseph+sq
Red Hat Enterprise Linux has upgraded their default target to x86-64-v2 and is considering switching to x86-64-v3 for RHEL 10 (which should release around 2026?). I'd take that as a sign that those might be reasonable choices for newly released software.

Some Linux distros also let you choose between a build compatible with ancient hardware and an optimized x86-64-v3 build, which seems like a good compromise.
