1. jeffbe+(OP) 2024-01-16 13:32:55
Clang generates the shortest of these if you target sandybridge, x86-64-v3, or anything later. The real story buried in this article is that compilers target k8-generic unless you tell them otherwise, and the features and cost model of the Opteron are obsolete.

Always specify your target.

replies(2): >>joseph+Q6 >>x1f604+eU6
2. joseph+Q6 2024-01-16 14:20:15
>>jeffbe+(OP)
Yep. Adding "-C target-cpu=native" to rustc on my desktop computer consistently gets me a ~10-15% performance boost compared to the default target. The default target is extremely conservative. As far as I can tell, it doesn't take advantage of any CPU features added in the last 20 years. (The K8 came out in 2003.)
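
If you want to see what the default target actually leaves out, here is a minimal sketch that lists the x86 features the compiler was allowed to assume at build time. The function name `compiled_x86_features` and the particular feature list are my own picks for illustration, not anything from the article; it only relies on the standard `cfg!(target_feature = "...")` check.

```rust
/// Returns the x86 features (from a small, hand-picked list) that were
/// enabled at compile time. The list is illustrative, not exhaustive.
pub fn compiled_x86_features() -> Vec<&'static str> {
    // cfg! is evaluated by the compiler, so each entry reflects the
    // target the binary was built for, not the CPU it is running on.
    let checks = [
        ("sse2", cfg!(target_feature = "sse2")),
        ("sse3", cfg!(target_feature = "sse3")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("sse4.1", cfg!(target_feature = "sse4.1")),
        ("sse4.2", cfg!(target_feature = "sse4.2")),
        ("popcnt", cfg!(target_feature = "popcnt")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("fma", cfg!(target_feature = "fma")),
        ("bmi1", cfg!(target_feature = "bmi1")),
        ("bmi2", cfg!(target_feature = "bmi2")),
    ];

    // Keep only the names whose compile-time check came out true.
    checks
        .iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| *name)
        .collect()
}
```

Call it from `main`, print the result, and build twice, once with no flags and once with "-C target-cpu=native"; the second list should be noticeably longer on any recent machine. `rustc --print cfg -C target-cpu=native` gives a similar picture without writing any code.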
replies(2): >>jeffbe+f8 >>wongar+dd
3. jeffbe+f8 2024-01-16 14:28:12
>>joseph+Q6
Those Gentoo people were onto something.
replies(2): >>alexey+Be >>skykoo+771
4. wongar+dd 2024-01-16 14:54:24
>>joseph+Q6
Red Hat Enterprise Linux has upgraded its default target to x86-64-v2 and is considering switching to x86-64-v3 for RHEL 10 (which should release around 2026?). I'd take that as a sign that those might be reasonable choices for newly released software.

Some Linux distros also give you the option to get either a version compatible with ancient hardware or the optimized x86-64-v3 version, which seems like a good compromise.
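
The same compromise can be made inside a single program by checking the CPU at startup and picking a code path. Below is a rough sketch, assuming an approximate x86-64-v3 test: it checks only a subset of the level's features (it leaves out F16C, MOVBE and OSXSAVE, among others), and the names `cpu_looks_like_x86_64_v3` and `pick_build` are hypothetical. The only real API used is the standard `is_x86_feature_detected!` macro.

```rust
/// Approximate runtime check for the x86-64-v3 level.
/// Assumption: this tests only part of the v3 feature set, so treat a
/// `true` result as "probably v3", not as a guarantee.
#[cfg(target_arch = "x86_64")]
pub fn cpu_looks_like_x86_64_v3() -> bool {
    // Features introduced by the v2 level.
    let v2 = is_x86_feature_detected!("sse3")
        && is_x86_feature_detected!("ssse3")
        && is_x86_feature_detected!("sse4.1")
        && is_x86_feature_detected!("sse4.2")
        && is_x86_feature_detected!("popcnt");

    // A subset of the features added by the v3 level.
    let v3 = is_x86_feature_detected!("avx")
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("bmi1")
        && is_x86_feature_detected!("bmi2")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("lzcnt");

    v2 && v3
}

/// On other architectures the x86-64 levels do not apply.
#[cfg(not(target_arch = "x86_64"))]
pub fn cpu_looks_like_x86_64_v3() -> bool {
    false
}

/// Hypothetical selector: which of two shipped variants to use.
pub fn pick_build() -> &'static str {
    if cpu_looks_like_x86_64_v3() {
        "x86-64-v3"
    } else {
        "baseline"
    }
}
```

A launcher or a hot function could branch on `pick_build()` once at startup. Distros do the equivalent at the package or library-path level rather than in application code, so this is an analogy, not a description of how they implement it.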

5. alexey+Be 2024-01-16 15:01:49
>>jeffbe+f8
Funny that this stopped being the case for a while around 2006. AMD64 had become widespread while still being very new, which closed the gap between "default" and "native".
6. skykoo+771 2024-01-16 19:01:17
>>jeffbe+f8
Of course, Gentoo just started using prebuilt packages a few months ago…
7. x1f604+eU6 2024-01-18 09:07:03
>>jeffbe+(OP)
Even with -march=x86-64-v4 at -O3, the compiler still generates fewer lines of assembly for the incorrect clamp than for the correct clamp in this "realistic" code:

https://godbolt.org/z/hd44KjMMn
