Glibc Picks Up Some More FMA Performance Optimizations
The GNU C Library, glibc, has picked up FMA-optimized versions of several more math functions.
The newest functions getting fused multiply-add (FMA) support are powf(), logf(), exp2f(), and log2f(). FMA instructions have been available since the Intel Haswell and AMD Piledriver generations and, as with past FMA optimizations, the benefits can be quite noticeable.
The FMA-based powf() function on Intel Skylake hardware is yielding a 29% improvement in reciprocal throughput and 24% lower latency for SPEC2017. The log2f() call, meanwhile, is seeing a 17% throughput improvement and an 18% improvement in latency. The logf() function is seeing a 16% throughput improvement and a 22% reduction in latency. Lastly, exp2f() is 16% faster with an 18% improvement in latency.
These optimizations were done by H.J. Lu and are available via Git ahead of the upcoming glibc 2.27 release.
H.J. Lu also replaced some Assembly versions of functions with generic code and found that the C code performs better than the older Assembly code. Some of those performance improvements are even more profound than the FMA optimizations.