ORC Unwinder For Linux 4.14, Boosts Kernel Performance By Disabling Frame Pointers
Ingo Molnar submitted the Linux x86 Assembly updates today for the 4.14 merge window. What's interesting with the x86/asm code changes is the introduction of the ORC Unwinder.
The ORC Unwinder is a lightweight, Linux kernel debuginfo implementation with "DWARF done right for unwinding...
The ORC unwinder is almost two orders of magnitude faster than the (out of tree) DWARF unwinder - which is important for perf call graph profiling. It is also significantly simpler and is coded defensively: there has not been a single ORC related kernel crash so far, even with early versions. (knock on wood!)"
The ORC Unwinder alone isn't exciting or relevant for most of you, but with the ORC unwinder enabled, CONFIG_FRAME_POINTER can be disabled. CONFIG_FRAME_POINTER is currently enabled on most Linux distribution kernels, but disabling it can yield some performance advantages. The frame pointer option has the compiler add frame pointer code to every function of the kernel; with it disabled, the kernel shrinks, yielding better cache utilization and fewer instructions executed. System calls can be roughly 1~3% faster, or as much as 5~10% faster for some "function execution intense workloads."
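For anyone curious which unwinder their own kernel was built with, here is a minimal Python sketch that reads kernel .config text and reports whether frame pointers and ORC are enabled. Only CONFIG_FRAME_POINTER is named in the pull request summary above; the ORC option names checked here (CONFIG_ORC_UNWINDER, and CONFIG_UNWINDER_ORC as used by later kernels) are assumptions to verify against your kernel tree, and the helper names are hypothetical.

```python
def parse_kconfig(text):
    """Return a dict of the CONFIG_* options that are set in .config text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and comments, including "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.startswith("CONFIG_"):
            options[name] = value
    return options


def unwinder_summary(text):
    """Report which unwinder-related options are enabled (=y)."""
    opts = parse_kconfig(text)
    # Assumed ORC option names; check them against your kernel version.
    orc_names = ("CONFIG_ORC_UNWINDER", "CONFIG_UNWINDER_ORC")
    return {
        "frame_pointer": opts.get("CONFIG_FRAME_POINTER") == "y",
        "orc": any(opts.get(name) == "y" for name in orc_names),
    }


if __name__ == "__main__":
    # Hypothetical config excerpt, for illustration only.
    sample = "CONFIG_ORC_UNWINDER=y\n# CONFIG_FRAME_POINTER is not set\n"
    print(unwinder_summary(sample))
```

On a real system, the text could come from a file such as /boot/config-$(uname -r), if your distribution ships one.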
More details via this pull request.