#33: Nick Piggin (npiggin) (2009-07-20 18:05) I'm actually of the opinion that we should disable optimize-for-size in our server kernel as well. I will try to recall the particular SLES bug report I have with some numbers, but we have an ISV customer doing some virtual-memory-intensive workloads (basically mmap/page fault/munmap), and they found their real-world performance is improved very significantly by using -O2 in SLES11. I can't remember exactly, but it is several tens of percent IIRC.
The reasoning for -Os in the kernel has seemed a bit flawed to me (as I have written before). Icache issues in the kernel are almost no different from those in userspace applications or libraries. There will always be various combinations of uncommon, common, large, and small code being run. The gcc guys are presumably always trying to make good tradeoffs based on that, and "performance" for them includes icache misses. Specifying -Os would seem to tell gcc that we care more about binary size than about actual performance.
If the kernel has commonly used code, we absolutely want it to be optimized as highly as possible. Uncommonly used code sure would be nice to reduce in size, but if it is uncommonly executed then by definition it should have a smaller (temporal) icache footprint.
Now I don't have any numbers or reason to believe -O2 should lead in the desktop flavour, unless as a staging step to enable it in the server flavour. I don't know of a desktop workload where the kernel is going to be very costly, but actually I don't really profile 3D rendering, which is one thing that might benefit from -O2. If anyone is gathering these kinds of framerate numbers, then it would be very interesting to test the difference between -Os and -O2.
- #35: Nick Piggin (npiggin) (2009-07-21 12:06) OK, it is SLES bug 482887. The ISV reports that a VM-intensive microbenchmark slows down by about 45%, and real-world (for them) performance by 10-20%, when using -Os rather than -O2 in the SLES11 kernel.
- #36: Nick Piggin (npiggin) (2009-07-22 08:36) We have another result from a hardware vendor showing that an important database workload improves by 1% (system-wide throughput) when the kernel is compiled with -O2 rather than -Os. Their result is a little sparse on details, and I can't share some details publicly, but I think it is a meaningful result.
- #37: Takashi Iwai (tiwai) (2009-07-22 16:15) IIRC, the decision for -Os was due to a significant performance difference on PowerPC a few years ago. There was little difference between -Os and -O2 on x86, thus we chose -Os.
But, your current number is much more convincing. We should go for -O2 indeed.
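
For anyone who wants to reproduce the comparison, the kind of workload described in #33 and #35 (mmap / page fault / munmap) can be approximated with a loop like the one below. This is only a rough sketch under assumptions, not the ISV's benchmark, which is not public in this thread. The function name `mmap_fault_cycles` and its parameters are made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the bug report): repeatedly map an
 * anonymous region, write to every page so the kernel has to fault
 * each one in, then unmap it again.
 * Returns the total number of pages touched, or -1 on error.
 */
long mmap_fault_cycles(long iterations, size_t pages_per_map)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || iterations < 0 || pages_per_map == 0)
        return -1;

    size_t length = pages_per_map * (size_t)page_size;
    long touched = 0;

    for (long i = 0; i < iterations; i++) {
        void *base = mmap(NULL, length, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return -1;

        /* volatile keeps the compiler from dropping the stores */
        volatile unsigned char *p = base;
        for (size_t off = 0; off < length; off += (size_t)page_size) {
            p[off] = 1;   /* first write to a page triggers a page fault */
            touched++;
        }

        if (munmap(base, length) != 0)
            return -1;
    }
    return touched;
}
```

Calling this from a small `main` with a large iteration count and timing it (for example with `time`) on kernels built with each setting would give a rough -Os versus -O2 comparison. The time is spent almost entirely in the kernel, so the optimization level of the userspace program itself should matter little. In mainline kernels of that era the choice is, as far as I know, controlled by `CONFIG_CC_OPTIMIZE_FOR_SIZE`; whether the SLES kernel configs use exactly that switch is an assumption here.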