@mark
It's mainly about the inlining. Yes, it can have that big an effect.
C++ templates greatly exacerbate that effect: when you have templates calling templates calling templates, you can get thousands of pointless function calls without inlining.
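To make the "templates calling templates" point concrete, here is a minimal sketch. All the names in it (`Raw`, `Checked`, `Logged`, `sum_reads`) are made up for illustration and are not taken from Mesa or any real library. Each wrapper layer only forwards to the next one, so a single read costs three nested calls if the compiler inlines nothing, and usually collapses to a plain load once inlining is enabled.

```cpp
#include <cstddef>

// Innermost layer: actually holds the value (hypothetical type).
template <typename T>
struct Raw {
    using value_type = T;
    T value;
    value_type get() const { return value; }
};

// Middle layer: does nothing but forward to the layer below.
template <typename Inner>
struct Checked {
    using value_type = typename Inner::value_type;
    Inner inner;
    value_type get() const { return inner.get(); }
};

// Outer layer: again only forwards.
template <typename Inner>
struct Logged {
    using value_type = typename Inner::value_type;
    Inner inner;
    value_type get() const { return inner.get(); }
};

// Reads the wrapped value n times through every layer.
// Without inlining, each iteration is three nested calls.
template <typename Wrapped>
long sum_reads(const Wrapped& w, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += w.get();
    }
    return total;
}

using Layered = Logged<Checked<Raw<int>>>;
```

Something like `Layered s{{{7}}}; sum_reads(s, 1000);` performs 3000 forwarding calls when built without inlining. Comparing the generated assembly at -O0 and -O2 should show the difference, though the exact output depends on the compiler and version.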
Then why not use something like -march=i686 -msse -msse2? That would enable gcc to use cmov and SSE/SSE2 instructions, and the binaries would still run on a P4.

This change is mainly to benefit 32-bit systems where SSE support can't be assumed by default, but with the i965 driver it can usually be assumed that an Intel Core 2 processor or newer is in use. (The older Intel processors generally use the i915 driver.) With the -march=core2 flag set, i386 builds would use SSE for floating-point math and cmov instructions, plus other performance optimizations.
[...]
This patch was ultimately rejected because it turns out some old Pentium 4 systems can still be found in i965 driver configurations, where things might break.
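One way to see what a given set of -march/-msse flags actually enabled is to look at the compiler's predefined macros. The sketch below assumes GCC or Clang on x86; `build_features` is a made-up helper name. It also hints at why -march=core2 is risky on a Pentium 4: that flag enables SSSE3, which the Pentium 4 does not have, whereas -march=i686 -msse -msse2 stops at SSE2. Note that on 32-bit targets, having SSE2 enabled does not necessarily mean scalar floating-point math uses it. As far as I know that additionally depends on -mfpmath, which is what `__SSE2_MATH__` reflects.

```cpp
#include <string>

// Lists the x86 features this translation unit was compiled with,
// based on GCC/Clang predefined macros. Hypothetical helper.
inline std::string build_features() {
    std::string out;
#if defined(__SSE__)
    out += "sse ";
#endif
#if defined(__SSE2__)
    out += "sse2 ";
#endif
#if defined(__SSE2_MATH__)
    out += "sse2-fpmath ";  // scalar FP math is done with SSE2, not x87
#endif
#if defined(__SSSE3__)
    out += "ssse3 ";        // enabled by -march=core2; not available on a Pentium 4
#endif
    if (out.empty()) {
        out = "baseline";   // none of the macros above were defined
    }
    return out;
}
```

Printing the result from a small test program built with each candidate flag set shows which instruction sets the compiler was allowed to emit for that build.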
-Os is slower in some cases. I tried it just now on r200 and can immediately see slower menus in SuperTuxKart: going through the kart chooser, for example, is sluggish, so no go...
In my experience -O1 may be the best for Mesa stability, but the safe choice is to just go with -O2 and -pipe, which will produce smaller libraries. If you want to play with processor optimisation, add -march=blabla, but always stick with -O2 if you want to keep the driver stable.
Actually, since the function's code would be executed anyway, you should always gain performance from avoiding the new stack frame. The main drawbacks they try to avoid are probably bigger binaries and more memory usage for very large functions.
And it could potentially allow even more optimization with the "neighbouring" code, since it's not isolated in a function anymore. There are way too many things to consider in compiler optimization.
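A small sketch of the "neighbouring code" point, assuming GCC or Clang for the `noinline` attribute. The function names and the `SKETCH_NOINLINE` macro are invented for the example. Both helpers compute the same thing, but only the inlinable one lets the compiler see that `factor` is the constant 1 at the call site, so it can typically drop the branch and the multiply from the loop.

```cpp
// Hypothetical macro: keeps a function out of line on GCC/Clang.
#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Inlinable helper: once merged into the caller, a constant `factor`
// lets the optimizer fold the branch away.
inline int scale(int x, int factor) {
    if (factor == 1) return x;
    return x * factor;
}

// Same logic, but marked noinline so the compiler will not inline it;
// the caller typically pays a call per element.
SKETCH_NOINLINE int scale_opaque(int x, int factor) {
    if (factor == 1) return x;
    return x * factor;
}

// Sums the data through the inlinable helper.
inline long sum_scaled_inline(const int* data, int n) {
    long total = 0;
    for (int i = 0; i < n; ++i) total += scale(data[i], 1);
    return total;
}

// Sums the data through the out-of-line helper.
inline long sum_scaled_opaque(const int* data, int n) {
    long total = 0;
    for (int i = 0; i < n; ++i) total += scale_opaque(data[i], 1);
    return total;
}
```

Both sums return the same value. The difference only shows up in the generated code and timings, and compilers have other tricks (such as cloning a function for a constant argument) that can narrow the gap, so treat this as an illustration rather than a benchmark.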
It would be nice to have a database/list of programs and their fastest compile flags (depending on the compiler/version of course).
The question is indeed whether Mesa is the speed-limiting step (aka the bottleneck) in the whole system here. But it won't hurt to keep my Gentoo CFLAGS as they are: mainly -march set and -O2. In a few cases I actually use -Os, for VIA CPUs or AMD's old Geode LX. A few packages might dislike messing too much with CFLAGS, though.
My understanding is that right now the biggest bottleneck in the OSS graphics stack is GEM/TTM. It needs to be replaced, but I don't think anybody has a good idea of what to replace it with.