Very nice. Basically SNA = Good, UXA = Bad. When SNA has regressions they are negligible, but when it performs better, it really performs a lot better.
Right, the regressions tend to be a consequence of choosing a method that gives better performance elsewhere, at a cost here. Most of the regressions are within the noise of the measurement: IvyBridge is very sensitive to thermals (in some of those tests the initial run is 2x faster than the final run due to turbo). The only significant regression there is -compwinwin500. The reason for the regression is that last week it was 2x faster due to hitting the Render cache - however, that path was missing a flush. With that flush added for correctness, it becomes faster to use the BLT for that particular test, a trivial change that has already been made.
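To make "in the noise" concrete, here is a minimal sketch of one way to compare two sets of benchmark runs against their own run-to-run spread. The function names, the `margin` parameter and the sample numbers are hypothetical illustrations, not taken from the actual test harness or results.

```python
from statistics import median


def relative_spread(samples):
    """Run-to-run spread, expressed as (max - min) / median."""
    return (max(samples) - min(samples)) / median(samples)


def is_significant(before, after, margin=1.0):
    """True when the change in medians exceeds the larger run-to-run spread.

    `margin` is a hypothetical knob: how many times larger than the
    noise a change must be before it is treated as real.
    """
    change = abs(median(after) - median(before)) / median(before)
    noise = max(relative_spread(before), relative_spread(after))
    return change > margin * noise


# Made-up rates where turbo makes the first runs roughly 2x faster than the last:
throttled_before = [100, 98, 60, 55]
throttled_after = [95, 97, 58, 54]
print(is_significant(throttled_before, throttled_after))

# Made-up rates for a stable test that really did halve:
stable_before = [200, 198, 202]
stable_after = [100, 101, 99]
print(is_significant(stable_before, stable_after))
```

With thermally throttled runs the spread within a single set dwarfs the difference between sets, so only a change like the second case stands out.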
But what I find truly fascinating is how competitive we actually are with a discrete GPU that has a good driver, over 4x the fill rate of the igfx and several times the shader flops. With regard to 2D performance, the limitation tends not to be SNA (unlike UXA and glamor, where the driver is the bottleneck) but the application - which is as it should be. :-)
Does this statement apply to Intel hardware only, or do you think there are general (significant) bottlenecks connected to Glamor?
There is a significant impedance mismatch between X and GL, which is tricky to overcome and adds lots of extra complexity, and with the extra abstraction layer you cannot exploit hardware features that are not exposed through a GL extension. You also need to leak many details through that abstraction layer in order to allocate shared objects between multiple clients and your acceleration routines (which is quite, quite scary and hairy). And there is the tiny issue of having a critical system process rely on several hundred thousand lines of code that were not written with robustness in mind, with no failsafe method.
With regard to performance, the current bottlenecks I see in glamor are due to the CPU overhead of the Intel mesa stack, and its many assumptions that interact extremely poorly with the 2D workload of glamor. Where you do find yourself mostly GPU bound (such as the fish-demo), glamor still falls short by 10-30% due to inefficiencies in the GPU programming (too many state changes and poor optimisation of shaders) and the multiple abstraction layers. However, being GPU bound is the exception, and typically you end up being rate-limited by one of the paths that are orders of magnitude slower. And then there is the issue that glamor is an absolute resource hog, as the Intel mesa driver's buffer management has never been used like that before...
In a perfect world, glamor would equal the performance of a highly specialised driver like SNA; many of the routines used in SNA can be mapped directly onto the OpenGL API - and most have been copied over to glamor. Lots of work needs to be done to tune the entire mesa stack, much of which I suspect will only benefit glamor.
And remember, RENDER acceleration is just one small part of the driver.