Is this SNA being done on the Ivy Bridge CPU or the integrated HD 4000 GPU?
It sounds like SNA does both.
It uses a bunch of complicated per-GPU heuristics that tell it whether the 3D engine, the 2D engine, or the CPU will do a specific bit of rendering the fastest, and then it combines everything together.
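To make the "pick the fastest engine per operation" idea concrete, here is a toy sketch. It is not how SNA is actually written: the GPU name, the operation names, and every cost number are made up for illustration, and real heuristics would look at far more than a lookup table.

```python
# Toy model only: the GPU name, operations, and cost numbers below are
# invented for illustration. They are NOT taken from the SNA source or
# from measurements on any real hardware.
HYPOTHETICAL_COSTS = {
    "gpu_a": {
        "solid_fill": {"2d_engine": 1.0, "3d_engine": 2.0, "cpu": 5.0},
        "composite": {"2d_engine": 9.0, "3d_engine": 2.0, "cpu": 6.0},
        "tiny_glyph": {"2d_engine": 3.0, "3d_engine": 4.0, "cpu": 1.0},
    },
}


def pick_engine(gpu, operation, costs=HYPOTHETICAL_COSTS):
    """Return the engine with the lowest (hypothetical) cost for one operation."""
    per_engine = costs[gpu][operation]
    return min(per_engine, key=per_engine.get)


def plan_frame(gpu, operations):
    """Choose an engine for each operation independently, in order."""
    return [(op, pick_engine(gpu, op)) for op in operations]


if __name__ == "__main__":
    for op, engine in plan_frame("gpu_a", ["solid_fill", "composite", "tiny_glyph"]):
        print(f"{op}: {engine}")
```

The point is only that the choice is made per operation and per GPU, so one frame can end up mixing all three paths.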
How many graphics acceleration architectures does X.org have?
XAA, EXA, UXA, SNA, GLAMOR? Any more?
And how many of these are still in use?
Maybe it would be good if it had just one unified acceleration architecture?
1. XAA isn't in use (at all?) anymore. It may still be in use on VERY old distros or on BSD or Solaris, but I can't imagine it's being used on Linux no matter what driver. GLAMOR is so new/unstable/broken/slow that it isn't in use, either. UXA will die shortly once everyone migrates to SNA. EXA is a long-term holdout being used by r600g and nouveau, so it's not going anywhere. Going forward you will probably see only SNA and EXA being used, if GLAMOR never shapes up into a performant solution.
2. Everyone who's ever written an acceleration architecture has at least hoped that it would be a "unified acceleration architecture". The fact that none of them has turned out to be universal (except for EXA, which for a certain period of time worked for ATI, Intel and Nvidia cards) shows that the hardware diverges enough between cards to make it difficult to create one efficient architecture for all of them.
IMHO the biggest discrepancy between cards is the memory model. You have at least these memory models:
1. Discrete GPUs have their own VRAM (usually a LOT of it), which is blazingly fast when accessed by the GPU but painfully slow when read from the CPU.
2. IGPs on the motherboard use system RAM, but since integrated graphics before the advent of processor graphics (Sandy Bridge and later) was very slow, you can't make very many assumptions about the performance of IGPs at all.
3. Processor-based graphics such as AMD Fusion and Intel's Sandy/Ivy Bridge have very fast, low-latency reads and writes to system memory, but unfortunately system memory has much lower bandwidth than VRAM.
4. Hybrid models such as Nvidia Optimus and LucidLogix Virtu present their own performance characteristics.
Not only does the memory model require vastly different programming at the driver level to enable the functionality; it also affects the "cost" of certain operations. So, for a discrete GPU that is reading and writing between VRAM and the 3D engine without much interaction with the CPU, that's going to be REAL fast, because the GPU can access its VRAM faster than almost any other memory access on your system except the CPU hitting its own L1/L2/L3 caches. BUT, if your acceleration architecture ever makes the CPU read some data back from the discrete GPU's VRAM, that can be a significant cost. The IGPs and processor-based GPUs, on the other hand, don't mind the CPU reading from "VRAM" (really, system RAM) at all. Better still, for Sandy Bridge and later, the memory controller itself is on the processor!
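Here is a back-of-the-envelope sketch of that cost difference. The bandwidth figures are placeholders I picked to show the shape of the problem, not measurements of any real card, and the names (`MemoryModel`, `pass_ms`) are hypothetical.

```python
from enum import Enum


class MemoryModel(Enum):
    DISCRETE = "discrete"      # GPU with its own VRAM
    INTEGRATED = "integrated"  # GPU sharing system RAM with the CPU


# Placeholder bandwidths in GB/s, keyed by who touches the pixmap.
# These are illustrative assumptions, not measured values.
BANDWIDTH_GBPS = {
    MemoryModel.DISCRETE: {"gpu": 100.0, "cpu": 1.0},
    MemoryModel.INTEGRATED: {"gpu": 20.0, "cpu": 20.0},
}


def pass_ms(model, megabytes, reader):
    """Estimated time in ms for `reader` ("gpu" or "cpu") to walk a pixmap."""
    gigabytes = megabytes / 1024
    return gigabytes / BANDWIDTH_GBPS[model][reader] * 1000


if __name__ == "__main__":
    for model in MemoryModel:
        gpu = pass_ms(model, 64, "gpu")
        cpu = pass_ms(model, 64, "cpu")
        print(f"{model.value}: GPU pass {gpu:.2f} ms, CPU fallback {cpu:.2f} ms")
```

With numbers shaped like these, a CPU fallback on the discrete card costs far more than the GPU pass it replaces, while on the integrated part the two cost about the same, which is why the same fallback heuristic can't be tuned well for both.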
I think it makes sense to have, at a bare minimum, two acceleration architectures:
1. One that's optimized for discrete GPUs where GPU<->VRAM is cheap but CPU<->VRAM is expensive;
2. One that's optimized for integrated GPUs where GPU<->VRAM is about the same cost as CPU<->VRAM but neither one is fast enough to really compete with a discrete GPU.