Quote: Originally posted by elanthis
Examples of such features recently brought up right on this very site are texture tiling, hierarchical Z-buffering, higher PCI transfer speeds, and power state control.
As Alex says, most of this is finished for radeon cards. Perhaps not turned on by default on most distros, but it's been written.

And yes, optimizing shaders can bring huge gains, but Quake3 doesn't need much of that, and it's still slower.

You can also easily measure where the bottleneck lies. If your CPU is pegged at 100%, the driver (or the app itself) is the bottleneck. If the GPU is pegged at 100%, it's the hardware. Any recent tests I've seen have shown that the CPU usage with the FOSS drivers is not all that terrible (though not great) and yet the FPS is far worse. Clearly, then, the GPU hardware is the bottleneck; in this case, not because the hardware itself is bad, but because it's running in a race with its hands tied behind its back and one leg chopped off.
I don't know enough about GPU drivers to argue with you about this. It will depend on how often the GPU has to wait for the driver to finish doing its stuff, and I can think of scenarios where this induces delays even when the CPU is far from 100% load. I see this regularly when running OpenCL software. But like I said, I'm not a driver guy, so I have no clue how it is really done.
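
To make the kind of scenario I mean concrete, here is a toy model in Python. It is only a sketch under made-up assumptions, not how any real driver works: the function `simulate`, its parameters, and the 4 ms / 6 ms timings are all hypothetical. It compares a frame loop where the CPU blocks until the GPU has finished each frame against one where the CPU prepares the next frame while the GPU is still rendering.

```python
def simulate(frames, cpu_ms, gpu_ms, overlapped):
    """Toy frame loop (hypothetical model, not a real driver).

    cpu_ms: time the app/driver needs to prepare one frame's commands
    gpu_ms: time the GPU needs to render that frame
    overlapped: False -> CPU waits for the GPU after every frame,
                True  -> CPU starts the next frame right away
    """
    cpu_free = 0.0  # time at which the CPU can start its next frame
    gpu_free = 0.0  # time at which the GPU finishes its current frame
    for _ in range(frames):
        cpu_done = cpu_free + cpu_ms
        # The GPU cannot start before the commands exist
        # and before it has finished the previous frame.
        gpu_done = max(cpu_done, gpu_free) + gpu_ms
        gpu_free = gpu_done
        cpu_free = cpu_done if overlapped else gpu_done
    total_ms = gpu_free
    return {
        "fps": 1000.0 * frames / total_ms,
        "cpu_util": frames * cpu_ms / total_ms,
        "gpu_util": frames * gpu_ms / total_ms,
    }


if __name__ == "__main__":
    for mode in (False, True):
        result = simulate(100, cpu_ms=4.0, gpu_ms=6.0, overlapped=mode)
        print("overlapped" if mode else "synchronous", result)
```

In the synchronous case of this model, neither the CPU nor the GPU is anywhere near 100% busy, yet the frame rate is clearly lower than in the overlapped case. So low CPU usage on its own does not rule out the software side as the cause of the slowdown.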