The open source driver seems to be at roughly the same level of performance
as the open source driver for Intel's SB graphics. I guess one should
really use Catalyst, if your card is supported, though.
I see. What I am wondering about is why graphics card drivers require this amount of manpower. Is it the hardware or the API? How is it possible to write driver code that is more than ten times faster? Or is it that the hardware doesn't map to the exposed API (OpenGL) and requires complex translation? Sorry if the questions sound silly; I've never worked with bare hardware.
Last edited by log0; 04-16-2012 at 08:56 AM.
N.B. I'm not a driver developer, just an interested observer.
In this case, the open source driver was forced into a low-power mode, while the proprietary driver was running at full blast. Also, it's possible that not all functionality (such as tiling) was enabled in the open source driver. When there's an order-of-magnitude difference, either something is wrong, or the driver is too new and there's still lots of work needed.
The problem with OpenGL drivers (and GPU drivers in general) is that GPUs are amazingly complex hardware, and driving them takes incredible amounts of code (especially for full OpenGL support). It's much more complex than a network card driver or a mouse driver. With most chips, the Gallium3D drivers for Radeons reach around 60-70% of the proprietary driver's performance, which is about as close as you can get with "regular" effort.
Then things get complicated. A GPU driver runs on the CPU and often has to do many things before it can prepare a frame for rendering. If it is not optimised, the time adds up: lots of little delays all over the stack, hundreds of them, each of which has to be optimised one by one. This is very time-intensive and takes a lot of manpower. If you are running something at 100 frames per second, this quickly adds up and makes a huge difference: even a small per-frame delay, multiplied by 100, becomes a long wait. That's why the developers first focus on getting a driver working correctly, and only then try to optimise it.
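To put rough numbers on the 100 fps point, here is a minimal Python sketch of the arithmetic. The overhead figures (50 delays of 0.02 ms each) are hypothetical illustration values, not measurements of any driver.

```python
# Back-of-the-envelope arithmetic for per-frame driver overhead.
# The delay figures used in main() are hypothetical examples, not measurements.


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / fps


def overhead_share(fps: float, per_frame_overhead_ms: float) -> float:
    """Fraction of each frame's time budget consumed by a fixed CPU-side overhead."""
    return per_frame_overhead_ms / frame_budget_ms(fps)


def overhead_per_second_ms(fps: float, per_frame_overhead_ms: float) -> float:
    """CPU time spent on that overhead during one second of rendering."""
    return fps * per_frame_overhead_ms


def main() -> None:
    fps = 100.0
    # Hypothetical: 50 small delays scattered through the stack, 0.02 ms each.
    per_delay_ms = 0.02
    overhead_ms = 50 * per_delay_ms
    print(f"budget per frame: {frame_budget_ms(fps):.1f} ms")
    print(f"overhead share:   {overhead_share(fps, overhead_ms):.0%}")
    print(f"lost per second:  {overhead_per_second_ms(fps, overhead_ms):.0f} ms")


if __name__ == "__main__":
    main()
```

With these made-up numbers, 1 ms of accumulated overhead per frame is 10% of the 10 ms budget at 100 fps, or 100 ms of CPU time every second.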
With some more work, plus Tom's VLIW packetiser, the new shader compiler and the Hyper-Z support, things should get to more than 80% of the proprietary driver's performance, perhaps even more (rough guess). That's really good, and the additional work beyond that becomes too complex, for very little gain.
From my A6-3500 series APU (via ssh; the machine is currently idle, sitting at a MythTV frontend screen):
me@mybox:/sys/class/drm/card0/device# cat /sys/class/drm/card0/device/power_method
profile
me@mybox:/sys/class/drm/card0/device# cat /sys/class/drm/card0/device/power_profile
default
me@mybox:/sys/kernel/debug/dri/0# cat /sys/kernel/debug/dri/0/radeon_pm_info
default engine clock: 200000 kHz
current engine clock: 11880 kHz
default memory clock: 667000 kHz
So an order-of-magnitude difference between Catalyst and r600g is to be expected if Michael left the power management in its default state. If he had forced the APU under Gallium3D into high-performance mode (or maybe the dynpm power method), things would probably have been different.
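The "order of magnitude" follows directly from the clocks quoted above: 200000 kHz / 11880 kHz is roughly 16.8. A minimal Python sketch that parses that output and computes the ratio; the sample text is copied from the post, and reading the real `radeon_pm_info` file typically needs root and a mounted debugfs.

```python
import re

# Text as quoted in the post above. A real radeon_pm_info file may contain
# further lines; the parser skips anything that does not look like
# "<label> clock: <number> kHz".
SAMPLE = """\
default engine clock: 200000 kHz
current engine clock: 11880 kHz
default memory clock: 667000 kHz
"""

CLOCK_LINE = re.compile(r"^[ \t]*([A-Za-z ]+clock):[ \t]*(\d+)[ \t]*kHz", re.MULTILINE)


def parse_clocks(text: str) -> dict:
    """Return a mapping such as {'current engine clock': 11880} (values in kHz)."""
    return {m.group(1).strip(): int(m.group(2)) for m in CLOCK_LINE.finditer(text)}


def engine_slowdown(clocks: dict) -> float:
    """How many times lower the current engine clock is than the default one."""
    return clocks["default engine clock"] / clocks["current engine clock"]


if __name__ == "__main__":
    clocks = parse_clocks(SAMPLE)
    print(f"engine clock is {engine_slowdown(clocks):.1f}x below default")
```

For the sample above this prints `engine clock is 16.8x below default`.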
I'm not positive about how the default clocking on the APUs works, but I'm seeing some variation in the GPU clock on my machine. It goes as low as 7 MHz and as high as 30 MHz when idling, and I'm not sure how conservative the reclocking (which seems to be enabled by the EFI/BIOS by default) actually is. So forcing the APU into high-performance mode might help.
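Forcing high-performance mode with the old radeon sysfs interface shown above amounts to writing `profile` to `power_method` and then a profile name to `power_profile`. A minimal Python sketch; the accepted profile names (`default`, `auto`, `low`, `mid`, `high`) are assumed to be the ones documented for radeon's profile method, so check your kernel's documentation. The `dynpm` alternative would instead be written to `power_method`. Writing the real files needs root; the `device_dir` parameter is only there so the function can be pointed at a scratch directory.

```python
from pathlib import Path

# sysfs directory of the first card, as in the shell session above.
DEFAULT_DEVICE_DIR = Path("/sys/class/drm/card0/device")

# Profile names assumed to be accepted by radeon's "profile" power method;
# verify against the documentation for your kernel version.
PROFILES = {"default", "auto", "low", "mid", "high"}


def set_power_profile(profile: str, device_dir: Path = DEFAULT_DEVICE_DIR) -> str:
    """Switch to the static 'profile' power method and select a profile.

    Writing to the real sysfs files requires root. Returns the profile that
    the power_profile file reports after the write.
    """
    if profile not in PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {sorted(PROFILES)}")
    # Select the method first; power_profile is what the "profile" method uses.
    (device_dir / "power_method").write_text("profile\n")
    (device_dir / "power_profile").write_text(profile + "\n")
    return (device_dir / "power_profile").read_text().strip()
```

As root, `set_power_profile("high")` should leave `power_method` reading `profile` and `power_profile` reading `high`; `cat radeon_pm_info` afterwards shows whether the engine clock actually went up.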
Yes, the translation is really complex as far as OpenGL is concerned. Implementing a performant shader compiler is also not easy. Then there are hardware optimizations you can use, like texture tiling, hierarchical Z/stencil buffers, color buffer compression, etc.
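Texture tiling is the easiest of these to illustrate. The usual motivation is that pixels which are close in 2D end up close in memory, which helps caches and memory bursts. A minimal Python sketch of a generic layout with hypothetical 8x8 tiles; the real Radeon tiling modes are more elaborate and are not reproduced here.

```python
# Generic illustration of tiled vs. linear surface layout. The tile size and
# layout are hypothetical; real GPU tiling modes are more elaborate.


def linear_offset(x: int, y: int, width: int) -> int:
    """Row-major (linear) element offset of pixel (x, y)."""
    return y * width + x


def tiled_offset(x: int, y: int, width: int, tile_w: int = 8, tile_h: int = 8) -> int:
    """Offset when the surface is stored as consecutive tile_w x tile_h tiles."""
    if width % tile_w:
        raise ValueError("width must be a multiple of tile_w in this sketch")
    tiles_per_row = width // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    within_tile = (y % tile_h) * tile_w + (x % tile_w)
    return tile_index * tile_w * tile_h + within_tile


def address_span(offset_fn, x0: int, y0: int, w: int, h: int, **layout) -> int:
    """Elements between the lowest and highest address touched by a w x h block."""
    offsets = [offset_fn(x, y, **layout) for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    return max(offsets) - min(offsets) + 1


if __name__ == "__main__":
    # An 8x8 block of pixels on a 1024-pixel-wide surface.
    print("linear span:", address_span(linear_offset, 0, 0, 8, 8, width=1024))
    print("tiled span: ", address_span(tiled_offset, 0, 0, 8, 8, width=1024))
```

The 8x8 block is spread over 7176 elements in the linear layout but packed into 64 contiguous elements in the tiled one.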
We need a driver which:
1) doesn't starve the GPU by doing too much CPU work
2) doesn't synchronize the CPU with the GPU, so that the two can operate asynchronously
3) takes advantage of every hardware feature which improves performance
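Points 1 and 2 can be illustrated with an idealised two-stage timing model, sketched in Python. The per-frame costs are hypothetical constants; a real driver also has to bound how far the CPU runs ahead of the GPU (typically with fences), which this model ignores.

```python
# Idealised timing model for points 1 and 2 above: each frame needs cpu_ms of
# driver work on the CPU and gpu_ms of rendering on the GPU. Constant per-frame
# costs are an assumption made to keep the arithmetic simple.


def total_time_synchronous(n_frames: int, cpu_ms: float, gpu_ms: float) -> float:
    """CPU waits for the GPU to finish each frame before preparing the next."""
    return n_frames * (cpu_ms + gpu_ms)


def total_time_pipelined(n_frames: int, cpu_ms: float, gpu_ms: float) -> float:
    """CPU prepares frame i+1 while the GPU renders frame i (two-stage pipeline)."""
    if n_frames == 0:
        return 0.0
    # The slower of the two stages sets the steady-state rate.
    return cpu_ms + (n_frames - 1) * max(cpu_ms, gpu_ms) + gpu_ms


def gpu_busy_fraction(n_frames: int, cpu_ms: float, gpu_ms: float) -> float:
    """Share of the pipelined run during which the GPU is actually rendering."""
    return n_frames * gpu_ms / total_time_pipelined(n_frames, cpu_ms, gpu_ms)


if __name__ == "__main__":
    n = 100
    print("sync:      ", total_time_synchronous(n, 4.0, 6.0), "ms")
    print("pipelined: ", total_time_pipelined(n, 4.0, 6.0), "ms")
    # When the CPU side is slower than the GPU, the GPU starves (point 1).
    print("GPU busy:  ", round(gpu_busy_fraction(n, 8.0, 6.0), 3))
```

With 4 ms of CPU work and 6 ms of GPU work per frame, 100 frames take 1000 ms when synchronized but 604 ms when overlapped; if the CPU side grows to 8 ms, the GPU sits idle about a quarter of the time.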
FYI, I was told by some NVIDIA guy face-to-face a few years ago that their Vista GPU drivers had 20 million lines of code. The entire Linux kernel has only 14.3M.
FWIW, we have hundreds of developers working on the closed source AMD driver, and the closed driver was ~40 million LOC last time I checked, which was a while back.
I don't understand why people believe shader optimization is a big issue. In all the benchmarks in this article, a better shader would most likely not make a measurable difference. Marek has a far better point to explain the gap. Oh, and if you want to convince yourself that the shader is not a big issue: take a big shader from Doom 3, write a sample GL program that uses that shader to draw a quad covering the biggest FBO possible on your hardware generation, and draw it thousands of times. Then hand-optimize the shader, hack r600g to use your hand-optimized version, and compare. Last time I did such a thing, the difference wasn't that big.
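One way to see glisse's point is Amdahl's law: if shader execution is only a fraction p of the frame time, speeding the shader up by a factor s speeds the whole frame up by 1 / ((1 - p) + p / s). A minimal Python sketch; the fractions used are hypothetical, not measured from any benchmark.

```python
# Amdahl's law applied to a frame: only the shader-execution share of the frame
# time benefits from a better shader. The fractions used below are hypothetical.


def overall_speedup(shader_fraction: float, shader_speedup: float) -> float:
    """Whole-frame speedup when only `shader_fraction` of frame time gets faster."""
    if not 0.0 <= shader_fraction <= 1.0:
        raise ValueError("shader_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - shader_fraction) + shader_fraction / shader_speedup)


if __name__ == "__main__":
    for fraction in (0.1, 0.3, 0.9):
        gain = overall_speedup(fraction, 1.2) - 1.0
        print(f"shader = {fraction:.0%} of frame, shader 1.2x faster -> frame {gain:.1%} faster")
```

If shaders account for 10% of the frame, a shader that is 1.2x faster makes the frame only about 1.7% faster, which is easily lost in benchmark noise.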
glisse: it's because nouveau is faster than radeon. Considering that nouveau isn't backed by NVIDIA and there isn't any documentation, that's quite strange, and people started searching for a culprit.
@glisse
Do you mean TGSI or r600 asm?
My GSoC shader (TGSI) was 20% faster when hand-optimized than the version produced by Mesa's GLSL compiler. But that's only at the TGSI level; I believe it would be much faster if it were properly compiled (maybe the wrong word) down to r600g asm instead of the simple replacement that I understand is the current status.