Bridgman, I am sure that even if you do not know where the bottlenecks are, you (and other developers) have suspects. Is it in the kernel-mode or user-mode part of the stack?
The problem is that so far the test results aren't supporting our initial suspicions. Going in, I think most of us suspected that the bottlenecks were likely to be in the kernel driver (synchronization, memory mapping, etc.), but test results seem to suggest that common Mesa code in the usermode 3D driver is a bigger factor. There's a lot more testing required though, and there are conflicting views re: how to interpret the test results so far.
Performance optimization is basically:
- run some benchmarks & save the results
repeat forever {
  - do some profiling
  - form a theory re: where the bottleneck is
  - change some code to test the theory
  - re-run the benchmarks to see if things go faster
  - (4 times out of 5) curse and discard the theory (or save as the basis for a more complex theory)
  - (1 time out of 5) make happy noises and get some sleep
}
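For anyone who wants to try the loop above on their own code, here is a minimal sketch of the "re-run the benchmarks and decide" step in Python. Everything in it is an assumption made for illustration: the function names (benchmark, evaluate_theory), the two stand-in workloads, and the 5% speedup threshold are hypothetical and have nothing to do with how the driver developers actually measure things.

```python
import statistics
import time


def benchmark(fn, runs=5):
    """Return the median wall-clock time of fn() over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def evaluate_theory(baseline_time, candidate_time, min_speedup=1.05):
    """Keep a change only if it beats the baseline by a clear margin.

    min_speedup is an arbitrary illustrative threshold, not a recommendation.
    """
    speedup = baseline_time / candidate_time
    verdict = "keep" if speedup >= min_speedup else "discard"
    return verdict, speedup


# Hypothetical workloads standing in for the code before and after a change.
def workload_before():
    return sum(i * i for i in range(200_000))


def workload_after():
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    baseline = benchmark(workload_before)    # run benchmarks, save results
    candidate = benchmark(workload_after)    # re-run after changing code
    verdict, speedup = evaluate_theory(baseline, candidate)
    print(f"speedup {speedup:.2f}x -> {verdict} the theory")
```

Taking the median of several runs is just one way to damp noise; a real comparison would also need a stable machine state and a workload that reflects the actual application.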
Do Radeons offer debug instrumentation like CPUs do: cache misses, instruction counts, pipeline stalls? Or basically, is there a way to know that your shader compiler is bad and is therefore causing stalls on more SIMDs than needed? In general, how do you evaluate the FOSS r600 shader compiler? Dismissing the shader compiler as the main bottleneck opens space for more aggressive CPU micro-optimizations, e.g. branch prediction hints, preventing CPU cache thrashing, etc. Well, that will break portability to other platforms, but I am sure fglrx is full of that.
Yeah, I would have been happier if it was the other way around.
Part of the problem is that profiling only tells you what the CPU is doing, not what the GPU is doing.
Drago, there are some hardware bits that can help, but they're mostly aimed at getting the most out of the GPU once core driver issues are worked out. I don't think they will help much here, but we are going to look at those as well. Right now the open source driver hacked to not run anything on the GPU is still slower than fglrx doing full rendering (even on a single CPU core, apparently).
The "good" news (such as it is) is that this means there is a bunch of useful work that can be done before getting into the nasties of performance tuning on a pile of asynchronous engines (CPU execution, CPU cache flusher thingy, command processor, graphics pipe, shader core, vertex fetcher, texture fetchers/filters, GPU memory controller, GPU cache flusher thingy, etc.).
If most of this work is in core Mesa (as I understood it, perhaps incorrectly), then all drivers will profit from this work?