Uh, yes. First you need to read from texture memory into the GPU itself during rasterization. The GPU is also writing to the back buffer. Then, after the buffers are flipped, the (now) front buffer has to be scanned out, encoded and sent to your monitor. So at a very minimum you have thousands of writes and even more reads for just that one rectangle.
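To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a full-screen 1920x1080 rectangle, RGBA8 (4 bytes per pixel), exactly one texel fetched per pixel, and a 60 Hz refresh. It ignores texture filtering (which fetches more texels per pixel), caches, and framebuffer compression, so it is an illustration of scale, not a measurement.

```python
# Rough per-frame memory traffic for one full-screen textured rectangle.
# All parameters below are illustrative assumptions, not measured values.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4   # RGBA8
REFRESH_HZ = 60


def traffic_per_frame(width, height, bpp):
    """Return (pixel count, total bytes moved) for one frame."""
    pixels = width * height
    texture_reads = pixels * bpp       # one texel per pixel, no filtering
    framebuffer_writes = pixels * bpp  # GPU writing the back buffer
    scanout_reads = pixels * bpp       # display engine reading the front buffer
    return pixels, texture_reads + framebuffer_writes + scanout_reads


pixels, per_frame = traffic_per_frame(WIDTH, HEIGHT, BYTES_PER_PIXEL)
print(f"{pixels} pixels, {per_frame / 2**20:.1f} MiB per frame, "
      f"{per_frame * REFRESH_HZ / 2**30:.2f} GiB/s at {REFRESH_HZ} Hz")
```

Even this minimal model moves millions of pixels' worth of data through video memory every frame, all of it local to the card and none of it crossing the PCIe bus.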
I think you're confusing memory bandwidth with DMA bandwidth over your PCIe bus. They're not the same thing.
My test code is really basic, but if anyone is interested I can post it. I posted an EGL version to the Mesa mailing list. Add it to the PTS perhaps? :-)
I expected worse results after seeing the bug report about Unigine Heaven. Anyway, we don't have many options at the moment (I see only one: reverting the commit). The mechanism that decides where buffers are placed (VRAM or GTT) and which buffers are moved when we start to run out of memory must be overhauled. This is a bigger project and I don't have time for it right now. The kernel DRM interface might need some changes. We also need good tools to detect bottlenecks and a good GPU resource monitor. Right now if you run out of GPU memory, there's no easy way to know and definitely no way to know what is eating the memory. We're mostly blind right now.
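To make the placement problem concrete, here is a deliberately naive toy model. It is hypothetical: the class name, the byte budget, and the pure least-recently-used policy are assumptions for illustration, and this is not how the radeon kernel driver or TTM actually decides between VRAM and GTT.

```python
from collections import OrderedDict


class Placement:
    """Hypothetical toy model: prefer VRAM, evict least-recently-used
    buffers to GTT when VRAM is full. Not the real radeon/TTM logic."""

    def __init__(self, vram_bytes):
        self.vram_capacity = vram_bytes
        self.vram_used = 0
        self.vram = OrderedDict()  # name -> size, least recently used first
        self.gtt = {}              # name -> size, buffers in system memory

    def use(self, name, size):
        """Touch a buffer for a draw call and return where it ends up."""
        if name in self.vram:
            self.vram.move_to_end(name)  # mark as most recently used
            return "vram"
        self.gtt.pop(name, None)         # it may be coming back from GTT
        if size > self.vram_capacity:
            self.gtt[name] = size        # can never fit, leave it in GTT
            return "gtt"
        while self.vram_used + size > self.vram_capacity:
            victim, vsize = self.vram.popitem(last=False)  # evict the LRU buffer
            self.vram_used -= vsize
            self.gtt[victim] = vsize     # "move" the victim to GTT
        self.vram[name] = size
        self.vram_used += size
        return "vram"
```

A policy like this thrashes once the working set of a frame exceeds VRAM: every frame evicts buffers it is about to need again, so they keep moving back and forth. Avoiding that without regressing other workloads is what makes the real heuristic hard, and without a GPU memory monitor there is no easy way to see that it is happening.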
However, we're fighting a battle we can't win. S3TC textures need 4 to 8 times less memory than uncompressed RGBA8 textures and would help a lot with this problem. Any driver with S3TC support has a big advantage over a driver without it.
We could also cheat by using the BC7 format for plain RGBA8 textures. That would be a win if we implemented the BC7 encoding on the GPU.
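Where the "4 to 8 times" figure comes from: block-compressed formats store each 4x4 texel block in a fixed number of bytes (8 for DXT1/BC1, 16 for DXT5/BC3 and for BC7), while RGBA8 spends 4 bytes per texel. The sketch below assumes a single 2048x2048 level and ignores mipmaps; it only shows the arithmetic, not how any driver allocates the texture.

```python
# Texture footprint: uncompressed RGBA8 vs. block-compressed formats.
# Each format stores one 4x4 texel block in a fixed number of bytes.
BLOCK_BYTES = {
    "DXT1 / BC1": 8,   # S3TC, 4 bits per texel
    "DXT5 / BC3": 16,  # S3TC with explicit alpha, 8 bits per texel
    "BC7": 16,         # 8 bits per texel
}


def rgba8_size(w, h):
    """Bytes for an uncompressed RGBA8 image (4 bytes per texel)."""
    return w * h * 4


def block_size(w, h, bytes_per_block):
    """Bytes for a block-compressed image; partial edge blocks still cost a full block."""
    blocks_x = (w + 3) // 4
    blocks_y = (h + 3) // 4
    return blocks_x * blocks_y * bytes_per_block


W = H = 2048
base = rgba8_size(W, H)
for fmt, bpb in BLOCK_BYTES.items():
    size = block_size(W, H, bpb)
    print(f"{fmt}: {size / 2**20:.0f} MiB vs {base / 2**20:.0f} MiB RGBA8 "
          f"({base // size}x smaller)")
```

BC7 lands at the same 4:1 ratio as DXT5, which is why storing plain RGBA8 textures as BC7 would save memory; the catch is that the BC7 encoding step is expensive, hence the idea of doing it on the GPU.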
This problem reminds me of a similar problem with r300g. Hopefully a way to improve performance without regressions will be found.
And anyway, thanks Marek for all your endless work on the radeon drivers.
So would I be correct in thinking that this performance regression for Heaven/ETQW/etc. *might* only affect users who haven't enabled S3TC through the external libtxc_dxtn library? Or, I guess, users who have it but run applications that don't use S3TC (or applications that just require gobs of memory).
@marek
I think current distros already preinstall S2TC; Ubuntu 12.10 does as well (not sure about 32-bit, though). In a trial image, Kanotix Dragonfire has it preinstalled as well (32- and 64-bit libs).