Hmm. Running SmallPT 1.6 GPU Caustic3, I'm getting 45200 KSamples/sec on my HD5970 and ~16000 KSamples/sec on my Core i7 920. Neither part is overclocked; they're at their factory default clock rates.
The GPU number is lower than either of Michael's Radeons, but still a fair bit faster than Michael's GT 240. The numbers seem unaffected by whether Compiz is on. I find it hard to accept that a HD5970 gets poorer results than a 5770. Even if a HD5970 is two 5850 cores together, shouldn't even one of those cores single-handedly outperform a 5770? And wouldn't OpenCL have the smarts to use both cores automatically to make it nearly twice as fast?
I noticed something funky about the tests, though. While the test is running, the bottom of the render window shows something like 52000 KSamples/sec. This is substantially larger than the 45000 KSamples/sec that PTS reports in its output. I'm not sure why there's such a large discrepancy. Bug in PTS? Bug in the test?
Either way, it seems (disappointingly) that a HD5970 is only about 3 times faster at this test than a Core i7. It is probably more economical to use a bunch of CPUs than to use GPUs for this kind of workload, seeing how a Core i7 is much cheaper than a dual-GPU HD5970. We already know from other tests that a GPU is many, many, many times faster than the CPU at OpenGL 3D rendering, so maybe the parts needed for GPGPU are kept to a modest level on Evergreen in order to support top-of-the-line 3D graphics. I'm not complaining, since I don't use GPGPU for anything other than PTS.
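For anyone who wants to check my arithmetic, here's a quick sketch in Python using only the figures I quoted above. The variable names are just mine, the CPU and on-screen numbers are the approximate ones from this post, and nothing here is measured by the script itself.

```python
# Figures quoted in this post, all in KSamples/sec.
gpu_pts = 45200       # HD5970 result as reported by PTS
cpu_pts = 16000       # Core i7 920 result as reported by PTS (approximate)
gpu_onscreen = 52000  # figure shown in the render window (approximate)

# How many times faster the GPU run is than the CPU run.
speedup = gpu_pts / cpu_pts

# How far the PTS figure falls below the on-screen figure, as a fraction.
gap = (gpu_onscreen - gpu_pts) / gpu_onscreen

print(f"GPU vs CPU speedup: {speedup:.1f}x")
print(f"PTS figure is {gap:.1%} below the on-screen figure")
```

So the "3 times faster" is really a bit under 3x, and the PTS number sits roughly 13% below what the window shows. That gap would be consistent with PTS reporting an average over the whole run while the window shows a later, warmed-up rate, but that's only a guess on my part.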