Fortunately, I don't have to test this again. I pulled a few pages from the slides of a lecture I attended on this and combined them for you:
http://www.uploadarea.de/files/2g1ae...gv83mxiyyj.pdf
It's in German, but that shouldn't matter much. It contains part of the actual host program and the OpenCL kernel. The last two pages have charts showing how the (very) simple kernel performs against a sequential CPU program, a PThreads version (4-core machine with HT), OpenCL on the CPU, and OpenCL on the GPU. The last page shows the performance of the GPU kernel after some optimizations (better memory usage).
(Only the simplest kernel is on these pages, not the optimized one.)
The y-axis shows the time in seconds. The program was tested with a 1680x1680 matrix on an i7 920 and a GeForce 9800 GX2.
FYI: The lecture I attended is no longer publicly available online, but the most recent one is: http://pvs.uni-muenster.de/pvs/lehre...vorlesung.html (German)


