Originally posted by bridgman:
Running a kernel under OpenCL is pretty similar to running a pixel shader program under OpenGL -- the app says "for every pixel run this program", then throws triangles or rectangles at the GPU. The GPU then runs the appropriate shader program on every pixel, and on modern GPUs that involves running hundreds of threads in parallel (an RV770 can execute 160 instructions in parallel, each doing up to 5 floating point MADs, or 10 FLOPs per instruction).

The per-pixel output from the shader program usually goes to the screen, but it could go into a buffer which gets used elsewhere or read back into system memory. The Mesa driver runs on the CPU but the shader programs run on the GPU.

Same with OpenCL; driver runs on the CPU but a bunch of copies of the kernel run in parallel on the GPU. The key point is that the GPU is only working on one task at a time, but within that task it can work on hundreds of data items in parallel. That's why GPUs are described as data-parallel rather than task-parallel.
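To make the "one task, many data items" idea concrete, here is a rough plain-Python emulation. Nothing in it touches a GPU or a real OpenCL API; `scale_kernel` and `run_kernel` are made-up names for illustration, and the loop only stands in for what the hardware would do in parallel.

```python
# Plain-Python emulation of the data-parallel model (no GPU involved).
# scale_kernel and run_kernel are hypothetical names, not OpenCL calls.

def scale_kernel(global_id, src, dst, factor):
    """The body every work-item runs; global_id selects its data item."""
    dst[global_id] = src[global_id] * factor


def run_kernel(kernel, global_size, *args):
    """Stand-in for the driver plus GPU: one task, many work-items.

    A real GPU would execute hundreds of these copies at once;
    this sketch simply runs them one after another.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


src = [1.0, 2.0, 3.0, 4.0]
dst = [0.0] * len(src)
run_kernel(scale_kernel, len(src), src, dst, 2.0)
print(dst)  # [2.0, 4.0, 6.0, 8.0]
```

The app only says "run this kernel over N items"; which items run together, and when, is left to the driver and the hardware.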

The data-parallel vs task-parallel distinction is also why the question of "how many cores does a GPU have?" is so tricky to answer. Depending on your criteria, an RV770 can be described as single-core, 10-core, 160-core or 800-core. The 10-core answer is probably the most technically correct, while the 160-core answer probably gives the most accurate idea of throughput relative to a conventional CPU.
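The different "core counts" fall out of simple multiplication. A small sketch of the arithmetic, using only the figures quoted in this post; the value of 16 per SIMD is inferred here as 160 / 10, and the constant names are made up for illustration.

```python
# Arithmetic behind the RV770 "core count" answers quoted above.
SIMD_ENGINES = 10    # the "10-core" answer
LANES_PER_SIMD = 16  # assumption: inferred as 160 / 10
ALUS_PER_LANE = 5    # up to 5 floating point MADs per instruction
FLOPS_PER_MAD = 2    # one multiply plus one add

instruction_slots = SIMD_ENGINES * LANES_PER_SIMD    # the "160-core" answer
scalar_alus = instruction_slots * ALUS_PER_LANE      # the "800-core" answer
peak_flops_per_clock = scalar_alus * FLOPS_PER_MAD   # 10 FLOPs x 160 instructions

print(instruction_slots, scalar_alus, peak_flops_per_clock)  # 160 800 1600
```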

Anyway, since a GPU fundamentally works on one task at a time and the driver time-slices between different tasks, it should be possible to hook into the driver and track what percentage of the time is being used by each task. That hasn't been useful in the past (since all the GPU workload typically comes from whatever app you are running at the moment), but as we start juggling multiple tasks on the GPU it will probably become more important (and more interesting to watch).
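As a rough idea of what that accounting could look like, assuming a driver hook that logs one (task, start, end) record per time slice: no such hook or log format is described here, so `usage_by_task` and the sample log are purely hypothetical.

```python
from collections import defaultdict


def usage_by_task(slices):
    """Return each task's share of GPU busy time, in percent.

    slices: iterable of (task_name, start_seconds, end_seconds) records,
    in the shape a hypothetical driver hook might log them.
    """
    busy = defaultdict(float)
    for task, start, end in slices:
        busy[task] += end - start
    total = sum(busy.values())
    if total == 0:
        return {}
    return {task: 100.0 * t / total for task, t in busy.items()}


# Made-up sample log: the GPU alternates between two tasks.
log = [
    ("OpenGL app", 0.0, 6.0),
    ("OpenCL kernel", 6.0, 8.0),
    ("OpenGL app", 8.0, 10.0),
]
print(usage_by_task(log))  # {'OpenGL app': 80.0, 'OpenCL kernel': 20.0}
```

Because only one task owns the GPU during any slice, summing slice lengths per task is enough; no overlap handling is needed in this simplified picture.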
Okay, so a pixel shader is more or less an infinite while-loop?

So once OpenCL comes into play, does that mean the OpenGL and OpenCL drivers schedule whose turn it is to get data processed, since the GPU can only handle one task at a time?

Let's say I write an OpenCL program that simulates a flow. Is that program the kernel for the GPU? Or is the kernel something Mesa would write to intercept my flow simulation program?

How many kernels can the GPU have running at once?