Phoronix: Open LLVM-Based Portable OpenCL Announced
There's a new open-source OpenCL project called "Portable OpenCL" that takes advantage of LLVM and this morning marks its first public announcement...
http://www.phoronix.com/vr.php?view=MTAwMjM
This looks a lot more promising than Clover, which marks the sad state of open-source OpenCL.
Hello,
I'm the developer of Clover and this new project seems interesting. I had a quick look at the source code and it is very different from mine, but very interesting.
I especially appreciate the way the author of this new project implemented two LLVM passes to handle the barrier() calls. The problem is that he "unrolls" the work-items and executes them one after the other, without (it seems) any threading. His code may be faster than mine, but less scalable. Mine is more concise (20 lines to handle barrier()), but less elegant in how it handles the stack.
I think our two projects are at nearly the same stage: we handle the full OpenCL API, but we lack the built-ins. I will hopefully have more time to work on Clover in the coming days, and adding a built-in is as simple as adding lines like this in src/runtime/builtins.def:
Code:
    func $type fmin $gentype : x:$type y:$type
        return (x < y ? x : y);
    end

    # Native functions are implemented in C++ and are passed to the OpenCL kernels through src/core/cpu/builtins.cpp.
    native float cos float : x:float
        return std::cos(x);
    end

    native $type cos $vecf : x:$type
        for (unsigned int i=0; i<$vecdim; ++i)
            result[i] = std::cos(x[i]);
    end
This implements the fmin() and cos() built-ins for any scalar or vector float type. cos() uses the C++ standard library (std::cos) to compute the cosine. This special code goes through a Python script that duplicates it for every $gentype and puts the declarations and code everywhere they are needed.
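The duplication step described above can be illustrated with a minimal Python sketch. This is not Clover's actual generator script, which is not shown in the thread. The type list `GENTYPES`, the helper `expand`, and the C-style output layout are all assumptions made for illustration; only the `$type` placeholder and the fmin() body come from the post.

```python
from string import Template

# Hypothetical list of float types that $gentype stands for (scalar plus
# some OpenCL vector widths); the real script may use a different set.
GENTYPES = ["float", "float2", "float4", "float8", "float16"]

# The fmin() body from the post, written as a template over $type.
# The C-style signature layout is an assumption of this sketch.
FMIN_TEMPLATE = Template(
    "$type fmin($type x, $type y)\n"
    "{\n"
    "    return (x < y ? x : y);\n"
    "}\n"
)

def expand(template, types):
    """Return one concrete definition per type name."""
    return [template.substitute(type=t) for t in types]

definitions = expand(FMIN_TEMPLATE, GENTYPES)
print(definitions[2])  # the float4 variant
```

The point is only that one generic definition yields one concrete copy per type; a real generator would also have to emit declarations and register the native functions.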
Are either of these at the stage where they can be used for writing multi-threaded code for a multi-core CPU? I.e., can I do what I can currently do with OpenMP?
If so, would it be wise to start moving OpenMP code to OpenCL? Will it run at the same speed now, and much faster one day in the future when it can be compiled for a GPU?
Hi all,
I am one of the pocl developers, so maybe I can clarify a little.
steckdenis: yep we fully unroll the work-items now. The passes basically create a big function which is a work-group, containing all the work-items. The goal is to express all the static parallelism inside a work-group to the code generator (this is good for multi-issue architectures but even scalar schedulers can benefit from it). You can still create one thread per work-group if you want, for example to target multi-core.
The main drawback of full unrolling is the code size explosion for large work-group sizes. Changing to "loop" instead of "unroll" is on the TODO list (one previous "incarnation" of the passes used to do that, but we changed it in this version for code clarity).
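To make the "loop" form concrete, here is a small Python simulation. It is a hand-written illustration, not pocl's LLVM passes: the kernel is assumed to be already split at its single barrier() into two regions, and the function names (`region_before_barrier`, `region_after_barrier`, `run_work_group`) are hypothetical. The work-group function runs each region for every work-item before moving on to the next region, which is what keeps the barrier semantics without any threads.

```python
def region_before_barrier(lid, data, local_mem):
    # Each work-item stores its own element into work-group local memory.
    local_mem[lid] = data[lid]

def region_after_barrier(lid, data, local_mem):
    # After the barrier, each work-item reads an element that a
    # different work-item wrote (mirror position in the group).
    data[lid] = local_mem[len(local_mem) - 1 - lid]

def run_work_group(data):
    """Execute one work-group: all work-items finish a region before
    any work-item starts the next one (the barrier sits between regions)."""
    n = len(data)
    local_mem = [None] * n
    for region in (region_before_barrier, region_after_barrier):
        for lid in range(n):  # "loop" form: one loop over work-items
            region(lid, data, local_mem)
    return data

result = run_work_group([10, 20, 30, 40])
print(result)  # [40, 30, 20, 10]
```

The fully unrolled form would replace the inner loop with one copy of each region body per work-item, which is where the code size growth for large work-groups comes from.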
ssam: the "native" target, which you would use to run your kernels on the CPU, is not multi-threaded. However, a "native threaded" target is on the short-term TODO list and might be available soon.
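The "one thread per work-group" idea mentioned above could look roughly like the following Python sketch. It says nothing about how the planned "native threaded" target will be implemented; `work_group` and `enqueue` are hypothetical names, the kernel (doubling each element) is made up, and the global size is assumed to be a multiple of the work-group size. Python threads are used here only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def work_group(group_id, group_size, src, dst):
    # Runs every work-item of one work-group sequentially in this thread.
    start = group_id * group_size
    for lid in range(group_size):
        gid = start + lid          # global id of the work-item
        dst[gid] = src[gid] * 2    # made-up kernel body

def enqueue(src, group_size):
    """Launch one thread per work-group and wait for all of them.
    Assumes len(src) is a non-zero multiple of group_size."""
    dst = [0] * len(src)
    num_groups = len(src) // group_size
    with ThreadPoolExecutor(max_workers=num_groups) as pool:
        futures = [pool.submit(work_group, g, group_size, src, dst)
                   for g in range(num_groups)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst

doubled = enqueue(list(range(8)), group_size=4)
print(doubled)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

Because work-groups do not synchronize with each other inside a kernel, each one can run on its own thread while the work-items inside it stay sequential.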
I just created a mailing list, pocl-devel@lists.sourceforge.net, so you might want to subscribe to be informed of development progress.
Carlos