What really sucks balls about OpenCL is that you need to specifically target all kinds of different cards to get good performance, even though your code will run on any OpenCL device. The problem is that tuning takes hardcore GPU understanding. For example, the bank sizes and the terminology are different between nVidia and ATi. Imagine programming soundcards like that >.<

Does the current IR successfully work as a GPU design abstraction with Clover, so that Clover converts OpenCL into general code that works just as well on nVidia as on ATi? That would be a massive win all over the place.
As I understand it, if you write code in OpenCL then it will work fine on ATi, nVidia, multicore CPUs, etc. But if you want the code to be super fast, then you need to pay close attention to things like memory layout, shared caches, and other hardware-dependent stuff, because the cost of limited memory bandwidth and cache misses can be significant. I think that tweaking would be very hard to automate.
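
To make the bank thing concrete, here is a rough Python sketch (a toy model, not real OpenCL and not any vendor's exact hardware behaviour). It assumes local memory is split into num_banks banks of 4-byte words and that thread t reads the word at index t * stride_words. The function name and the bank counts of 16 and 32 are just illustrative picks of mine; check your vendor's docs for the real numbers.

```python
from collections import Counter


def worst_bank_conflict(num_threads, stride_words, num_banks):
    """Toy model: thread t reads word index t * stride_words.

    Returns the largest number of threads that land in the same bank.
    1 means conflict-free; higher means the accesses get serialized.
    """
    hits = Counter((t * stride_words) % num_banks for t in range(num_threads))
    return max(hits.values())


# Walking down a column of a 32-word-wide tile on an assumed 32-bank device:
# every thread lands in bank 0.
unpadded_32 = worst_bank_conflict(num_threads=32, stride_words=32, num_banks=32)

# Same access after padding each row by one word (row stride 33).
padded_32 = worst_bank_conflict(num_threads=32, stride_words=33, num_banks=32)

# A 16-word-wide tile, tuned for an assumed 16-bank device, run on 32 banks.
narrow_tile_on_32 = worst_bank_conflict(num_threads=32, stride_words=16, num_banks=32)

print(unpadded_32, padded_32, narrow_tile_on_32)
```

In this model the unpadded column walk serializes all 32 threads, the one-word padding spreads them over all banks, and the tile width that suits one bank count still conflicts on another. That is the kind of per-device decision I think is hard for a compiler to make for you.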