Quote, originally posted by alexThunder:
Well, actually they are, unless you have a problem that cannot be parallelized.
They actually have most of that, i.e. pipelining.
Most algorithms are very hard to parallelize, and the ones that are parallel-friendly still need optimizations that depend on the GPU and the dataset you use. That is a very hard task in itself; if you want a widely known reference, google CABAC on GPU.

They have them, but in a very rudimentary form tuned for GPU tasks, so they differ quite a lot from their CPU counterparts. If you don't believe me, try a matrix multiply (1000x1000, for example), once with branching and once without, in whichever CL language you like, and compare how long each takes to complete. The non-branched version wins by a factor of X, and then you'll see what I mean.
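
For anyone who wants to try that experiment, here is a rough sketch of what the two variants could look like. It is plain Python running on the CPU, so it only shows the shape of the two inner loops and that they give the same answer; it says nothing about timing. The function names and the choice of branch (skipping zero entries of b) are my own assumptions for illustration, not anything from a CL library. To measure the actual difference you would port each loop body into a kernel.

```python
# Hypothetical sketch: matmul_branchy / matmul_branchless are illustrative
# names only. Each (i, j) pair stands in for one GPU work-item.

def matmul_branchy(a, b, n):
    """n x n multiply that skips zero entries of b with a data-dependent branch."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # On a GPU, neighbouring work-items (different j) may disagree
                # on this test, so the group may have to execute both paths.
                if b[k][j] != 0.0:
                    acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def matmul_branchless(a, b, n):
    """Same product, but every iteration does identical work."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # Multiplying by a zero entry just adds 0.0 (for finite inputs).
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


# Small check that both variants agree.
a = [[1.0, 0.0], [2.0, 3.0]]
b = [[0.0, 4.0], [5.0, 0.0]]
same = matmul_branchy(a, b, 2) == matmul_branchless(a, b, 2)
```

The branch here is deliberately tied to the column index j, since that is the kind of condition that can differ between work-items running side by side. A branch that every work-item in a group takes the same way would be a much weaker test.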

This is what I meant when I said neither is faster than the other: they are different tools, designed to efficiently attack problems of very different scale.