One thing people really don't seem to realize about GPGPU processing is that even with something cross-platform like OpenCL, how well your workload runs depends heavily on the hardware, NOT the drivers. Drivers make a difference, but there's only so much they can do. A good example of this is bitcoin mining: it's an incredibly simple calculation, so AMD's generally higher clock frequency and higher stream processor count give it a noticeable advantage over NVIDIA.
I'm a bit surprised by these results, to be honest. Some years ago, with pre-Fermi cards, it was necessary to design GPGPU kernels around coalesced memory access, and it was said that developers were forced to use the float4 type everywhere in their kernels to get good performance from Evergreen/Cayman Radeon cards.
Nowadays Kepler and GCN cards use cached memory and have a scalar ISA, which should make them much more "coherent" from a performance perspective, except in some niche areas (for instance, integer operations are much faster on Radeon than on GeForce), but that's not really apparent in these benchmarks.
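To make the float4 point a bit more concrete, here is a rough sketch in plain C (not real OpenCL) of the same SAXPY-style loop written both ways. Everything here is made up for illustration: the `float4` struct is only a stand-in for OpenCL's built-in vector type, the function names are hypothetical, and the `for` loop over `gid` just emulates what would be separate work-items on a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's built-in float4 vector type. */
typedef struct { float x, y, z, w; } float4;

/* Scalar style: each emulated work-item (gid) handles one element.
   Adjacent work-items touch adjacent floats, which is the access
   pattern that coalescing hardware favors. */
static void saxpy_scalar(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t gid = 0; gid < n; ++gid)   /* loop emulates the launch */
        y[gid] = a * x[gid] + y[gid];
}

/* float4 style: each emulated work-item handles four packed elements,
   giving the compiler four independent operations per work-item.
   This is the shape that was recommended for the older VLIW Radeons. */
static void saxpy_float4(size_t n4, float a, const float4 *x, float4 *y)
{
    assert(x != NULL && y != NULL);
    for (size_t gid = 0; gid < n4; ++gid) {
        y[gid].x = a * x[gid].x + y[gid].x;
        y[gid].y = a * x[gid].y + y[gid].y;
        y[gid].z = a * x[gid].z + y[gid].z;
        y[gid].w = a * x[gid].w + y[gid].w;
    }
}
```

Both versions compute the same result; the only difference is how much work is packed into each work-item. On a scalar ISA I would expect the compiler to split the float4 version back into scalar operations anyway, which is why the explicit vectorization should matter much less there.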