I've been pushing to GitHub since I started work in late October:
http://github.com/awatry/libvpx.opencl
The bound copy of my thesis is due in 3 weeks; the final draft is due 3/31 or 4/1, I don't remember which.
I've only had Nvidia hardware to test on, since my Radeon 4770 doesn't support the byte_addressable_store extension (5000-series and up only), but it runs fine on my GF9400m and on a GTX 480 under current Ubuntu. It also works on AMD Stream's CPU-based OpenCL. I've gotten it working on Mac OS using CPU-based OpenCL, but there's a bug in the Mac GPU-based path that kills it every time, and I haven't had time to track it down yet.
Like I said, I'm hoping to keep working on this after graduation, either as a hobby or professionally if someone's willing to pay. I have the OpenCL initialization framework in place, all of the memory management taken care of, and most of the major parts of the decoding available as CL kernels.
The next step is increasing the parallelism: I'm currently capping out at 336 threads max, and the common case is only a few dozen threads, which is not enough to even approach performance parity with the CPU-only paths. I've figured out a few ways to do that, especially in the loop filter (which accounts for 50% or so of the CPU-only execution time on a few of the 1080p videos I've profiled). The sub-pixel prediction/motion compensation and dequantization/IDCT will take a bit more work to thread effectively, but I think it can be done.



