Intel Releases New Linux Media Driver For VA-API


  • Intel Releases New Linux Media Driver For VA-API

    Phoronix: Intel Releases New Linux Media Driver For VA-API

    While Intel has been supporting VA-API for years as its primary video acceleration API, basically since X-Video/XvMC became irrelevant, they are now rolling out a new media driver...


  • #2
    This driver might be faster and use OpenCL.
    Interesting.



    • #3
      Hmm, I see mentions of cmrt in the README, yet the current cmrt [1] has a binary-only blob that is required for anything using cmrt and intel-hybrid-driver. I wonder whether the shipped cmrt needs the jitter library as well, or whether it can function without it.

      [1] https://github.com/01org/cmrt/tree/master/jitter
      Last edited by Krejzi; 01 December 2017, 09:21 AM.



      • #4
        Unless there's something I'm missing, I'm surprised OpenCL hasn't been involved much in video decoding. Seems like it'd make development much simpler, since it could eliminate the need for ASICs (even if integrated) and reduce hardware-specific drivers. Note: I am aware that OCL video encoders have been a thing for a while.



        • #5
          Originally posted by schmidtbag View Post
          Unless there's something I'm missing, I'm surprised OpenCL hasn't been involved much in video decoding. Seems like it'd make development much simpler, since it could eliminate the need for ASICs (even if integrated) and reduce hardware-specific drivers. Note: I am aware that OCL video encoders have been a thing for a while.
          Depending on what part of the decoding pipeline you're looking at, there's parts of video decoding that really don't parallelize well. The stream decompression, macroblock filtering, and iDCT steps are very, very branchy and have a lot of dependencies on the previous step that executed. When you get to the end-of-frame deblocking filtering, that's the part that parallelizes decently (I think I had it to the point that a 1080p VP8 video could have its loop filter split across ~192 threads). At least for VP8, that eliminated about 40-50% of the per-frame execution time, but the copies back and forth between the CPU and GPU never let the performance get back to the level of just running the decoding on the CPU alone (and nowhere near the speed of a dedicated ASIC).

          Maybe if someone could find a way to tease out the dependencies in decoding earlier in the frame and keep all of the decompression/filtering/idct work on the GPU it would work, but you basically have to offload it all, or communication latency/bandwidth doesn't make it worth it.
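
          Roughly, that offload has the following shape in OpenCL host code. This is just a sketch: the kernel, the sizes, and the one-work-item-per-macroblock-row split are stand-ins rather than the real VP8 loop filter. The point is the two buffer copies bracketing the kernel launch, which are exactly the CPU/GPU round trip described above.

          /* Sketch: offload only the deblocking/loop filter of one 1080p luma
           * plane to the GPU via OpenCL. Error handling omitted. */
          #include <stdio.h>
          #include <stdlib.h>
          #include <CL/cl.h>

          static const char *kernel_src =
              "/* Stand-in per-row filter: smooths pixels across each horizontal  */\n"
              "/* macroblock edge. The real VP8 loop filter is far more involved. */\n"
              "__kernel void loop_filter_rows(__global uchar *luma, int width, int height) {\n"
              "    int row = get_global_id(0);   /* one macroblock row per work-item */\n"
              "    int y = row * 16;\n"
              "    if (y == 0 || y >= height) return;\n"
              "    for (int x = 0; x < width; x++)\n"
              "        luma[y * width + x] =\n"
              "            (uchar)((luma[(y - 1) * width + x] + luma[y * width + x] + 1) / 2);\n"
              "}\n";

          int main(void)
          {
              const int width = 1920, height = 1080;
              size_t plane_size = (size_t)width * height;
              unsigned char *luma = calloc(plane_size, 1); /* pretend the CPU decoder filled this */

              cl_platform_id plat; cl_device_id dev; cl_int err;
              clGetPlatformIDs(1, &plat, NULL);
              clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
              cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
              cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

              cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, &err);
              clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
              cl_kernel k = clCreateKernel(prog, "loop_filter_rows", &err);
              cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, plane_size, NULL, &err);

              /* Copy #1: the reconstructed frame goes CPU -> GPU. */
              clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, plane_size, luma, 0, NULL, NULL);

              clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
              clSetKernelArg(k, 1, sizeof(int), &width);
              clSetKernelArg(k, 2, sizeof(int), &height);

              size_t global = height / 16;  /* one work-item per macroblock row */
              clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);

              /* Copy #2: the filtered frame comes back GPU -> CPU so the decoder can
               * use it as a reference for the next frame. This round trip is the killer. */
              clEnqueueReadBuffer(q, buf, CL_TRUE, 0, plane_size, luma, 0, NULL, NULL);

              printf("filtered one %dx%d frame on the GPU\n", width, height);
              free(luma);
              return 0;
          }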



          • #6
            Originally posted by Veerappan View Post
            Depending on what part of the decoding pipeline you're looking at, there's parts of video decoding that really don't parallelize well. The stream decompression, macroblock filtering, and iDCT steps are very, very branchy and have a lot of dependencies on the previous step that executed. When you get to the end-of-frame deblocking filtering, that's the part that parallelizes decently (I think I had it to the point that a 1080p VP8 video could have its loop filter split across ~192 threads).
            I understand that, but OpenCL requires CPU involvement no matter what. So I don't see why the CPU couldn't do things like decompression and macroblock filtering while the GPU does EoF deblocking filtering, rendering, and so on. This ought to be efficient enough that even low-end processors won't struggle.
            At least for VP8, that eliminated about 40-50% of the per-frame execution time, but the copies back and forth between the CPU and GPU never let the performance get back to the level of just running the decoding on the CPU alone (and nowhere near the speed of a dedicated ASIC).
            Well, for one thing, I'm not suggesting all codecs run with OpenCL; some were clearly designed with CPU decoding in mind and would therefore run more efficiently there. That being said, unless OpenCL negatively impacts the framerate, why does it matter if there is more back-and-forth communication? Unlike games, pre-encoded videos have a fixed framerate and don't require much user input (so latency is mostly irrelevant). So even if the maximum framerate decreases while latency increases, the only thing that matters is whether low-end hardware that struggled to play back the video can now do it smoothly.
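
            To put rough numbers on that (the per-stage times below are made up, purely to illustrate the throughput-vs-latency trade-off): once the CPU and GPU stages overlap, playback only has to care about the slower stage fitting the frame budget, and the extra pipeline latency just shows up as one more frame of buffering.

            #include <stdio.h>

            int main(void)
            {
                /* Hypothetical per-frame stage times in ms; not measurements. */
                double cpu_ms = 12.0;   /* bitstream decode, macroblocks, iDCT on the CPU */
                double gpu_ms = 10.0;   /* deblocking filter plus transfers on the GPU    */
                double budget_ms = 1000.0 / 60.0;   /* fixed 60 fps playback */

                /* Serial: both stages run back to back for every frame. */
                double serial = cpu_ms + gpu_ms;

                /* Overlapped pipeline: the CPU decodes frame N+1 while the GPU
                 * filters frame N, so throughput is set by the slower stage. */
                double piped = cpu_ms > gpu_ms ? cpu_ms : gpu_ms;

                printf("serial:    %.1f ms/frame -> %s the %.1f ms budget\n",
                       serial, serial <= budget_ms ? "meets" : "misses", budget_ms);
                printf("pipelined: %.1f ms/frame -> %s it, at +1 frame of latency\n",
                       piped, piped <= budget_ms ? "meets" : "misses");
                return 0;
            }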



            • #7
              Little typo:

              Originally posted by phoronix View Post
              Intel Graphics Memory Management LIbrary.



              • #8
                As they have created this new driver, does it mean they are abandoning the current one? Do you also think they will include support for older GPUs like Haswell later?



                • #9
                  Originally posted by Veerappan View Post

                  Depending on what part of the decoding pipeline you're looking at, there's parts of video decoding that really don't parallelize well. The stream decompression, macroblock filtering, and iDCT steps are very, very branchy and have a lot of dependencies on the previous step that executed. When you get to the end-of-frame deblocking filtering, that's the part that parallelizes decently (I think I had it to the point that a 1080p VP8 video could have its loop filter split across ~192 threads). At least for VP8, that eliminated about 40-50% of the per-frame execution time, but the copies back and forth between the CPU and GPU never let the performance get back to the level of just running the decoding on the CPU alone (and nowhere near the speed of a dedicated ASIC).

                  Maybe if someone could find a way to tease out the dependencies in decoding earlier in the frame and keep all of the decompression/filtering/idct work on the GPU it would work, but you basically have to offload it all, or communication latency/bandwidth doesn't make it worth it.
                  Well, if we had a scenario where steps 1, 2, and 3 run on the CPU and steps 4, 5, and 6 run on the GPU, and the final output frame only needs to be on the GPU (since it's going to be displayed), wouldn't that avoid copying back and forth between the CPU and the GPU? What about compositing, playing the video in a window, or in an HTML5 video element in a browser? Would that necessarily require moving the decoded video frame to CPU memory, or could it stay in GPU memory and have the GPU do all of the compositing work?
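
                  For what it's worth, newer libva has a zero-copy path for exactly this: the decoded surface can be exported as a DMA-BUF and imported by the compositor or an EGL/GL client, so the frame never has to leave GPU memory. A rough sketch, assuming a libva recent enough to provide vaExportSurfaceHandle, with all error handling omitted:

                  #include <va/va.h>
                  #include <va/va_drmcommon.h>

                  /* dpy and surface are assumed to come from an existing VA-API decode session. */
                  int export_decoded_surface(VADisplay dpy, VASurfaceID surface)
                  {
                      VADRMPRIMESurfaceDescriptor desc;

                      vaSyncSurface(dpy, surface);   /* wait for the decode to finish */
                      vaExportSurfaceHandle(dpy, surface,
                                            VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_2,
                                            VA_EXPORT_SURFACE_READ_ONLY |
                                            VA_EXPORT_SURFACE_COMPOSED_LAYERS,
                                            &desc);

                      /* desc.objects[0].fd is a DMA-BUF handle; a compositor or browser can
                       * import it (e.g. via EGL_EXT_image_dma_buf_import) and sample it on the
                       * GPU, so the decoded frame is never copied back to CPU memory. */
                      return desc.objects[0].fd;
                  }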



                  • #10
                    We'll probably finally see some Windows features in Linux.
