I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to need a high-end CPU for this kind of work when there is a perfectly good GPU sitting there doing nothing. I have no major problem decoding videos up to 1280x720, but when I try to decode 1920x1080, the CPU gets pegged and playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.
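For a rough sense of why 1080p is so much harder than 720p, the per-frame pixel counts can be compared directly. This is only a back-of-the-envelope sketch: it assumes decode cost scales roughly with pixel count, which ignores bitrate, codec and frame rate, and the function name is made up for illustration.

```python
def pixels_per_frame(width, height):
    """Number of pixels the decoder has to produce for one frame."""
    return width * height

hd720 = pixels_per_frame(1280, 720)
hd1080 = pixels_per_frame(1920, 1080)

# Ratio of work per frame, under the (rough) assumption that
# decode cost is proportional to the number of pixels.
ratio = hd1080 / hd720

print(f"720p:  {hd720} pixels per frame")
print(f"1080p: {hd1080} pixels per frame")
print(f"1080p is {ratio}x the pixels of 720p")
```

So a CPU that is comfortable at 720p has to do more than twice the work per frame at 1080p, which fits with playback falling over at that point.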
First, I guess I should make sure you are already making use of the existing video processing, ie going through the Xv interface to offload scaling, colour space conversion and filtering to the GPU. If you're not running with accelerated Xv today that should definitely be the first step.
Re: offloading the remaining video processing ("decode acceleration"), there are two things that have to happen first:
1. Either developers need to be willing to write all of the acceleration code in a hardware-dependent way (as was done for EXA and Xv) or a suitable framework needs to be implemented.
2. A decision needs to be made about how to hook the acceleration code into the playback stack. This is the more significant obstacle IMO. There are a number of decode APIs which offer multiple entry points including ones which map well onto generic GPU capabilities (eg starting with MC) but I don't believe anyone has looked at modifying an existing decode stack to hook into one of those lower level entry points for HD decode.
It might seem that using a pre-existing slice-level API is the obvious approach, but that would mean implementing a lot of complex decode functionality in software inside the driver. The first implementations are likely to focus on what can readily be done with shaders, and that implies the line between CPU and GPU needs to be lower than slice-level.
Given that, the approach that seems to make the most sense is to take an existing open source decode library and either add hooks that call an MC-level decode API or add the shader code directly to the library using an API like Gallium3D. I haven't looked at the existing decode libraries to see how hard it would be to hook in an MC-level decode API (eg VA-API), but if that did turn out to be relatively clean (ie if the VA-API interface mapped cleanly onto the code in the decode library) then it might be feasible to implement something without waiting for Gallium3D.
The "most likely to happen" approach is implementing decode acceleration over Gallium3D, since that provides a relatively vendor-independent low level interface for using the 3D engine. Once the "classic mesa" implementation for 6xx/7xx 3D is stabilized I think you will see focus shift almost immediately to porting that code across to a Gallium3D driver. This approach (implementing Gallium3D first then building decode acceleration on top) is what most of the community developers seem to be favoring today.
HW information to implement shader-based decode acceleration has been available for ~9 months on 6xx/7xx and ~18 months for earlier GPUs, so it's probably fair to say this is not a top priority for other users interested in becoming developers. In the meantime, if you have a multicore CPU there are multithreaded implementations of the current decode stack available and they seem to help a lot.
Last edited by bridgman; 08-29-2009 at 10:57 AM.
Well, I meant XvMC actually with the MC I mentioned earlier. I got the impression there already is a state tracker for it in Gallium3D and we just need a working r600g driver to tap into it. ^^ Actual decoding acceleration over VA-API or whatever else will require quite a lot more work though...
I'm definitely using Xv. Not a newb here.
I assume that by "multithreaded implementations of the current decode stack" you are referring to ffmpeg-mp. I have had a look at that, and it did help, but at this point I've had to resort to dropping the $15 for a CoreAVC license. With that it's still struggling, but at least the video is watchable.
I have to admit that most of your post went way over my head. I am a computer engineer myself, but I have no experience at all in graphics driver or video processing development. From what I can gather, though, it seems there is a while to wait yet.
Thank you for your response.
I'm saying "we don't know yet, so assume the answer is no unless/until you hear otherwise". In the meantime, decode acceleration with shaders is moving ahead. Even if we opened up UVD tomorrow we would need shader-based decode acceleration anyways, since only the more recent GPUs (everything after the original HD2900) include the UVD block.
I'm pretty sure that all of those features existed before VDPAU came along, and that code exists to implement them using existing APIs such as OpenGL.
XvMC has all kinds of limitations including being designed around MPEG-2 standards -- the reason for doing XvMC first is simply because a lot of the code is already there. This allows the developers to concentrate on getting a Gallium3D driver working to complete the stack. Once XvMC-over-Gallium3D is running the GPU-specific work will be largely done, and support for other APIs and video standards will be much easier to add.
The quick answer is "we'll know for sure when the code is written", but I expect shader-based decode will use more power and CPU than UVD-based decode. The important question is whether it will use enough extra power to really matter for most users, and I suspect the answer is "no".
I have been recommending something a bit more powerful than the very low end products to make sure the GPU has enough shader power for decode acceleration, ie going for something like an HD2600/HD3650 just to be safe -- at least until the shader-based decode stack is running well for most users.
The rv710 has 2X the shader power of the rv610/620, so that advice may no longer be relevant.
Last edited by bridgman; 09-18-2009 at 02:59 PM.
El-super-cheapo HD3650 anyone? http://www.newegg.com/Product/Produc...lor-_-14131084
rv610/620 - 40 ALUs (2 SIMDs x 4 pixels/vertices per SIMD x 5)
rv710 - 80 ALUs (2 SIMDs x 8 pixels/vertices per SIMD x 5)
rv630/635 - 120 ALUs (3 SIMDs x 8 pixels/vertices per SIMD x 5)
rv670 - 320 ALUs (4 SIMDs x 16 pixels/vertices per SIMD x 5)
rv730 - 320 ALUs (8 SIMDs x 8 pixels/vertices per SIMD x 5)
rv740 - 640 ALUs (8 SIMDs x 16 pixels/vertices per SIMD x 5)
rv770 - 800 ALUs (10 SIMDs x 16 pixels/vertices per SIMD x 5)
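The arithmetic in that list can be reproduced with a few lines of Python. This is just a sketch: the SIMD and per-SIMD figures are copied from the list above, the function names are made up for illustration, and ALU count alone ignores clock speed and memory bandwidth, so treat the ratios as a rough guide rather than a benchmark.

```python
# ALUs per pixel/vertex processed by a SIMD (the "x 5" in the list above).
ALUS_PER_LANE = 5

# chip -> (number of SIMDs, pixels/vertices per SIMD), copied from the list.
CHIPS = {
    "rv610/620": (2, 4),
    "rv710": (2, 8),
    "rv630/635": (3, 8),
    "rv670": (4, 16),
    "rv730": (8, 8),
    "rv740": (8, 16),
    "rv770": (10, 16),
}

def alu_count(chip):
    """Total ALUs = SIMDs x pixels/vertices per SIMD x 5."""
    simds, lanes = CHIPS[chip]
    return simds * lanes * ALUS_PER_LANE

def relative_shader_power(chip, baseline="rv610/620"):
    """ALU count relative to a baseline chip (clock speed not considered)."""
    return alu_count(chip) / alu_count(baseline)

if __name__ == "__main__":
    for chip in CHIPS:
        print(f"{chip:10s} {alu_count(chip):4d} ALUs  "
              f"{relative_shader_power(chip):5.1f}x rv610/620")
```

On ALU count alone this gives the rv710 twice the rv610/620, and puts the rv630/635 (the HD2600/HD3650 class) at three times.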
No problem!
Last edited by bridgman; 09-18-2009 at 02:55 PM.
If I was going to spend ~$50 and get an ATI card with a tiny, whiny, 40mm fan, I would move up to the RV730 class: RadeonHD 4650 - http://www.newegg.com/Product/Produc...82E16814102829
Personally, I opted for an RV710/4550 with passive cooling because I'm a neurotic lover of silent computing (and not much of a gamer).