
Thread: radeon video acceleration

  1. #1
    Join Date
    May 2009
    Posts
    49

    Default radeon video acceleration

    I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to need a high-end CPU for this kind of work when there is a perfectly good GPU sitting there doing nothing. I have no major problem decoding videos up to 1280x720, but when trying to decode 1920x1080 the CPU gets pegged and the playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.

  2. #2
    Join Date
    Aug 2008
    Location
    Finland
    Posts
    1,578

    Default

    Quote Originally Posted by lbcoder View Post
    I would like to know the status of the radeon driver for getting any kind of video processing offloaded from the CPU to the GPU. It seems kind of silly to need a high-end CPU for this kind of work when there is a perfectly good GPU sitting there doing nothing. I have no major problem decoding videos up to 1280x720, but when trying to decode 1920x1080 the CPU gets pegged and the playback becomes quite choppy. This is with an X2-3800 and a Radeon 3650.
    No decoding acceleration or motion compensation for the moment. Probably won't be until Gallium3D.

  3. #3
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,425

    Default

    First, I guess I should make sure you are already making use of the existing video processing, ie going through the Xv interface to offload scaling, colour space conversion and filtering to the GPU. If you're not running with accelerated Xv today that should definitely be the first step.
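
    If you want to double-check from code rather than just running xvinfo, here is a minimal sketch (file name and build line are assumptions) that reports whether the server exposes an Xv adaptor at all:

    Code:
    /* Check that the X server exposes an Xv adaptor for accelerated
     * scaling / colour space conversion.
     * Build (assumption): gcc xvcheck.c -o xvcheck -lXv -lX11
     */
    #include <stdio.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/Xvlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }

        unsigned int version, revision, req_base, event_base, error_base;
        if (XvQueryExtension(dpy, &version, &revision, &req_base,
                             &event_base, &error_base) != Success) {
            fprintf(stderr, "Xv extension not available\n");
            return 1;
        }
        printf("Xv %u.%u present\n", version, revision);

        unsigned int num_adaptors;
        XvAdaptorInfo *adaptors;
        if (XvQueryAdaptors(dpy, DefaultRootWindow(dpy),
                            &num_adaptors, &adaptors) == Success) {
            for (unsigned int i = 0; i < num_adaptors; i++)
                printf("adaptor %u: %s (%lu ports)\n",
                       i, adaptors[i].name, adaptors[i].num_ports);
            XvFreeAdaptorInfo(adaptors);
        }

        XCloseDisplay(dpy);
        return 0;
    }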

    Re: offloading the remaining video processing ("decode acceleration"), there are two things that have to happen first:

    1. Either developers need to be willing to write all of the acceleration code in a hardware-dependent way (as was done for EXA and Xv) or a suitable framework needs to be implemented.

    2. A decision needs to be made about how to hook the acceleration code into the playback stack. This is the more significant obstacle IMO. There are a number of decode APIs which offer multiple entry points including ones which map well onto generic GPU capabilities (eg starting with MC) but I don't believe anyone has looked at modifying an existing decode stack to hook into one of those lower level entry points for HD decode.

    It might seem that using a pre-existing slice-level API is the obvious approach, but that would mean implementing a lot of complex decode functionality in software inside the driver. The first implementations are likely to focus on what can readily be done with shaders, and that implies the line between CPU and GPU work will be drawn lower than slice level.

    Given that, the approach that seems to make the most sense is to take an existing open source decode library and add hooks either to use an MC-level decode API or to add the shader code directly to the library using an API like Gallium3D. I haven't looked at the existing decode libraries to see how hard it would be to hook in an MC-level decode API (eg VA-API), but if that did turn out to be relatively clean (ie if the VA-API interface mapped cleanly onto the code in the decode library) then it might be feasible to implement something without waiting for Gallium3D.
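
    Purely as an illustration of what an MC-level hook looks like from the client side (nothing the radeon stack implements today), VA-API already lets an application ask which entry points a driver exposes per codec; a rough sketch, with the build line being an assumption:

    Code:
    /* Ask a VA-API driver which entry points it exposes for MPEG-2,
     * e.g. VAEntrypointMoComp (MC level) vs VAEntrypointVLD (slice level).
     * Build (assumption): gcc vaprobe.c -o vaprobe -lva -lva-x11 -lX11
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <X11/Xlib.h>
    #include <va/va.h>
    #include <va/va_x11.h>

    int main(void)
    {
        Display *x11 = XOpenDisplay(NULL);
        if (!x11)
            return 1;

        VADisplay dpy = vaGetDisplay(x11);
        int major, minor;
        if (vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS) {
            fprintf(stderr, "vaInitialize failed\n");
            return 1;
        }

        /* Query the entry points offered for MPEG-2 Main profile. */
        VAEntrypoint *eps = malloc(vaMaxNumEntrypoints(dpy) * sizeof(*eps));
        int num = 0;
        if (vaQueryConfigEntrypoints(dpy, VAProfileMPEG2Main, eps, &num)
                == VA_STATUS_SUCCESS) {
            for (int i = 0; i < num; i++) {
                if (eps[i] == VAEntrypointMoComp)
                    printf("MC-level (motion compensation) entry point\n");
                else if (eps[i] == VAEntrypointIDCT)
                    printf("IDCT-level entry point\n");
                else if (eps[i] == VAEntrypointVLD)
                    printf("slice-level (VLD) entry point\n");
            }
        }

        free(eps);
        vaTerminate(dpy);
        XCloseDisplay(x11);
        return 0;
    }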

    The "most likely to happen" approach is implementing decode acceleration over Gallium3D, since that provides a relatively vendor-independent low level interface for using the 3D engine. Once the "classic mesa" implementation for 6xx/7xx 3D is stabilized I think you will see focus shift almost immediately to porting that code across to a Gallium3D driver. This approach (implementing Gallium3D first then building decode acceleration on top) is what most of the community developers seem to be favoring today.

    HW information to implement shader-based decode acceleration has been available for ~9 months on 6xx/7xx and ~18 months for earlier GPUs, so it's probably fair to say this is not a top priority for other users interested in becoming developers. In the meantime, if you have a multicore CPU there are multithreaded implementations of the current decode stack available and they seem to help a lot.
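
    For reference, enabling multithreaded decode from code is mostly a matter of setting a couple of fields on the codec context; a rough sketch assuming a libavcodec build that has the multithreading work merged in (file name and build line are assumptions):

    Code:
    /* Open an H.264 decoder with several worker threads.
     * Build (assumption): gcc -c mtdec.c $(pkg-config --cflags libavcodec)
     */
    #include <libavcodec/avcodec.h>

    AVCodecContext *open_h264_decoder(int threads)
    {
        const AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
        if (!codec)
            return NULL;

        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        if (!ctx)
            return NULL;

        /* Spread decode across several worker threads; allow both
         * frame-level and slice-level threading where supported. */
        ctx->thread_count = threads;
        ctx->thread_type  = FF_THREAD_FRAME | FF_THREAD_SLICE;

        if (avcodec_open2(ctx, codec, NULL) < 0) {
            avcodec_free_context(&ctx);
            return NULL;
        }
        return ctx;
    }
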
    Last edited by bridgman; 08-29-2009 at 11:57 AM.

  4. #4
    Join Date
    Aug 2008
    Location
    Finland
    Posts
    1,578

    Default

    Quote Originally Posted by bridgman View Post
    I haven't looked at the existing decode libraries to see how hard it would be to hook in an MC-level decode API (eg VA-API)
    Well, I meant XvMC actually with the MC I mentioned earlier. I got the impression there already is a state tracker for it in Gallium3D and we just need a working r600g driver to tap into it. ^^ Actual decoding acceleration over VA-API or whatever else will require quite a lot more work though...

  5. #5
    Join Date
    May 2009
    Posts
    49

    Default

    I'm definitely using Xv. Not a newb here.
    I assume that by "multithreaded implementations of the current decode stack" you are referring to ffmpeg-mt. I have had a look at that, and it did help, but at this point I've had to resort to dropping the $15 for a CoreAVC license. With that it's still struggling, but at least the video is watchable.

    I have to admit that most of your post went way over my head. I am a computer engineer myself, but I have no experience at all in graphics driver or video processing development. From what I can gather, though, it seems there is a while to wait yet.

    Thank you for your response.




  6. #6
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,425

    Default

    Quote Originally Posted by myxal View Post
    Are you saying we might see UVD, i.e. bitstream acceleration, in open source drivers?
    I'm saying "we don't know yet, so assume the answer is no unless/until you hear otherwise". In the meantime, decode acceleration with shaders is moving ahead. Even if we opened up UVD tomorrow we would need shader-based decode acceleration anyways, since only the more recent GPUs (everything after the original HD2900) include the UVD block.

    Quote Originally Posted by myxal View Post
    I recall there being some limitations on XvMC. Going straight to what I care about and need the stack to provide (note: according to reports on the web, VDPAU with nvidia does this): post-processing of the decoded video frames, which is needed to support mplayer's current implementation of subtitles, OSD, etc. Does XvMC even allow this?
    I'm pretty sure that all of those features existed before VDPAU came along, and that code exists to implement them using existing APIs such as OpenGL.

    XvMC has all kinds of limitations including being designed around MPEG-2 standards -- the reason for doing XvMC first is simply because a lot of the code is already there. This allows the developers to concentrate on getting a Gallium3D driver working to complete the stack. Once XvMC-over-Gallium3D is running the GPU-specific work will be largely done, and support for other APIs and video standards will be much easier to add.
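
    For the curious, that MPEG-2 focus is visible if you ask a port which XvMC surface types it advertises; a rough sketch (the port ID below is a placeholder -- a real program would take it from XvQueryAdaptors or from what xvinfo reports, and the build line is an assumption):

    Code:
    /* List the XvMC surface types a port advertises; with most drivers
     * today these are MPEG-2 only.
     * Build (assumption): gcc xvmclist.c -o xvmclist -lXvMC -lXv -lX11
     */
    #include <stdio.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/Xvlib.h>
    #include <X11/extensions/XvMClib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (!dpy)
            return 1;

        int ev, err;
        if (!XvMCQueryExtension(dpy, &ev, &err)) {
            fprintf(stderr, "XvMC extension not available\n");
            return 1;
        }

        XvPortID port = 70;   /* placeholder -- obtain via XvQueryAdaptors */
        int num = 0;
        XvMCSurfaceInfo *info = XvMCListSurfaceTypes(dpy, port, &num);
        for (int i = 0; i < num; i++)
            printf("surface type 0x%x: up to %ux%u, mc_type 0x%x\n",
                   info[i].surface_type_id,
                   (unsigned)info[i].max_width,
                   (unsigned)info[i].max_height,
                   info[i].mc_type);
        if (info)
            XFree(info);

        XCloseDisplay(dpy);
        return 0;
    }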

    Quote Originally Posted by myxal View Post
    The fad now is mobility - how does the power draw compare when using UVD and when using shaders? Well, the library is a wrapper for various implementations, and we already know nvidia's implementation (mostly) works. We're just THAT eager to see other implementations working with hardware unaffected by Bumpgate.
    The quick answer is "we'll know for sure when the code is written", but I expect shader-based decode will use more power and CPU than UVD-based decode. The important question is whether it will use enough extra power to really matter for most users, and I suspect the answer is "no".

    Quote Originally Posted by m4rgin4l View Post
    You make a good point here. You shouldn't spend more than 50 bucks if all you want is to watch HD content. I think the problem is with people who spent 150 or more and want to get the most out of their hardware.
    I have been recommending something a bit more powerful than the very low end products to make sure the GPU has enough shader power for decode acceleration, ie going for something like an HD2600/HD3650 just to be safe -- at least until the shader-based decode stack is running well for most users.

    The rv710 has 2X the shader power of the rv610/620, so that advice may no longer be relevant.
    Last edited by bridgman; 09-18-2009 at 03:59 PM.

  7. #7
    Join Date
    May 2009
    Posts
    49

    Default

    Quote Originally Posted by bridgman View Post
    I have been recommending something a bit more powerful than the very low end products to make sure the GPU has enough shader power for decode acceleration, ie going for something like an HD2600/HD3650 just to be safe -- at least until the shader-based decode stack is running well for most users.
    El-super-cheapo HD3650 anyone? http://www.newegg.com/Product/Produc...lor-_-14131084

  8. #8
    Join Date
    Aug 2008
    Location
    Finland
    Posts
    1,578

    Default

    Quote Originally Posted by bridgman View Post
    The rv710 has 2X the shader power of the rv610/620 so that advice may no longer be relevent.
    Out of interest: how does rv670 compare with this?

  9. #9
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,425

    Default

    rv610/620 - 40 ALUs (2 SIMDs x 4 pixels/vertices per SIMD x 5)
    rv710 - 80 ALUs (2 SIMDs x 8 pixels/vertices per SIMD x 5)
    rv630/635 - 120 ALUs (3 SIMDs x 8 pixels/vertices per SIMD x 5)
    rv670 - 320 ALUs (4 SIMDs x 16 pixels/vertices per SIMD x 5)
    rv730 - 320 ALUs (8 SIMDs x 8 pixels/vertices per SIMD x 5)
    rv740 - 640 ALUs (8 SIMDs x 16 pixels/vertices per SIMD x 5)
    rv770 - 800 ALUs (10 SIMDs x 16 pixels/vertices per SIMD x 5)

    No problem
    Last edited by bridgman; 09-18-2009 at 03:55 PM.

  10. #10
    Join Date
    Oct 2007
    Posts
    1,267

    Default

    Quote Originally Posted by lbcoder View Post
    El-super-cheapo HD3650 anyone? http://www.newegg.com/Product/Produc...lor-_-14131084
    If I were going to spend ~$50 and get an ATI card with a tiny, whiny, 40mm fan, I would move up to the RV730 class: RadeonHD 4650 - http://www.newegg.com/Product/Produc...82E16814102829

    Personally, I opted for an RV710/4550 with passive cooling because I'm a neurotic lover of silent computing (and not much of a gamer).
