The RadeonHD crew yesterday released xf86-video-radeonhd 1.2.0 while the xf86-video-ati developers (mainly Alex Deucher) have been working on a few commits for their competing driver. There have been about a dozen commits to this driver tree in the past 24 hours...
Not much of a change on my r300 cards. Nothing seems broken either though.
My main issue is with the textured video. Trying to play a 352x240 .avi fullscreen (roughly 1600x1200 centered in my 1920x1200 monitor) in vlc with overlay (i.e. XVideo port -1):
vlc: 14% CPU
Xorg: 3.9% CPU
With textured video (i.e. XVideo port 1):
vlc: 40% CPU
Xorg: 40% CPU
The quality of the video with overlay seems better also; it's much softer and less pixelated. I.e. textured video seems to be using bilinear and overlay bicubic or something better. What algorithm is the textured video using? Is there more than one choice, and if so, how do I change it?
EDIT: Just a note, I retested with a previous radeon revision, and it completely swamped my CPU, so it seems things are getting slightly better. I'm guessing on r300 maybe the shaders just aren't powerful enough, or is it a driver issue?
I should mention that the only place where radeon and radeonhd could be considered to "compete" (in the sense that they both perform the same function in different ways) is in the modesetting portion.
The work being done now is learning how to make open source acceleration code play nice with the 3D engine on 3xx-5xx parts, and that code will be used with minor changes in both radeon and radeonhd. The radeonhd developers are working on getting DRI support running in radeonhd -- once that happens then radeonhd acceleration code will be able to use drm, and some of the acceleration work will start to be done first in radeonhd rather than starting in radeon and porting to radeonhd later.
Look for a Phoronix article titled "DRI support added to RadeonHD" -- that's when our evil plan will suddenly start to make sense
There is another short-term benefit to doing acceleration work in radeon, however. Acceleration changes made in radeon can be immediately tested on 3xx, 4xx and 5xx, while on radeonhd they can only be tested on 5xx until the code is backported to radeon.
Tillin9, overlay hardware will always have lower CPU utilization than textured video. That said, there is probably a fair amount of potential to reduce CPU utilization over time. The main focus right now is getting it running stably and reliably.
Don't think it's a question of shader power (it's really the texture engines doing the work, not the shaders, and the 300 has a fair amount of texture throughput), probably just the driver needing more optimization. Optimization work is starting on the EXA acceleration now but probably won't happen much on textured video for a while.
Filtering is probably bilinear right now but we're hoping to improve that.
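For anyone wondering what "bilinear" means in practice, here is a minimal sketch in plain Python. It is not driver code; the function names and the clamp-to-edge behaviour are assumptions made for illustration. Each output pixel is a weighted average of the four nearest source pixels, which is why a 352x240 source blown up to fullscreen looks blocky compared to a filter that uses a wider neighbourhood.

```python
def bilinear_sample(img, x, y):
    """Sample a 2D grid (list of rows of numbers) at fractional coords (x, y).

    Hypothetical helper: coordinates outside the image are clamped to the edge.
    """
    h = len(img)
    w = len(img[0])
    x = min(max(x, 0.0), w - 1)
    y = min(max(y, 0.0), h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighbouring rows, then vertically.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def upscale(img, out_w, out_h):
    """Scale img to out_w x out_h by sampling at each output pixel centre."""
    h, w = len(img), len(img[0])
    out = []
    for j in range(out_h):
        sy = (j + 0.5) * h / out_h - 0.5
        row = []
        for i in range(out_w):
            sx = (i + 0.5) * w / out_w - 0.5
            row.append(bilinear_sample(img, sx, sy))
        out.append(row)
    return out
```

On the GPU the texture units do this blend in hardware for every fetched texel, so it costs no CPU time per pixel; the sketch only shows the arithmetic.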
Thanks for the response. It's okay that textured video isn't there yet if the developers know this and are working on it. Just, right now the 4x increase in CPU use and decrease in video quality means it's not practical to use. I guess I'm just impatient.
Also, care to elaborate on why textured video would always require more CPU? I always assumed that they would use roughly the same amount of CPU, with textured video just using up more GPU capacity.
Tillin9, overlay hardware will always have lower CPU utilization than textured video.
I'm curious as to why. Can the GPU DMA data from host memory to local memory? If so, why not let the driver prepare locked DMA buffers in system RAM for 2 video frames? The Xv implementation renders into them, then tells the GPU to DMA one frame to local memory while the driver fills the other space with another frame. When the GPU has one frame in local memory it can then do colorspace conversion and other neat tricks using whatever unit is appropriate.
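The two-buffer scheme described above can be sketched as a small simulation. This is only an illustration of the ping-pong idea, not how any driver is written: a Python thread stands in for the GPU's DMA engine, a list stands in for local memory, and every name here (run_pipeline, dma_worker, and so on) is hypothetical.

```python
import queue
import threading


def run_pipeline(frames, n_buffers=2):
    """Push frames through n staging buffers; return what reached 'local memory'."""
    buffers = [bytearray() for _ in range(n_buffers)]  # stand-in for locked system RAM
    free = queue.Queue()    # indices of buffers the CPU may fill
    filled = queue.Queue()  # indices of buffers waiting for "DMA"
    for idx in range(n_buffers):
        free.put(idx)
    local_memory = []       # stand-in for GPU local memory

    def dma_worker():
        while True:
            idx = filled.get()
            if idx is None:  # sentinel: no more frames
                break
            local_memory.append(bytes(buffers[idx]))  # the "DMA" copy
            free.put(idx)    # buffer may now be reused by the CPU

    worker = threading.Thread(target=dma_worker)
    worker.start()
    for frame in frames:
        idx = free.get()          # blocks only if every buffer is still in flight
        buffers[idx][:] = frame   # the Xv side renders into the staging buffer
        filled.put(idx)
    filled.put(None)
    worker.join()
    return local_memory
```

The point of the two queues is that the CPU never overwrites a buffer the copy engine is still reading, and with two buffers the fill of frame N+1 can overlap the transfer of frame N.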
Whether you are using video overlay or textured video you still normally have to get the raw video data into GPU local memory. Once that is done, the overlay block just picks up the video data and does the rest of the processing with dedicated hardware, while for textured video the driver also needs to keep feeding rendering commands into the GPU to do scaling and colour space conversion using the shaders and texture engines, using some more CPU cycles.
That said, the additional CPU overhead doesn't have to be very big.
The flip side, though, is that the main reason for using textured video if you have video processing hardware in the overlay (pre-5xx) is so you can render the video into an offscreen buffer and then have the compositing manager seamlessly combine it into the final display image, and that is going to suck up some extra CPU and GPU cycles anyways.
I'm not sure where the open acceleration code is today in terms of co-operating with a compositing manager but Alex might be able to jump in there.
Right now textured video uses the built-in colorspace conversion capabilities of the texture engine. The shaders simply pass the texture data through to the render buffer. Filtering is just bilinear texture filtering. At some point we could write a colorspace conversion and/or scaling routine as a fragment shader and use that rather than the standard texture functionality; in fact r6xx will require it as it doesn't have native colorspace conversion in the texture engine. The other issue is that the overlay has native support for planar YUV formats, while the texture engine has to convert them to packed. Once again, we could write a shader routine to handle planar formats natively; it just needs to be done.
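To make the planar-versus-packed point concrete, here is a rough sketch of the two steps in plain Python. It is not taken from the driver. It assumes an I420-style planar layout (full-size Y plane, quarter-size U and V planes), a YUY2-style packed output (Y0 U Y1 V per pixel pair), and the common BT.601 limited-range coefficients; the thread does not say which formats or matrix the hardware actually uses, and the function names are made up.

```python
def i420_to_yuy2(y_plane, u_plane, v_plane, width, height):
    """Interleave three planes into packed Y0 U Y1 V bytes (assumed layouts)."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    cw = width // 2                      # chroma plane width
    out = bytearray(width * height * 2)  # packed 4:2:2 is 2 bytes per pixel
    pos = 0
    for row in range(height):
        crow = row // 2                  # each chroma row covers two luma rows
        for pair in range(cw):
            y0 = y_plane[row * width + 2 * pair]
            y1 = y_plane[row * width + 2 * pair + 1]
            u = u_plane[crow * cw + pair]
            v = v_plane[crow * cw + pair]
            out[pos:pos + 4] = bytes((y0, u, y1, v))
            pos += 4
    return bytes(out)


def yuv_to_rgb(y, u, v):
    """Convert one pixel with BT.601 limited-range coefficients (an assumption)."""
    def clamp(value):
        return max(0, min(255, int(round(value))))

    c, d, e = y - 16, u - 128, v - 128
    r = clamp(1.164 * c + 1.596 * e)
    g = clamp(1.164 * c - 0.392 * d - 0.813 * e)
    b = clamp(1.164 * c + 2.017 * d)
    return r, g, b
```

The packing loop is the kind of per-byte work that costs CPU time when it runs on the host, which is the motivation for handling planar formats in a shader. The per-pixel conversion is what the texture engine currently does natively and what a fragment shader would have to do on parts without that capability.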
The main advantage of textured video is that it draws the output to the destination buffer, so it will work with compositing managers like compiz. Additionally, the output can span monitors (overlay is sourced to only one crtc at a time) and you can have multiple ports active simultaneously (run more than one video at the same time, all using textured video).
Also, to follow up on what John said, this isn't a competition, I'm just trying to get the functionality in place in a timely manner. Once the code is written and optimized it can be ported quite easily. At the moment, we need drm support which only exists in the radeon driver. Also, since the 3D engine is programmed very similarly in r3xx-r5xx, the same code is mostly used for all of those chip families. So doing the work in radeon not only allows the work to happen now rather than later, it benefits users of older chips as well.