It depends on a lot of factors. Basically, the way shadowfb works is that the master copy of the pixmaps lives in system RAM: the pixmaps are stored there and operated on by the CPU. Periodically, the front buffer in VRAM is updated with the results of all the operations done in system RAM. It's a one-way operation: the CPU copy is always written to the front buffer, never the other way around. When the GPU renders to a buffer in VRAM (say, for Xv or OpenGL), the shadow copy no longer holds the master copy.
Say you want to alpha blend a 3D window with the desktop and some other windows. Now you have to get the GPU-rendered image into system RAM so that the CPU can blend it with the other windows. But wait, can't you have the GPU do the alpha blend? Sure, but then you have to migrate all the CPU buffers you want to blend into VRAM so the GPU can access them, so you still have to copy a lot of data around.
Wait, can I store everything in GART and get the best of both worlds? Sure, but the problem there is that GART pages have to be pinned so the kernel doesn't swap them out while the GPU is accessing them. Since the kernel can't swap those pages, this limits the amount of memory available to the rest of the system. I think the kernel limits the amount of pinned memory to keep a driver from DoSing the system, and with graphics, buffers can be huge.
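To make the one-way flow concrete, here is a minimal C sketch of a shadow framebuffer under simplifying assumptions: the CPU draws only into a system-RAM copy, a single dirty rectangle records what changed, and a periodic update copies just that rectangle to the front buffer. The names (`struct shadow_fb`, `shadow_fill_rect`, `shadow_update`) are hypothetical, the "front buffer" is a second malloc'd array standing in for mapped VRAM, and this is not the X server's actual shadowfb code (a real implementation typically tracks a damage region rather than one bounding box).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shadow framebuffer: the master copy lives in system RAM. */
struct shadow_fb {
    int width, height;
    uint32_t *shadow;        /* master copy; the CPU renders only into this */
    uint32_t *front;         /* stand-in for the scanout buffer in VRAM */
    int dirty;               /* nonzero if [dx0,dx1) x [dy0,dy1) needs copying */
    int dx0, dy0, dx1, dy1;
};

static int shadow_init(struct shadow_fb *fb, int width, int height)
{
    fb->width = width;
    fb->height = height;
    fb->shadow = calloc((size_t)width * height, sizeof *fb->shadow);
    fb->front = calloc((size_t)width * height, sizeof *fb->front);
    fb->dirty = 0;
    return (fb->shadow && fb->front) ? 0 : -1;
}

static void shadow_fini(struct shadow_fb *fb)
{
    free(fb->shadow);
    free(fb->front);
}

/* CPU rendering: touches only the shadow copy and records what changed. */
static void shadow_fill_rect(struct shadow_fb *fb, int x0, int y0,
                             int x1, int y1, uint32_t color)
{
    assert(fb->shadow != NULL);
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 > fb->width) x1 = fb->width;
    if (y1 > fb->height) y1 = fb->height;
    if (x0 >= x1 || y0 >= y1)
        return;

    for (int y = y0; y < y1; y++)
        for (int x = x0; x < x1; x++)
            fb->shadow[(size_t)y * fb->width + x] = color;

    /* Grow the dirty rectangle to cover what was just drawn. */
    if (!fb->dirty) {
        fb->dx0 = x0; fb->dy0 = y0; fb->dx1 = x1; fb->dy1 = y1;
        fb->dirty = 1;
    } else {
        if (x0 < fb->dx0) fb->dx0 = x0;
        if (y0 < fb->dy0) fb->dy0 = y0;
        if (x1 > fb->dx1) fb->dx1 = x1;
        if (y1 > fb->dy1) fb->dy1 = y1;
    }
}

/* Periodic update: copy the dirty box shadow -> front. Nothing flows back.
   Returns the number of pixels copied. */
static size_t shadow_update(struct shadow_fb *fb)
{
    size_t copied = 0;
    if (!fb->dirty)
        return 0;
    size_t row = (size_t)(fb->dx1 - fb->dx0);
    for (int y = fb->dy0; y < fb->dy1; y++) {
        size_t off = (size_t)y * fb->width + fb->dx0;
        memcpy(fb->front + off, fb->shadow + off, row * sizeof *fb->shadow);
        copied += row;
    }
    fb->dirty = 0;
    return copied;
}
```

If the GPU rendered into `front` directly (Xv, OpenGL), `shadow` would stop being the master copy, which is exactly the problem described above: there is no path in this design that brings GPU results back into system RAM.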
It's easier with a compositor. And you could even support the non-composited case by emulating a composited environment in the driver. In that case you could store the backing pixmaps wherever it makes the most sense (VRAM or system RAM) and then use the GPU to composite the final image. For CPU-rendered buffers you'd still need to migrate them to pinned memory for the GPU to access them, but you could keep the shadow front buffer in pinned memory. There are also some tricky corner cases to deal with. It's possible, but it basically comes down to writing a new acceleration architecture, which would take time to write and mature. Before going down that road, I think it makes sense to see what can be done with an existing acceleration architecture like glamor.
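For a sense of what "composite the final image" involves per pixel, here is a CPU-side C sketch: each window's backing pixmap is blended onto the output, bottom window first, with the Porter-Duff OVER operator on premultiplied ARGB32. In the scheme described above this work would run on the GPU; running it on the CPU here is purely for illustration, and `struct window`, `over_pixel` and `composite_windows` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* round(a * b / 255) for a, b in [0, 255], without a division. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Premultiplied ARGB32 OVER: dst = src + dst * (1 - src.alpha). */
static uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint8_t inv_a = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8((uint8_t)((dst >> shift) & 0xff), inv_a);
        uint32_t v = s + d;
        if (v > 255) v = 255;   /* only reachable with non-premultiplied input */
        out |= v << shift;
    }
    return out;
}

struct window {
    const uint32_t *pixels;     /* backing pixmap, premultiplied ARGB32 */
    int x, y, width, height;    /* position and size on screen */
};

/* Blend windows onto the output in stacking order (index 0 = bottom). */
static void composite_windows(uint32_t *out, int out_w, int out_h,
                              const struct window *wins, int count)
{
    assert(out != NULL);
    for (int i = 0; i < count; i++) {
        const struct window *w = &wins[i];
        for (int wy = 0; wy < w->height; wy++) {
            int sy = w->y + wy;
            if (sy < 0 || sy >= out_h) continue;
            for (int wx = 0; wx < w->width; wx++) {
                int sx = w->x + wx;
                if (sx < 0 || sx >= out_w) continue;
                uint32_t *d = &out[(size_t)sy * out_w + sx];
                *d = over_pixel(w->pixels[(size_t)wy * w->width + wx], *d);
            }
        }
    }
}
```

Every source pixmap read in the inner loop has to be somewhere the blending unit can reach, which is why, on the GPU, CPU-rendered backing pixmaps would first need migrating to VRAM or pinned memory.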
You miss the point. EXA is an internal implementation detail that is not relevant to anyone but DDX writers. Users could not care less if it goes away; most likely they will not even notice.
Having multiple standards is bad. But a healthy competition between different implementations of the same standard is good.
I believe the intention was to do Xv the way it used to be done: an overlay that can't be read back. Sure, no video on your spinning cube, but that wouldn't need any readbacks to the CPU.
So shadowfb + movies in an overlay that can't be read back or composited. Best of both worlds, accelerated color conversion/scaling, no tearing in movies, etc.
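As an illustration of the "accelerated color conversion" part: video typically arrives as YUV (for example packed YUY2) and has to become RGB for display. An overlay does this in fixed-function hardware at scanout; the C sketch below does the same per-pixel math on the CPU using the common integer approximation of the BT.601 limited-range matrix. Scaling is left out, the function names are hypothetical, and real hardware may use a different matrix (e.g. BT.709 for HD content) or full-range input.

```c
#include <assert.h>
#include <stdint.h>

/* Round (add 128), divide by 256 and clamp to [0, 255]. Negatives are
   clamped before shifting to avoid right-shifting a negative int. */
static uint8_t clamp_shift8(int v)
{
    v += 128;
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* BT.601 limited-range YUV -> 0xAARRGGBB, integer approximation. */
static uint32_t yuv_to_argb(uint8_t y, uint8_t u, uint8_t v)
{
    int c = y - 16, d = u - 128, e = v - 128;
    uint32_t r = clamp_shift8(298 * c + 409 * e);
    uint32_t g = clamp_shift8(298 * c - 100 * d - 208 * e);
    uint32_t b = clamp_shift8(298 * c + 516 * d);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* One packed YUY2 row (Y0 U Y1 V ...): each pair of pixels shares U and V. */
static void yuy2_row_to_argb(const uint8_t *src, uint32_t *dst, int width)
{
    assert(width % 2 == 0);
    for (int x = 0; x < width; x += 2, src += 4) {
        dst[x]     = yuv_to_argb(src[0], src[1], src[3]);
        dst[x + 1] = yuv_to_argb(src[2], src[1], src[3]);
    }
}
```

Doing this for every pixel of every frame is exactly the kind of work an overlay takes off the CPU, and because the overlay output never lands in a readable buffer, the shadowfb master copy is never invalidated.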
So now that it works with xorg-git, I tested it on an HD 6550M.
It seems to be mostly fine, but some things are excruciatingly slow. And I don't just mean the performance isn't great; it's really slow. gtkperf takes extremely long for the GtkDrawingArea Line drawing test, and it causes high CPU usage and slowdowns in X.
Code:
~ % grep -i glamor /var/log/Xorg.0.log
[ 295.584] (II) LoadModule: "glamoregl"
[ 295.584] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[ 295.587] (II) Module glamoregl: vendor="X.Org Foundation"
[ 295.597] (**) RADEON(0): Option "AccelMethod" "glamor"
[ 295.597] (II) Loading sub module "glamoregl"
[ 295.597] (II) LoadModule: "glamoregl"
[ 295.597] (II) Loading /usr/lib/xorg/modules/libglamoregl.so
[ 295.597] (II) Module glamoregl: vendor="X.Org Foundation"
[ 295.597] (II) glamor: OpenGL accelerated X.org driver based.
[ 295.608] (II) glamor: EGL version 1.4 (DRI2):
[ 295.633] (II) RADEON(0): glamor detected, initialising EGL layer.
[ 296.112] (II) RADEON(0): Use GLAMOR acceleration.
I also tried ltris as a simple "real world" test with a game, and it also causes high CPU usage in X and the little animations are very slow.
Even if it were done completely in software, 224 seconds for the line drawing? Is there something wrong, or is this just the way it is for now?
Code:
GtkEntry - time: 0,05
GtkComboBox - time: 1,37
GtkComboBoxEntry - time: 1,03
GtkSpinButton - time: 0,10
GtkProgressBar - time: 0,07
GtkToggleButton - time: 0,10
GtkCheckButton - time: 0,07
GtkRadioButton - time: 0,09
GtkTextView - Add text - time: 0,20
GtkTextView - Scroll - time: 0,01
GtkDrawingArea - Lines - time: 224,47
GtkDrawingArea - Circles - time: 26,41
GtkDrawingArea - Text - time: 0,80
GtkDrawingArea - Pixbufs - time: 0,59
---
Total time: 255,36
Edit: EXA for comparison:
Code:
GtkEntry - time: 0,04
GtkComboBox - time: 1,24
GtkComboBoxEntry - time: 0,90
GtkSpinButton - time: 0,11
GtkProgressBar - time: 0,07
GtkToggleButton - time: 0,11
GtkCheckButton - time: 0,05
GtkRadioButton - time: 0,09
GtkTextView - Add text - time: 0,19
GtkTextView - Scroll - time: 0,00
GtkDrawingArea - Lines - time: 1,41
GtkDrawingArea - Circles - time: 3,74
GtkDrawingArea - Text - time: 0,92
GtkDrawingArea - Pixbufs - time: 0,16
---
Total time: 9,05
Last edited by ChrisXY; 08-08-2012 at 07:34 AM.
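Putting the two gtkperf runs side by side, the slowdown factors follow directly from the posted timings (gtkperf printed them with decimal commas; they are written as C doubles here). This is only arithmetic on the numbers above; `struct timing`, `slowdown` and `print_slowdowns` are illustrative names.

```c
#include <assert.h>
#include <stdio.h>

/* gtkperf timings in seconds from the glamor and EXA runs posted above. */
struct timing {
    const char *test;
    double glamor;
    double exa;
};

static const struct timing timings[] = {
    { "GtkDrawingArea - Lines",   224.47, 1.41 },
    { "GtkDrawingArea - Circles",  26.41, 3.74 },
    { "Total",                    255.36, 9.05 },
};

/* How many times slower the glamor run was than the EXA run. */
static double slowdown(const struct timing *t)
{
    assert(t->exa > 0.0);
    return t->glamor / t->exa;
}

static void print_slowdowns(void)
{
    for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++)
        printf("%-26s %6.1fx\n", timings[i].test, slowdown(&timings[i]));
}
```

That works out to roughly 159x for lines, 7x for circles and 28x for the whole run, so the line test alone accounts for most of the difference in total time.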
Yep, unfortunately every option (including starting a new DDX architecture or continuing the existing EXA-based DDX) has real, well-known challenges. That's what makes picking one so much fun.
I think it's fair to say that any of the options will need work to get to a "happy place" -- the issue was that glamor seemed to offer the best combination of good return on short term work and not too many architectural obstacles for the longer term.
The line-drawing paths haven't been fully optimized yet; glamor has been focusing on the compositing/rendering part so far.
For example, drawing lines that are neither vertical nor horizontal falls back to software rendering, which is very, very slow; that's why it took 224 seconds in your benchmark. But that is not difficult to accelerate; it just needs a simple shader.