AMD Catalyst vs. X.Org Radeon Driver 2D Performance
phoronix
01-18-2009, 08:10 AM
Phoronix: AMD Catalyst vs. X.Org Radeon Driver 2D Performance
One of the common complaints about the ATI Catalyst Linux driver is slow 2D performance, but is this really the case? Does AMD's binary-only Linux driver have 2D performance issues that could actually make it run slower than the open-source driver developed by the X.Org community through specifications released by AMD? In this article we have run a total of 28 benchmarks looking squarely at the 2D performance between the Catalyst (fglrx) driver and the xf86-video-ati (Radeon) drivers on Ubuntu Linux.
http://www.phoronix.com/vr.php?view=13388
bugmenot
01-18-2009, 09:13 AM
Thanks for the test :)
How does radeonhd perform compared to ati (radeon) and fglrx? Is it equal to radeon? And where are the bottlenecks in the free driver; why does it perform so badly in some tests?
Radeon/RadeonHD should be nearly the same.
Xserver 1.6 has greatly improved EXA performance; fglrx wouldn't stand a chance against that, especially with Composite/RENDER, and who knows what will happen when the UXA stuff is merged into EXA with Xserver 1.7...
Aphax
01-18-2009, 10:18 AM
I have to say that with my Radeon 4850 and fglrx (8.12) I do notice some strange 2D performance characteristics. Window resizing is (really) fast, but dragging windows around tends to eat up a whole CPU core and lag my entire desktop (any other rendering going on, e.g. my CPU usage grapher, comes to a complete halt). I get the same thing with a Radeon 3200 (IGP). Text rendering is so-so: it's fast and usable, but I do tend to have to wait a few seconds for things like 'dmesg' in an xterm.
It's a long way from xf86-video-ati with shadowfb, but there wasn't (isn't?) EXA support yet for r600+ in xf86-video-ati, so I haven't really been able to compare real 2D accel.
It's interesting, I think, that 2D performance isn't much better with my Radeon 3200, since it uses system memory, which is supposed to be much faster, right? (Or at least I thought that's what made shadowfb so fast.)
TeoLinuX
01-18-2009, 10:25 AM
I would have liked to see a driver comparison with more recent hardware, let's say R600/700...
Do you think performance would have been different?
Aradreth
01-18-2009, 11:41 AM
I would have liked to see a driver comparison with more recent hardware, let's say R600/700...
Do you think performance would have been different?
Read the article...
On a similar note, just recently the open-source ATI stack began supporting basic 2D acceleration on the R600/700 series.
2D acceleration isn't fully implemented yet.
paran
01-18-2009, 01:19 PM
Were the tests done with or without a compositing manager?
I don't use a compositing manager, and I have to use fglrx to get a working X. Redrawing of windows when I switch virtual desktops is horribly slow. Running xcompmgr makes it a little bit faster, but then I get weird artifacts when playing movies using mplayer. (Without xcompmgr I "only" get shearing artifacts.)
My second machine has an old Matrox Mystique card from '98 or so. In many ways X11 on that one performs much better than on my HD3870.
I upgraded from a GeForce 6600, with which I used NVIDIA's very high quality binary driver. However, even though fglrx is horrible, I still hope that I made the right decision when buying ATI. In a few months I hope that the free drivers will have reached a usable state.
bulletxt
01-18-2009, 02:13 PM
In a few months I hope that the free drivers will have reached a usable state.
Some prayers may help...
WSmart
01-18-2009, 02:51 PM
Pretty interesting to look at. I think the results were inconclusive. The next step would be to actually look at applications where the differences have meaning, and then you could make some value statements from there. I suppose it depends on which applications you're running. I give the article an A+ though. It informs and it adds value to the discussion.
What I learned is that AMD Linux does not have 'perfect' 2D support. The obvious question is, how about Nvidia? How about Solaris and Apple? Do the professional class cards do any better, Quadro or FireGL?
Thanks all.
MostAwesomeDude
01-18-2009, 02:53 PM
Can't say I'm really surprised. On one hand, fglrx has massive amounts of code. There are spots in our EXA where we have just said, "this could be accelerated, but not without a lot of spaghetti." fglrx is spaghetti.
On the other hand, fglrx has some known weaknesses. The pixmap test is the classic example, although there were a few others that pleasantly surprised me. Their handling of things when a compositor is enabled also sucks; I would bet that their test numbers would go down significantly if a compositor were enabled, although it's entirely possible that they've improved their compositing since then.
~ C.
DoDoENT
01-18-2009, 03:17 PM
I would bet that their test numbers would go down significantly if a compositor were enabled, although it's entirely possible that they've improved their compositing since then.
~ C.
That's exactly what I wanted to ask. If I use fglrx with compiz it's slow, but without compiz it's quite usable. With the radeon driver I got a much better 2D experience and good multi-monitor support (very useful on laptops), but unfortunately I have to use fglrx because of PowerPlay and almost twice the FPS in 3D games (with fglrx Scorched3D works perfectly, with radeon it's almost unplayable).
My card is Mobility Radeon X1600 (R500).
As soon as the radeon driver gets PowerPlay support and better 3D support, I'd rather use it than fglrx.
tmpdir
01-18-2009, 03:49 PM
Interesting article, it was a good read. Hoping for a sequel with more real-life applications to see how combinations of these aspects perform.
After reading my share of ATI-related driver development news, I can't suppress the feeling that this test only shows that the proprietary driver relies on 3D acceleration to get 2D acceleration up and running. Until then, the 2D performance of Catalyst is not what it should be... just a brainfart on my side :D
PuckPoltergeist
01-18-2009, 03:50 PM
Can't say I'm really surprised. On one hand, fglrx has massive amounts of code. There are spots in our EXA where we have just said, "this could be accelerated, but not without a lot of spaghetti." fglrx is spaghetti.
On the other hand, fglrx has some known weaknesses. The pixmap test is the classic example, although there were a few others that pleasantly surprised me. Their handling of things when a compositor is enabled also sucks; I would bet that their test numbers would go down significantly if a compositor were enabled, although it's entirely possible that they've improved their compositing since then.
Can't confirm this with 9.2 Beta and an HD3650 (AGP):
under KDE4
without desktop effects:
GtkPerf 0.40 - Starting testing: Sun Jan 18 21:45:04 2009
GtkDrawingArea - Pixbufs - time: 1,40
---
Total time: 1,41
with desktop effects enabled:
GtkPerf 0.40 - Starting testing: Sun Jan 18 21:45:13 2009
GtkDrawingArea - Pixbufs - time: 0,12
---
Total time: 0,12
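The two GtkPerf reports above can be compared programmatically. A minimal Python sketch, assuming the plain-text layout shown in the post (note the decimal commas, presumably from the poster's locale), that pulls out the "Total time" figure and computes the speedup with desktop effects enabled:

```python
import re

def total_time(report: str) -> float:
    """Return the 'Total time' value from a GtkPerf text report.

    Accepts decimal commas (as in the pasted output) as well as decimal points.
    """
    match = re.search(r"Total time:\s*([\d.,]+)", report)
    if match is None:
        raise ValueError("no 'Total time' line found")
    return float(match.group(1).replace(",", "."))

without_effects = """GtkPerf 0.40 - Starting testing: Sun Jan 18 21:45:04 2009
GtkDrawingArea - Pixbufs - time: 1,40
---
Total time: 1,41"""

with_effects = """GtkPerf 0.40 - Starting testing: Sun Jan 18 21:45:13 2009
GtkDrawingArea - Pixbufs - time: 0,12
---
Total time: 0,12"""

# Ratio of the two totals: how many times faster the composited run was.
speedup = total_time(without_effects) / total_time(with_effects)
print(f"with desktop effects: {speedup:.2f}x faster")
```

With the numbers posted, the run with desktop effects comes out roughly 11.75 times faster on this one test, the opposite of the slowdown MostAwesomeDude expected.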
WSmart
01-18-2009, 07:30 PM
...or run the Phoronix Test Suite yourself to see how your system stacks up.
What's the Phoronix Test Suite command? Is there a suite name?
Thanks!
Yfrwlf
01-18-2009, 09:48 PM
What's the Phoronix Test Suite command? Is there a suite name?
Thanks!
One way to find out is to look at the properties of the package using the command line or the Synaptic package manager and look at the files it installed. Look for the binaries it put in /usr/bin. Another way is to start typing phoronix and just hit tab twice several times =D
Of course, eventually making a GUI interface and menu icon for the PTS would be really great. Maybe that's coming in version 2? ;)
Michael
01-18-2009, 10:15 PM
phoronix-test-suite batch-benchmark gtkperf jxrendermark renderbench
Should cover it when running version 1.6.
That's exactly what I wanted to ask. If I use fglrx with compiz it's slow, but without compiz it's quite usable. With the radeon driver I got a much better 2D experience and good multi-monitor support (very useful on laptops), but unfortunately I have to use fglrx because of PowerPlay and almost twice the FPS in 3D games (with fglrx Scorched3D works perfectly, with radeon it's almost unplayable).
My card is Mobility Radeon X1600 (R500).
As soon as the radeon driver gets PowerPlay support and better 3D support, I'd rather use it than fglrx.
I have been using PowerPlay with the radeon driver on my Mobility X1600 for a few months now and it is working great. You need a few patches from this branch, I think:
http://cgit.freedesktop.org/~agd5f/xf86-video-ati/log/?h=agd-powerplay
oibaf
01-19-2009, 04:32 AM
To reiterate, each driver was left in its stock configuration with no extra xorg.conf options being set or AMDPCSDB options being assigned.
So both are using XAA? It would have been nice to test EXA as well, since XAA is incomplete (i.e. no Render acceleration) on R300 and newer cards, at least on the "free" driver.
RealNC
01-19-2009, 05:11 AM
What? The test was done without EXA for the radeon driver? Is this a joke?
BlackStar
01-19-2009, 06:05 AM
I *think* Radeon uses EXA by default or, at least, Ubuntu enables EXA without user intervention.
mityukov
01-19-2009, 06:33 AM
I *think* Radeon uses EXA by default or, at least, Ubuntu enables EXA without user intervention.
I believe XAA is the default (in Ubuntu Intrepid). Unfortunately, I can't remember where I read this, so I can't send you a link as proof :-/
oibaf
01-19-2009, 07:13 AM
I *think* Radeon uses EXA by default
No
or, at least, Ubuntu enables EXA without user intervention.
Only in the development version, Jaunty (which will be 9.04), whereas the test used 8.10.
Michael
01-19-2009, 08:06 AM
EXA was used for the testing. Sorry if that wasn't clear in the article.
suokko
01-19-2009, 09:53 AM
It would be interesting to test XAA vs. EXA as well. I have noticed, at least when running some Windows applications in Wine, that EXA has horrible 2D performance: with EXA, CPU usage hits 100% at 2.1 GHz, while with XAA it is only around 10% with the CPU running at 796 MHz. The X server (Xorg 1.5.2) takes nearly all the CPU time when using EXA.
To me it seems like either Wine is too aggressively optimized for XAA or EXA has some hidden performance penalty for Wine's poor drawing code. This is surprising, because everything else gets a visible speed-up when I turn EXA on in my system.
I'm using an R200-series Mobility Radeon.
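The two CPU readings above were taken at different clock speeds, so the percentages alone understate the gap. A back-of-the-envelope Python sketch, assuming CPU usage scales linearly with clock frequency (an approximation that ignores memory stalls, so treat the result as an estimate only):

```python
def busy_cycles_per_second(cpu_fraction: float, clock_hz: float) -> float:
    """Cycles per second the CPU spends busy, assuming the usage meter
    reports the share of time spent working at the given clock."""
    return cpu_fraction * clock_hz

exa = busy_cycles_per_second(1.00, 2.1e9)  # EXA: ~100% at 2.1 GHz
xaa = busy_cycles_per_second(0.10, 796e6)  # XAA: ~10% at 796 MHz

print(f"EXA uses roughly {exa / xaa:.0f}x the CPU cycles of XAA")
```

That works out to about 26 times as many cycles. Since 100% is a ceiling, the EXA figure is really a lower bound on what the Wine workload was asking for.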
dungeon
01-19-2009, 10:43 AM
Maybe EXA is not good for r200, who knows. I see the same performance drop as you, plus EXA also has problems with Mesa and native apps. Try ppracer, turn on the FPS counter, and you will see 0 (zero) FPS at the end of each level.
All this is fine with XAA.
DoDoENT
01-19-2009, 10:56 AM
I have been using PowerPlay with the radeon driver on my Mobility X1600 for a few months now and it is working great. You need a few patches from this branch, I think:
http://cgit.freedesktop.org/~agd5f/xf86-video-ati/log/?h=agd-powerplay
It seems that commits to this branch are quite old (April 2008). Are you sure this isn't implemented in Ubuntu intrepid's default radeon driver? Because if it is, then what do I have to add to my xorg.conf to enable it? With stock configuration I get 30-40 minutes longer battery life with fglrx than with radeon driver.
17 out of 28?? All that time was spent on an article to reach that determination? Who exactly are the intended beneficiaries of this info when most of us can't even get 8.12 installed and working properly under linux? Just have a look at your own forums.
This article would have hit the spot if the majority of us had the option of using either proprietary or open-source drivers with equal ease. Until we get there, though, a much better article would have been a comprehensive guide to getting 8.12 working with a particular kernel on a particular distro. Don't get me wrong, I appreciate such articles, but at this point in time the 2D issue is a non sequitur.
Cheers.
NeoBrain
01-19-2009, 02:15 PM
17 out of 28?? All that time was spent on an article to reach that determination? Who exactly are the intended beneficiaries of this info when most of us can't even get 8.12 installed and working properly under linux? Just have a look at your own forums.
Well, you must take note of the fact that the people on whose systems the driver runs just fine don't rant as much as the people on whose systems it fails...
Occasionally you'll see statements like "most stable driver ever" or "never had a problem with it actually", but they get drowned out by people who write 5 or more posts about their problem, and thus you get the impression that fglrx doesn't work on most systems.
And honestly, if you've got an X600 with a PCI-AGP bridge (uhm... if these even existed for the X600 family, but you get the idea), it's quite probable that these chips aren't tested that well (apart from the fact that it's just too much maintenance work for most vendors).
Another point is that many people first try to generate rpm or deb packages, e.g. because they keep the system cleaner... I, for example, could never really get the driver running with this method. On the other hand, since I've been using fglrx's automated installer, installation works faster and more reliably than otherwise (even the livna repos gave me problems at some point).
korpenkraxar
01-19-2009, 06:30 PM
With stock configuration I get 30-40 minutes longer battery life with fglrx than with radeon driver.
Same here, running Debian Sid on a ThinkPad with a Radeon X1400. According to ThinkWiki over at
http://www.thinkwiki.org/wiki/How_to_make_use_of_Graphics_Chips_Power_Management_features
Xorg's log file should confirm that scaling is indeed enabled once you specify the DynamicClocks option in the Device section. It does not do that in my case and I have absolutely no idea what to do next :-(
Ideas anyone?
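One first step is to check whether the X server picked up the option at all by filtering Xorg.0.log for it. A minimal Python sketch; the log path is the usual default but may differ on your system, and the sample line is hypothetical. This only shows that the option was parsed: which message, if any, confirms that clock scaling is actually active depends on the driver version, so read the lines around any match.

```python
def option_lines(log_text: str, option: str) -> list[str]:
    """Return log lines mentioning `option`, ignoring case."""
    needle = option.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]

def option_lines_in_file(option: str, path: str = "/var/log/Xorg.0.log") -> list[str]:
    """Same check against the X server log on disk (default path assumed)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return option_lines(log.read(), option)

# Hypothetical excerpt, for illustration only.
sample = '(**) RADEON(0): Option "DynamicClocks" "on"\n(II) RADEON(0): ...'
print(option_lines(sample, "DynamicClocks"))
```

Matching case-insensitively is deliberate, since xorg.conf option names are not case-sensitive.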
highlandsun
01-19-2009, 07:51 PM
Would these gtk test programs show meaningful/comparable numbers when ported to Windows? Is there any way to see if these drivers in Linux are really squeezing out all the performance the hardware has to offer, or if the Windows drivers still have some unexplored tricks to leverage?
Well, Linux's 2D stuff has always been 'faster' than Windows, per se.
But really, nobody cares that much about 2D performance. In Vista, when you're running the composited desktop, no hardware acceleration at all goes into 2D rendering.
That goes to show you how much it matters.
With Linux 2D on a composited desktop, I think it's more a matter of all the context changes that the drivers have to go through to convert the X Window 2D driver's world to the Linux DRM/DRI-managed world. Like, you have to render the item off-screen, then capture the image, then convert the image to something that can be managed by the 3D drivers, then copy that image to a texture, and then render that texture as an image that we call the desktop.
So many steps. At that point it doesn't really matter whether you have a very fast CPU or a very fast GPU or anything like that. It doesn't really matter that the conversion gets done fast, either. Thousands and thousands of CPU cycles can be wasted in each of those steps... reading in instructions from main memory, loading them into cache, executing them, sucking in textures from memory, etc. Each time you do a context switch you're purging your cache and starting over, wasting all sorts of CPU/GPU time.
I mean, RAM may seem fast, especially compared to disk, but your CPU/GPU will burn through thousands of wasted cycles waiting for information to come in from main memory or over the PCIe bus.
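To put rough numbers on the point about waiting for main memory: the cycles lost per miss are simply the latency multiplied by the clock rate. A minimal Python sketch; the 100 ns and 2 GHz figures are assumptions for illustration, not measurements from this thread:

```python
def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Cycles a core sits idle while waiting `latency_ns` for memory."""
    return latency_ns * clock_ghz  # ns x (cycles per ns) = cycles

# Assumed, illustrative figures: ~100 ns per DRAM access, 2 GHz core.
per_miss = stall_cycles(100, 2.0)
print(per_miss)          # 200.0 cycles lost per cache miss
print(per_miss * 1000)   # 200000.0 cycles for 1000 misses
```

A couple of hundred cycles per miss sounds small, but a step that misses on a thousand cache lines already loses around 200,000 cycles, which is how the wasted cycles add up.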
The 'correct' way to manage all of that would be to render the application off-screen and have the output written directly to a 3D texture that is mapped to the 3D primitive that is then used as part of your desktop image.
Something that can be done in as close to a single operation as possible.
But the current driver model for Linux won't allow something like that. The X.org world and the Linux-DRM world are just too heavily split. They were never designed to work together very closely... instead you just assigned a hunk of the screen for X to render to, then assigned a smaller hunk of the screen for the OpenGL stuff to be rendered to. That's what the 'overlay' provides, and it is fast, but it's ugly. That's how it's designed to work.
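The argument is essentially about how many times each frame's pixels get moved around. A toy Python model, assuming each listed step touches the whole frame exactly once and an arbitrary 1280x1024, 32-bit window (both assumptions; the step names are labels for the description above, not real X.Org or DRM functions), compares the indirect path with direct render-to-texture:

```python
# Assumed: one 1280x1024 window at 4 bytes per pixel.
FRAME_BYTES = 1280 * 1024 * 4

# Hypothetical labels for the steps described in the post.
indirect_path = [
    "render offscreen",
    "capture image",
    "convert for 3D driver",
    "upload to texture",
    "draw textured quad",
]
direct_path = [
    "render into texture",
    "draw textured quad",
]

def bytes_touched(steps: list[str], frame_bytes: int = FRAME_BYTES) -> int:
    """Assume every step reads or writes the whole frame once."""
    return len(steps) * frame_bytes

ratio = bytes_touched(indirect_path) / bytes_touched(direct_path)
print(f"indirect path moves {ratio:.1f}x the data of the direct path")
```

Real drivers can skip or merge some of these copies, so the ratio only illustrates the shape of the argument, not actual driver behaviour.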
---------------------------------------
You see, the trick with Linux right now is that you have two entirely different sets of drivers sharing the same single video card: the 2D Xorg drivers and the Linux-DRM/DRI-based 3D drivers.
X.org X Server goes down and performs such actions as configuring PCI devices and modesetting outside of the context of the Linux kernel's control.
So you end up with situations where Linux is configuring PCI devices or doing something like that, and X comes along and stomps on it and causes your video card to flake out.
---------------------------------------
So I suppose with Intel's UXA framework it will be much more efficient.
Instead of worrying about getting the 2D drivers working better or porting the 2D X drivers to the 3D Linux-DRM world, they just rewrote the drivers from scratch and implemented the EXA API using the Linux-DRM 3D-related core.
That way you end up with compatibility with current applications, but you render everything directly using the 3D engine. So instead of doing EXA in the 2D engine on the card you do it on the '3D' engine.
That will probably actually end up being slower in benchmarks than just doing 2D-only with no composition, but it doesn't really matter, because it'll be fast enough and it'll make it much easier to deal with the performance issues that matter... such as video playback acceleration, better composited desktops, a more stable/saner one-driver design, faster 3D performance, etc.
highlandsun
01-19-2009, 11:54 PM
Sigh... Even on my 8MHz 68000 Atari ST, opening new windows would just POP onto the screen. It's amazing the amount of visible lag people are willing to put up with these days, and that's not even talking about the worst culprit, scrolling a browser window...
2D performance affects *every* computer user, 3D performance only affects the minority of computer users who play 3D games. People have really got their priorities messed up.
MostAwesomeDude
01-19-2009, 11:54 PM
Most of EXA is done with 3D engines on r5xx and newer. In addition, UXA is completely useless for non-Intel. (I'd say it's completely useless, period, but it allows Intel stuff to do zero-copy EXA, which is useful for them.)
DRI2 fixes a lot of things. So does KMS. Be on the lookout for those.
Sigh... Even on my 8MHz 68000 Atari ST, opening new windows would just POP onto the screen. It's amazing the amount of visible lag people are willing to put up with these days, and that's not even talking about the worst culprit, scrolling a browser window...
2D performance affects *every* computer user, 3D performance only affects the minority of computer users who play 3D games. People have really got their priorities messed up.
Well, you should pay attention to exactly what is causing those applications not to *pop* up on the screen. I doubt most of it has anything to do with EXA acceleration.
If you're using Gnome:
1. add a "System Monitor: system load indicator" applet to your panel.
2. Right click on it and go to preferences
3. Change the 'system monitor width' to something useful and change the colors to something that contrasts. For example, leave the 'User' and 'Nice' colors blue, but make the 'System' color green and the 'IOWait' color red.
You'll find that the majority of application start-up time, including window drawing, is dominated by the CPU simply waiting on your storage to read out its information.
Linux applications use lots of little files in your file system: lots of little configuration files, lots of little system libraries. They also spend a great deal of time polling various directories and files, which are usually empty or missing. All of this causes a lot of I/O seek time, with the drive spinning around looking at this or that directory.
So it's not even a question of how fast your drive can read out information. It's all just blowing its time on seeking.
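The seek-versus-transfer point can be made concrete with simple arithmetic. A Python sketch with assumed, illustrative numbers (400 small files totalling 20 MB, about 10 ms average seek plus rotational latency, about 50 MB/s sequential throughput); none of these figures come from the thread:

```python
def seek_bound_seconds(n_files: int, seek_ms: float) -> float:
    """Time spent only positioning the disk head, one seek per file."""
    return n_files * seek_ms / 1000

def transfer_seconds(total_mb: float, mb_per_s: float) -> float:
    """Time to stream the same amount of data sequentially."""
    return total_mb / mb_per_s

# Assumed figures: 400 files, 10 ms per seek, 20 MB total at 50 MB/s.
print(seek_bound_seconds(400, 10))   # 4.0 seconds of seeking
print(transfer_seconds(20, 50))      # 0.4 seconds of actual reading
```

With these assumptions, seeking costs ten times as long as the reading itself, so a faster sequential transfer rate would barely help.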
Windows, for example, has the registry. While the registry sucks for many different reasons, the thing is that it's a size-optimized, fast database that is stored almost entirely in RAM almost all the time. So when your Windows applications start up, they read in Windows system files, which are already loaded into RAM, and then get their configuration from that fast little database. There is much, much less I/O seek time for most applications. This is one of the reasons why things like IE, MS Office, or whatever start up so much quicker in Windows than the open source alternatives do in Linux.
And if you try to optimize your system by getting rid of Gnome and stripping it down to 'lighter' desktops that conserve RAM, you can actually make the problem worse, because when you're running Gnome and Gnome-only applications, all the dependencies and libraries are already read into RAM from when you logged into your system, reducing the load your system is under when you actually start individual applications. (And no... Firefox and OpenOffice.org are not Gnome applications, and thus have their own set of libs and whatnot that they read from the drive at start-up.)
-------------------------------
Then if you get rid of the I/O wait and seek times that Linux desktop applications tend to get penalized for, you still have system performance issues with scheduling that Linux has to deal with.
Going back to what I was saying in my last post about "context changes"... The main memory in your system is very slow. It's much faster than disk, hundreds of times faster, but it's still much slower than your L2 and L1 caches. Each time the system needs to perform a different sort of processing, those caches need to be flushed and replaced with new information. That's a context switch, and it's a big performance penalty, but it's necessary to maintain the illusion that you're running a multitasking computer.
So Linux is heavily optimized for performance. Linux's goal is to get processing done as fast as possible. This means kernel developers are going to be very careful about preserving the contents of the L1/L2 caches for as long as possible.
So if something is being processed it's usually best to let it get finished before you move onto the next thing.
In a very performance-oriented environment you can actually end up with very lousy user interactivity. That is, the time the system takes to respond to user input can be huge, making the system feel very sluggish and irritating to use.
In a system optimized for 'realtime' performance, that is, performance where there is a set/required latency the system has to conform to, you can dramatically increase the level of user interactivity... you can optimize the system not to drain your sound card's buffers and avoid audio stuttering and other artifacts, make X Windows much more responsive, etc... but you'd actually end up REDUCING overall performance.
That is, by increasing the realtime-like aspects of Linux you're going to actually end up slowing things down slightly. That means you'll score lower in benchmarks, but you'll have a more responsive system.
------------------------
It's like this:
Say you're doing your chores at home. You have two things you need to accomplish:
Doing laundry, sorting clothes, and whatnot, in the basement...
Raking leaves in the backyard.
So the fastest way to get it done would be to concentrate on one task until it's finished, then work on the next.
However, you have a wife who is angry at you for whatever reason. She goes downstairs and sees that you're not doing the laundry, so she yells at you to do the laundry. So you run downstairs and start that.
Then she goes up to the kitchen, looks out at the backyard, and sees that you're not finished raking the leaves, so she yells at you about that, and you end up running upstairs and raking leaves...
Then she goes into the basement to get some Diet Coke and yells at you about that.
So you see, if you're highly reactive and jump quickly from one job to another, that can make things responsive, but it will actually take much longer to get anything done.
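The chores analogy is the classic throughput-versus-latency trade-off of a scheduler. A toy Python simulation, with made-up work amounts and switch cost (assumptions, not measurements), compares running each chore to completion against alternating between them every 10 minutes:

```python
def run_to_completion(work: list[int], switch_cost: int) -> list[int]:
    """Finish times when each task runs until done, then the next starts."""
    clock, finish = 0, []
    for i, w in enumerate(work):
        if i > 0:
            clock += switch_cost  # walk to the other job
        clock += w
        finish.append(clock)
    return finish

def round_robin(work: list[int], quantum: int, switch_cost: int) -> list[int]:
    """Finish times when the jobs alternate every `quantum` time units."""
    remaining = list(work)
    finish = [0] * len(work)
    clock, current = 0, None
    while any(remaining):
        for i, left in enumerate(remaining):
            if left == 0:
                continue
            if current is not None and current != i:
                clock += switch_cost  # run up or down the stairs
            current = i
            step = min(quantum, left)
            clock += step
            remaining[i] -= step
            if remaining[i] == 0:
                finish[i] = clock
    return finish

chores = [60, 60]  # assumed minutes of laundry and of raking
print(run_to_completion(chores, switch_cost=5))        # [60, 125]
print(round_robin(chores, quantum=10, switch_cost=5))  # [160, 175]
```

Alternating makes everything finish 50 minutes later (175 vs. 125), but the second chore gets its first attention at minute 15 instead of minute 65: responsiveness bought with throughput.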
----------------------------------
Of course the best way is to simply eliminate the jumping around.
If you can figure out how to do the laundry AND the yard using a single set of operations, then you'll win. Even if somebody timed you and saw that it took you longer to do the yard or the laundry than it did before, you still win, because you've eliminated the time it takes to run into the house and up and down the stairs.
With 2D drivers vs 3D drivers it's not really even a question of what 'core' they run on. It's the fact that they exist in such different worlds and are not compatible with each other. Same thing with video playback and decoding acceleration. Anything that requires GPU acceleration and drawing things out to the display.
-------------------------------
Or you can just use tricks to make it _seem_ like you're doing things faster, without actually doing things faster at all. This is what compositing does for your Linux desktop, or any desktop.
--------------------------------
Traditionally speaking (this is a couple of years old and maybe it's changed with recent versions of OS X or Vista), if you were to run system benchmarks comparing the display performance of Linux vs. Windows vs. OS X, you'd find that, generally speaking, in 2D application performance Linux is fastest, Windows is next, and OS X is slowest.
But if you asked the users, they'd say the exact opposite: that OS X is the fastest and provides the best visual quality, Windows is next, and Linux is last.
This is because with Linux they saw much more redraw time, visual tearing, and all sorts of other ugliness, whereas with OS X they would see nothing but a solid and pretty-looking UI. The relative speed didn't matter so much.
This is because OS X had a composited desktop.
Instead of racing to keep window rendering up with the display, like Linux did, it simply ignored that. When people move windows about on the desktop they are not causing the windows to redraw and all that... they are simply moving a single solid image around: a square with an image of the application painted on it.
And if the person tried to move the window too fast... it simply doesn't go faster. In Linux it would do this sort of half-redraw and hop from one side of the display to the other, maybe leaving pieces of the image on top of the desktop or on a window underneath the one you're moving, but with OS X it simply lagged slightly. The mouse moves slower in OS X, and the window moves slower than it does in Linux (in this example); however, because the window stays solid and pretty, nobody notices it.
So even though in Linux the actual operations of moving windows and redrawing were much, much faster... it still loses, because it just can't keep up in visual quality. If you're slower and pretty, people will think you're better than something that is fast and ugly.
Within reason, of course.
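The difference described above can be modelled in a few lines. A toy Python sketch (the classes are hypothetical, not any real window-system API) that counts application repaints while one window is dragged across another. It simplifies the non-composited case to "every other window repaints once per motion step"; a real X server copies the moved window's own pixels and only sends Expose events for the newly uncovered regions.

```python
from dataclasses import dataclass, field

@dataclass
class Window:
    """A toy window; `redraws` counts how often its app had to repaint."""
    x: int = 0
    y: int = 0
    redraws: int = 0

    def paint(self) -> None:
        self.redraws += 1

@dataclass
class Desktop:
    composited: bool
    windows: list[Window] = field(default_factory=list)

    def move(self, win: Window, dx: int, dy: int) -> None:
        win.x += dx
        win.y += dy
        if not self.composited:
            # No compositor: uncovered areas of the windows underneath must
            # be repainted by their applications (one repaint per step here).
            for other in self.windows:
                if other is not win:
                    other.paint()
        # With a compositor, every window's contents already sit in an
        # offscreen buffer, so moving only changes where that image is drawn.

def drag(composited: bool, steps: int = 50) -> int:
    """Drag one window over another and return the total app repaints."""
    dragged, below = Window(), Window()
    desk = Desktop(composited, [dragged, below])
    for _ in range(steps):
        desk.move(dragged, 5, 0)
    return dragged.redraws + below.redraws

print(drag(composited=False))  # 50
print(drag(composited=True))   # 0
```

The compositor still has to redraw the screen from those buffers, so it does not make the work free; it moves the work off the applications, which is why a slow app no longer leaves half-drawn pieces behind a dragged window.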
-------------------------------------------
So the challenges for Linux are to:
* Eliminate the two-driver model... the X.org DDX vs. Linux-DRM/Mesa-DRI drivers. Having two different and inconsistent sets of drivers that produce two different sets of graphics, which need all sorts of image conversion steps to work with each other, is just stupid.
Even if 2D on DRI is much slower than 2D on the X.org DDX, you can still win. Having them unified makes it much easier to do fancy graphics things like animations, vector-based graphics, faster media playback, etc.
* Better 3D application compatibility, better performance for media decoding, more stable and consistent performance. In other words: Simply higher quality drivers.
----------------------------------------
I mean seriously... do you really go out and spend $150+ on a nice video card to make your GTK combo boxes draw 2msec faster?!
Or do you want something that looks nicer and provides better OpenGL and media acceleration?
MrCooper
01-20-2009, 12:57 PM
Xserver 1.6 has greatly improved EXA performance; fglrx wouldn't stand a chance against that, especially with Composite/RENDER
EXA text rendering should indeed be several times faster in 1.6 thanks to the glyph cache, but I don't remember there being as significant improvements in other areas. Just trying to manage expectations. :)
Also, I think it's indeed important to keep in mind that the motivation for EXA was compositing, whereas XAA is excessively optimized for non-compositing.
and who knows what will happen when the UXA stuff is merged into EXA with Xserver 1.7...
I'm not sure something like that will ever happen, but there's no need to wait anyway - the same benefits (storing pixmap contents in buffer objects, avoiding the EXA pixmap migration code overhead) can be had with EXA already if the driver so chooses.
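A glyph cache pays off because text reuses a small set of glyphs over and over. A hypothetical Python sketch of the general technique (rasterize each glyph once per font and size, then reuse it), not EXA's actual implementation:

```python
class GlyphCache:
    """Toy glyph cache: rasterize each (font, size, char) once, then reuse.

    `_rasterize` stands in for the expensive drawing path; here it just
    returns a placeholder string and counts how often it was called.
    """

    def __init__(self) -> None:
        self._cache: dict[tuple[str, int, str], str] = {}
        self.rasterized = 0

    def _rasterize(self, font: str, size: int, char: str) -> str:
        self.rasterized += 1
        return f"<bitmap {font} {size} {char!r}>"  # placeholder image

    def glyph(self, font: str, size: int, char: str) -> str:
        key = (font, size, char)
        if key not in self._cache:
            self._cache[key] = self._rasterize(font, size, char)
        return self._cache[key]

    def draw_text(self, font: str, size: int, text: str) -> list[str]:
        return [self.glyph(font, size, c) for c in text]

cache = GlyphCache()
for _ in range(100):                   # e.g. 100 lines of terminal output
    cache.draw_text("mono", 10, "dmesg")
print(cache.rasterized)                # 5 distinct glyphs, not 500
```

The saving grows with the amount of repeated text, which fits the observation that text rendering is where 1.6's gains are largest.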
curaga
01-20-2009, 01:07 PM
I mean seriously... do you really go out and spend $150+ on a nice video card to make your GTK combo boxes draw 2msec faster?!
That would be just as reasonable as getting a new $150 graphics card to get 2 fps more. Yet look at the hordes of people doing just that :p
highlandsun
01-20-2009, 06:03 PM
You're missing the point. Even a $30 video card ought to be able to paint its screen instantaneously. And I appreciate your taking the time to post such a detailed and lengthy response, but I'll just comment that I ported X11R1 to the Apollo Domain/OS; I know very well what's involved in getting good performance out of X and a display driver on a multi-tasking OS. It's been almost 25 years since then, and the user experience has only gotten slower.
It seems that commits to this branch are quite old (April 2008). Are you sure this isn't implemented in Ubuntu intrepid's default radeon driver? Because if it is, then what do I have to add to my xorg.conf to enable it? With stock configuration I get 30-40 minutes longer battery life with fglrx than with radeon driver.
One month ago I had to apply them by hand... check Xorg.log to see whether it is enabled.
popper
01-21-2009, 05:36 AM
Well, you should pay attention to exactly what is causing those applications not to *pop* up on the screen. I doubt most of it has anything to do with EXA acceleration.
[snipped to get around the forum error "The text that you have entered is too long (10403 characters)."]
Going back to what I was saying in my last post about "context changes"... The main memory in your system is very slow. It's much faster then disk, hundreds of times faster, but it's still much slower then your L2 and L1 cache. Each time the system needs to perform a different sort of processing then those cache's need to be flushed and replaced with new information. That's a context switch and it's a big performance penalty, but it's necessary to maintain the illusion that your running a multitasking computer.
snip
So if something is being processed, it's usually best to let it finish before you move on to the next thing.
In a very performance-oriented environment you can then actually end up with very poor user interactivity. That is, the time the system takes to respond to user input can be huge, making the system feel very sluggish and irritating to use.
In a system optimized for 'realtime' performance, that is, performance where there is a set, required latency the system has to conform to, you can dramatically increase the level of user interactivity: you can keep the system from draining your sound card's buffers and avoid audio stuttering and other artifacts, make X Windows much more responsive, etc. But you'd actually end up REDUCING overall performance.
That is, by increasing the realtime-like behaviour of Linux you're actually going to slow things down slightly. That means you'll score lower in benchmarks, but you'll have a more responsive system.
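To put rough numbers on the throughput-versus-latency trade-off described above, here is a toy round-robin model. It is a sketch under simplifying assumptions, not a model of the Linux scheduler: every task needs the same amount of work, every time slice ends with a fixed context-switch cost, and the names `round_robin`, `quantum` and `switch_cost` are made up for illustration.

```python
import math


def round_robin(n_tasks, work, quantum, switch_cost):
    """Toy round-robin model with equal tasks (hypothetical, for illustration).

    Returns (total_time, worst_wait):
      total_time  when the last task finishes: all the work plus one
                  context switch between every pair of consecutive slices.
      worst_wait  upper bound on how long a task waits between two of its
                  own slices, a rough stand-in for response latency.
    """
    slices_per_task = math.ceil(work / quantum)
    total_slices = n_tasks * slices_per_task
    total_time = n_tasks * work + (total_slices - 1) * switch_cost
    # After its slice a task waits for one switch, then (n - 1) other
    # slices, each followed by a switch.
    worst_wait = n_tasks * switch_cost + (n_tasks - 1) * quantum
    return total_time, worst_wait
```

With two tasks of 100 units and a switch cost of 1, a quantum of 100 finishes everything at t=201 but a task may wait 102 units for its turn; a quantum of 1 cuts that wait to 3 units while pushing completion out to t=399. Short slices buy responsiveness with throughput, which is the point being made above.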
------------------------
It's like this:
Say you're doing your chores at home. You have two things you need to accomplish:
Doing laundry, sorting clothes, and whatnot, in the basement...
Raking leaves in the backyard.
So the fastest way to get it done would be to concentrate on one task until it's finished, then work on the next.
However, you have a wife who is angry at you for whatever reason. She goes downstairs and sees that you're not doing the laundry, so she yells at you to do the laundry. So you run downstairs and start that.
Then she goes up to the kitchen, looks out at the backyard and sees that you haven't finished raking the leaves, so she yells at you about that, and you end up running upstairs and raking leaves...
Then she goes into the basement to get some Diet Coke and yells at you about that.
So you see, if you're highly reactive and jump quickly from one job to another, that can make things responsive, but it will actually take much longer to get anything done.
----------------------------------
Of course the best way is to simply eliminate the jumping around.
If you can figure out how to do the laundry AND the yard using a single set of operations, then you win. Even if somebody timed you and saw that it took you longer to do the yard or the laundry than it did before, you still win, because you've eliminated the time it takes to run into the house and up and down the stairs.
With 2D drivers vs 3D drivers it's not really even a question of which 'core' they run on. It's the fact that they exist in such different worlds and are not compatible with each other. Same thing with video playback and decoding acceleration: anything that requires GPU acceleration and drawing things out to the display.
-------------------------------
Or you can just use tricks to make it _seem_ like you're doing things faster, without actually doing things faster at all. This is what compositing does for your Linux desktop, or any desktop.
--------------------------------
Traditionally speaking (this is a couple of years old and may have changed with recent versions of OS X or Vista), if you ran system benchmarks comparing display performance of Linux vs Windows vs OS X, you'd find that, generally speaking, in 2D application performance Linux is fastest, Windows is next, and OS X is slowest.
But if you asked the users, they'd say the exact opposite: that OS X is the fastest and provides the best visual quality, Windows is next, and Linux is last.
This is because with Linux they saw much more redraw time, visual tearing and all sorts of other ugliness, whereas with OS X they saw nothing but a solid, pretty-looking UI. The relative speed didn't matter so much.
This is because OS X had a composited desktop.
Instead of racing to keep window rendering in step with the display, like Linux did, it simply sidestepped that. When people move windows around on the desktop, they are not causing the windows to redraw and all that... they are simply moving a single solid image around: a rectangle with an image of the application painted on it.
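The difference can be sketched with a little arithmetic. Ignoring backing store and save-unders (which can spare some of this work), in a non-composited X session every pixel a moving window uncovers has to be repainted by whichever client owns the window underneath, while a compositor already holds an offscreen image of each window and only re-blends them. The helper names below are hypothetical.

```python
def exposed_area(width, height, dx, dy):
    """Pixels uncovered when a width x height window moves by (dx, dy).

    Uncovered area = old rectangle minus its overlap with the new one.
    """
    overlap_w = max(0, width - abs(dx))
    overlap_h = max(0, height - abs(dy))
    return width * height - overlap_w * overlap_h


def client_repaint_pixels(width, height, moves, composited):
    """Pixels the clients underneath must redraw during a window drag.

    Without compositing each uncovered region becomes an expose that a
    client has to repaint; with compositing the windows underneath already
    have offscreen images, so no client repaint is needed (the compositor
    still redraws the screen, but from images it already has).
    """
    if composited:
        return 0
    return sum(exposed_area(width, height, dx, dy) for dx, dy in moves)
```

Dragging a 400x300 window 10 pixels to the right, twenty times over, asks the clients underneath to repaint 60,000 pixels without a compositor and none with one.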
snip
So even though in Linux the actual operations of moving windows and redrawing were much, much faster, it still lost because it just couldn't keep up with the visual quality. If you're slower and pretty, people will think you're better than something that is fast and ugly.
Within reason, of course.
-------------------------------------------
So the challenges for Linux are to:
* Eliminate the two-driver model: the X.org DDX vs the Linux-DRM/Mesa-DRI drivers. Having two different, inconsistent sets of drivers that produce two different sets of graphics, with all sorts of image conversion steps needed to make them work with each other, is just stupid.
Even if 2D on DRI is much slower than 2D on the X.org DDX, you can still win. Having them unified makes it much easier to do fancy graphics things like animations, vector-based graphics, faster media playback, etc.
* Better 3D application compatibility, better performance for media decoding, more stable and consistent performance. In other words: Simply higher quality drivers.
----------------------------------------
I mean seriously... do you really go out and spend $150+ on a nice video card to make your GTK combo boxes draw 2msec faster?!
Or do you want something that looks nicer and provides better OpenGL and media acceleration?
In essence then, drag, perhaps you didn't realise it, but you're asking for a return to the AmigaOS way of doing things, with its microkernel, end-user realtime message passing and co-processor handling of the different parts of the data chain...
Sounds like a plan, so how do we encourage all the world's Linux/open code devs to get the current mammoth Linux executable sizes right down to AmigaOS/AROS http://aros.sourceforge.net/ microscopic size levels, re-implement the most basic "keep and reuse a library in memory until a flush cleanup is sent" behaviour, etc., and put this old-is-new-again message passing at the core while doing so?
Perhaps Linux needs to finally take this old AROS bounties concept and create its own central bounties program: http://www.power2people.org/projects.html
As it stands now, only commercial yearly efforts like GSoC seem to bring out partial advances; a real, ongoing, year-round bounties plan, where anyone can contribute, world business and single user alike, might be the best option at this moment in time...
For the ST guy: if you want to see fast window opening, this AROS on an old 3.1 GHz Athlon64 X2 6000+ is a fun reminder of how AmigaOS was far better ;)
http://www.youtube.com/watch?v=scY9LXEGCB0
http://vmwaros.org/
Linuxhippy
01-23-2009, 02:33 PM
So I suppose with Intel's UXA framework it will be much more efficient.
Instead of worrying about getting the 2D drivers working better or porting the 2D X drivers to the 3D Linux-DRM world, they just rewrote the drivers from scratch and implemented the EXA API using the Linux-DRM 3D-related core.
Sorry but this is bullshit.
UXA and EXA are both designed to accelerate certain features through the 3D engines.
The only difference is that UXA is a modified EXA (not a rewrite), better suited to shared-memory GPUs.
In fact it was often suggested to merge UXA back into EXA again, to avoid having so much duplicated code.
- Clemens
MrCooper
01-24-2009, 03:52 AM
In fact it was often suggested to merge UXA back into EXA again, to avoid having so much duplicated code.
There's nothing to merge back really; UXA just deletes the code from the EXA core which would be inactive anyway if the EXA driver allocated pixmap memory itself (which has been possible since before UXA).
Linuxhippy
01-24-2009, 06:23 AM
There's nothing to merge back really; UXA just deletes the code from the EXA core which would be inactive anyway if the EXA driver allocated pixmap memory itself (which has been possible since before UXA).
So UXA consists of a lot of EXA code minus the whole dirty-region handling and other stuff. What I meant by "merge" is that EXA should be made more flexible, making all the unnecessary stuff optional.
I wonder, was the new "xf86-video-radeonhd:r6xx-r7xx-support" branch used for the benchmark? Only this branch supports RENDER acceleration; however, it's rather untested, unstable and incomplete.
- Clemens
MrCooper
01-25-2009, 06:09 AM
What I meant by "merge" is that EXA should be made more flexible, making all the unnecessary stuff optional.
But it already is. There has been some talk about using function wrappers rather than inline branches, but I haven't seen any patches yet.
There are also academic problems with the EXA driver interface for pixmap memory allocation, but I think it would have been better to fix those than to duplicate the code...
I wonder, was the new "xf86-video-radeonhd:r6xx-r7xx-support" branch used for the benchmark? Only this branch supports RENDER acceleration, [...]
The X1800XT mentioned in the article is an R520, which has been well supported for a while.
Linuxhippy
01-25-2009, 06:43 AM
The X1800XT mentioned in the article is an R520, which has been well supported for a while.
Ah right, I mixed that up with the R700 performance comparison.
From what I understand, that comparison measures how fast the drivers can upload/download, as well as pixman performance ;)
- Clemens
ech0s7
01-30-2009, 05:21 AM
If you'd like to see my comparison test, this is the link: http://www.ech0s7.netsons.org/index.php/archives/31
ech0s7
Michael
01-30-2009, 09:05 AM
If you'd like to see my comparison test, this is the link: http://www.ech0s7.netsons.org/index.php/archives/31
ech0s7
Should have used the Phoronix Test Suite (http://www.phoronix-test-suite.com/)! :P
Linuxhippy
01-30-2009, 09:40 AM
I am quite interested in how RadeonHD will do once their r600-r700 acceleration branch is ready.
For now both Catalyst as well as RadeonHD are mostly software-only on R600+.
@Michael: Any plans to do a 2D comparison between NVidia / ATI / Intel?
By the way, some JXRenderMark tests are not really real-world, so I recommend limiting the comparison to: rects, rectscomposition, putcomposition, all the different blits, texturepaint and gradientpaint.
bridgman
01-30-2009, 12:00 PM
2D on the open source drivers will kick ass. That's a technical term :D
Seriously, the only thing we are not sure about yet is copying to an overlapping area with the 3D engine. The old 2D engine was very "sequential" in its processing so you always knew the exact sequence of reads and writes (that really matters when copying with overlapping source and destination areas), but the 3D engine does so much in parallel that right now we are forcing line-at-a-time copies (which are slow) to ensure the scroll operation does not over-write content accidentally.
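Why the order matters can be shown on a toy framebuffer where each row is a list of pixels. This sketch is illustrative only (the function names are made up, and a real 3D engine does not process rows like this): copying rows one at a time in the right direction, as a strictly sequential engine effectively does, scrolls correctly, while the same copy done in the wrong order smears the first row down the whole region, the kind of corruption an engine with no ordering guarantee could produce.

```python
def scroll_rows(fb, src_y, dst_y, height):
    """Copy `height` rows from src_y to dst_y, one row at a time.

    When the destination lies below the source, walk bottom-up so every
    source row is read before anything overwrites it; otherwise walk
    top-down.
    """
    if dst_y > src_y:
        order = range(height - 1, -1, -1)
    else:
        order = range(height)
    for i in order:
        fb[dst_y + i] = list(fb[src_y + i])
    return fb


def scroll_rows_wrong_order(fb, src_y, dst_y, height):
    """Same copy, always top-down: stands in for an unordered engine."""
    for i in range(height):
        fb[dst_y + i] = list(fb[src_y + i])
    return fb
```

Scrolling the rows `0, 1, 2, 3` down by one row gives `0, 0, 1, 2` with `scroll_rows` but `0, 0, 0, 0` with `scroll_rows_wrong_order`.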
A number of potential optimizations have already been identified (most of them are pretty obvious), but I don't know if we will be able to match the performance of the old 2D acceleration block in all scenarios.
Copying to an overlapping area is primarily an issue when scrolling without a compositor, so hopefully it will become a non-issue for real-world experience as the use of compositing desktop managers becomes more common. I don't know how much this particular operation is reflected in benchmarks, however.
Linuxhippy
01-30-2009, 07:00 PM
2D on the open source drivers will kick ass. That's a technical term :D
Yes, it seems to be turning into a really cool EXA driver. Hopefully the improvements will find their way into Catalyst.
Seriously, the only thing we are not sure about yet is copying to an overlapping area with the 3D engine.
What about allocating a temporary copy pixmap?
Copying to an overlapping area is primarily an issue when scrolling without a compositor, so hopefully it will become a non-issue for real-world experience as the use of compositing desktop managers becomes more common. I don't know how much this particular operation is reflected in benchmarks, however.
Well, window-resize performance is quite bad when a composition manager is enabled (both XRender- and OpenGL-based ones); that's the main reason I don't use one.
Do you think there is any chance of accelerated gradient support?
Both gradients and trapezoids (for geometry) are heavily used in modern UI toolkits/themes, but both currently cause fallbacks and migration/copying all over the place.
I guess trapezoids are quite hard to implement, but gradients should be really simple as shaders.
It would be great to see radeonhd implementing features nobody else has, instead of celebrating reaching a state Intel reached two years ago ;)
- Clemens
bridgman
01-31-2009, 01:23 AM
What about allocating a temporary copy pixmap?
I think we'll end up having to do that when the offset between src and dest is very small (eg a few lines). For larger offsets, I *think* it would be faster to copy N lines at a time.
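One way to read the "N lines at a time" idea: if each strip is no taller than the offset between source and destination, a strip's source and destination never overlap, so each strip can go to the engine as one blit and the pixels inside it may be processed in any order. The strips themselves still have to be issued in the right sequence (bottom first when moving down). For very small offsets the strips become tiny, so the sketch falls back to the temporary pixmap suggested above. Everything here is a hypothetical illustration; in particular the `min_strip` threshold of 4 rows is arbitrary, not a value from the driver.

```python
TEMP = "temp"  # stands in for a scratch pixmap


def plan_vertical_copy(src_y, dst_y, height, min_strip=4):
    """Return a list of (src, dst, rows) blits; no blit overlaps itself."""
    offset = abs(dst_y - src_y)
    if offset == 0:
        return []  # nothing moves
    if offset >= height:
        return [(src_y, dst_y, height)]  # regions do not overlap at all
    if offset < min_strip:
        # Tiny offset: strips would be too thin, bounce through a temp buffer.
        return [(src_y, TEMP, height), (TEMP, dst_y, height)]
    # Strips of `offset` rows: a strip's source and destination are at most
    # adjacent, never overlapping.
    starts = list(range(0, height, offset))
    if dst_y > src_y:
        starts.reverse()  # moving down: copy the bottom strip first
    return [(src_y + s, dst_y + s, min(offset, height - s)) for s in starts]


def apply_plan(fb, plan):
    """Execute a plan on a list-of-rows framebuffer, one blit at a time."""
    temp = []
    for src, dst, rows in plan:
        block = temp if src == TEMP else [list(r) for r in fb[src:src + rows]]
        if dst == TEMP:
            temp = block
        else:
            fb[dst:dst + rows] = block
    return fb
```

For example, `plan_vertical_copy(0, 10, 25)` yields the three blits `(20, 30, 5)`, `(10, 20, 10)` and `(0, 10, 10)`, while a 2-row offset bounces the whole region through `TEMP`.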
Well, window-resize performance is quite bad when a composition manager is enabled (both XRender- and OpenGL-based ones); that's the main reason I don't use one.
Yeah, I'm not sure we clearly understand where the bottleneck is on resize. I'm starting to think that it might be the compositing loop redrawing the new window to the front buffer, not the driver actually resizing the window. I find resizing to be un-noticeably fast on my system but I'm running an RV570 and a quad-core CPU; need to re-test on some slower hardware.
Do you think there is any chance of accelerated gradient support? Both gradients and trapezoids (for geometry) are heavily used in modern UI toolkits/themes, but both currently cause fallbacks and migration/copying all over the place. I guess trapezoids are quite hard to implement, but gradients should be really simple as shaders.
I need to learn more about gradients; right now I don't understand why we even need shaders. It seems like we should be able to draw triangles and let the interpolators do all the work; maybe there's an edge matching problem or something. All I can say right now is that it seems like we should be able to do something, but I haven't talked to the devs about the problems yet.
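The interpolator idea can be checked numerically for the simplest case. The sketch below (names are made up; one color channel stands in for RGBA) splits a rectangle into two triangles, gives the left vertices one color and the right vertices the other, and compares barycentric (Gouraud-style) interpolation with a two-stop horizontal linear gradient. Because that gradient is an affine function of x, the two agree exactly. Gradients with more than two stops would need a column of vertices per stop, and radial gradients are not affine at all, which may be part of why shaders come up.

```python
def linear_gradient(x, x0, x1, c0, c1):
    """Two-stop linear gradient along x, clamped at both ends."""
    t = min(max((x - x0) / (x1 - x0), 0.0), 1.0)
    return c0 + (c1 - c0) * t


def barycentric_color(p, tri, colors):
    """Interpolate per-vertex colors at point p, like a GPU's interpolators."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    l2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    l3 = 1.0 - l1 - l2
    return l1 * colors[0] + l2 * colors[1] + l3 * colors[2]


def rect_as_triangles(w, h, c_left, c_right):
    """Split [0, w] x [0, h] along its diagonal; left vertices get c_left."""
    return [
        (((0, 0), (w, 0), (w, h)), (c_left, c_right, c_right)),
        (((0, 0), (w, h), (0, h)), (c_left, c_right, c_left)),
    ]
```

For a 10x5 rectangle going from 0.0 to 1.0, the point (8, 2) in the first triangle and the point (3, 4) in the second interpolate to 0.8 and 0.3, the same values the gradient gives at x = 8 and x = 3.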
Trapezoids seem like they should be easy to break down to triangles as well and I know both hardware and software folks have worked very hard to make all the edges match in OpenGL; again, this is spoken with the clarity of ignorance :D
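The geometric part of the split is indeed simple. A RENDER trapezoid is given by a `top` and `bottom` y plus a `left` and `right` edge line, each defined by two points; evaluating the edges at `top` and `bottom` gives four corners, and one diagonal makes two triangles. This sketch uses floats where RENDER uses 16.16 fixed point, and its function names are hypothetical. It says nothing about which pixels each triangle should cover, which is where rasterization rules come in.

```python
def x_at(line, y):
    """x coordinate of the (infinite) line through two points at height y."""
    (x1, y1), (x2, y2) = line
    return x1 + (x2 - x1) * (y - y1) / (y2 - y1)


def trapezoid_to_triangles(top, bottom, left, right):
    """Split a RENDER-style trapezoid into two triangles."""
    tl = (x_at(left, top), top)
    tr = (x_at(right, top), top)
    br = (x_at(right, bottom), bottom)
    bl = (x_at(left, bottom), bottom)
    return [(tl, tr, br), (tl, br, bl)]


def triangle_area(tri):
    (ax, ay), (bx, by), (cx, cy) = tri
    return abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2
```

A trapezoid with a vertical left edge at x = 0 and a right edge from (10, 0) to (20, 10), cut at y = 0 and y = 10, has area 150, and its two triangles have areas 50 and 100.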
MrCooper
01-31-2009, 09:47 AM
I think we'll end up having to do that when the offset between src and dest is very small (eg a few lines). For larger offsets, I *think* it would be faster to copy N lines at a time.
May depend on the per-operation overhead as well.
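The per-operation overhead point can be made concrete with a deliberately crude cost model (hypothetical, in arbitrary time units): every blit costs a fixed `per_op` overhead plus `per_row` for each row it moves. Copying in strips of `offset` rows needs ceil(height / offset) blits but moves each row once; bouncing through a temporary pixmap needs only two blits but moves every row twice.

```python
import math


def strip_cost(height, offset, per_op, per_row):
    """Cost of copying `height` rows in strips of `offset` rows."""
    n_ops = math.ceil(height / offset)
    return n_ops * per_op + height * per_row


def temp_cost(height, per_op, per_row):
    """Cost of copying out to a temporary pixmap and back: 2 blits, 2x rows."""
    return 2 * per_op + 2 * height * per_row
```

With 600 rows, `per_op=50` and `per_row=1`, the temporary pixmap (1300) beats strips at an offset of 20 rows (2100) but loses at an offset of 100 rows (900), so where the crossover sits depends on the real overhead.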
Yeah, I'm not sure we clearly understand where the bottleneck is on resize. I'm starting to think that it might be the compositing loop redrawing the new window to the front buffer, not the driver actually resizing the window. I find resizing to be un-noticeably fast on my system but I'm running an RV570 and a quad-core CPU; need to re-test on some slower hardware.
IME it also depends a lot on the client; with some clients resizing is super-snappy for me with compiz, with others it's sluggish. I suspect some clients do a lot of unnecessary redraws on resizing.
I need to learn more about gradients; right now I don't understand why we even need shaders. It seems like we should be able to draw triangles and let the interpolators do all the work; maybe there's an edge matching problem or something. All I can say right now is that it seems like we should be able to do something, but I haven't talked to the devs about the problems yet.
Another problem right now is that the EXA core never even calls the driver Composite hooks for gradients (or solid pictures), because so far no driver could handle them.
Trapezoids seem like they should be easy to break down to triangles as well and I know both hardware and software folks have worked very hard to make all the edges match in OpenGL; again, this is spoken with the clarity of ignorance :D
Yeah, it could be that easy if the RENDER rasterization rules matched those of hardware / 3D APIs... alas, unfortunately they don't.
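One way such rules can differ is easy to show. A plain OpenGL triangle rasterizer without multisampling decides per pixel: the pixel is drawn if its center is inside. Antialiased RENDER trapezoids instead produce fractional coverage along edges, which, as far as we understand it, pixman computes by sampling several points inside each pixel. The sketch below compares the two for a single pixel cut by a vertical edge; the 10-sample row is an arbitrary choice, not pixman's actual sample pattern, and the names are hypothetical.

```python
def point_sampled(edge_x, px):
    """1.0 if the center of pixel [px, px + 1) lies right of the edge."""
    return 1.0 if px + 0.5 >= edge_x else 0.0


def sampled_coverage(edge_x, px, samples=10):
    """Fraction of evenly spaced sample points right of a vertical edge.

    A vertical edge only varies in x, so one row of samples suffices.
    """
    hits = 0
    for i in range(samples):
        sx = px + (i + 0.5) / samples
        if sx >= edge_x:
            hits += 1
    return hits / samples
```

For an edge at x = 0.3 the pixel is 70% covered, yet point sampling draws it fully; move the edge to x = 0.6 and point sampling drops the pixel although 40% of it is covered. Reproducing RENDER output on the 3D engine therefore likely takes more than drawing the two triangles.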
Also, again there are no EXA driver interfaces for trapezoids yet.