I'm waiting for AMD to release two or three or four performance patches (?like this?).
Edit: I've switched back to AMD and r600g-git. It works very well for me now, although 3D is not very fast.
Last edited by ahlaht; 06-04-2011 at 04:24 PM.
This is an example where the (boring party pooper) harmonic mean should be used.
http://en.wikipedia.org/wiki/Harmonic_mean
The real improvement is probably closer to 1.5x, which in itself is quite fantastic, as it is a 50% improvement.
I guess Intel would rather say 14x than 1400%, which would have hurt their credibility by making them look like sensationalists.
Just an idea
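To make the harmonic mean point concrete, here is a minimal sketch. The speedup figures are invented for illustration and are not Intel's results, and it assumes every benchmark took the same time before the patch. Under that assumption the speedup of the whole suite equals the harmonic mean of the individual speedups, while the arithmetic mean is dragged upward by the single 14x outlier.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-benchmark speedups (new speed / old speed).
# Made-up numbers for illustration only, not measured results.
speedups = [14.0, 1.2, 1.1, 1.3]

# Assumption: each benchmark took the same time (1 second) before the patch.
old_times = [1.0] * len(speedups)
new_times = [t / s for t, s in zip(old_times, speedups)]

# Speedup seen when running the whole suite back to back.
overall = sum(old_times) / sum(new_times)

print(f"arithmetic mean:     {mean(speedups):.2f}x")
print(f"harmonic mean:       {harmonic_mean(speedups):.2f}x")
print(f"whole-suite speedup: {overall:.2f}x")
```

With these invented inputs the harmonic mean and the whole-suite speedup agree, and both sit far below the arithmetic mean.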
Sure they could move faster, but that was not my point. Intel hardware is different from AMD's so TTM and EXA might not be a good solution for both of them.
Whenever I code something I try to find at least two ways of doing it and pick the best. If there is no "B" how do I know that "A" is the way to go?
We are already doing that on 6xx and higher. I would love to say that we are doing it because of genius and foresight, but it was mostly because there wasn't any 2D hardware we could use.
If I understand the rest of the description correctly they are also making more use of shadowfb (CPU rendering into system memory) for 2D operations. Our conclusion while implementing 6xx/7xx 2D (the first hardware without 2D acceleration) was that the combination of shadowfb for 2D plus hardware acceleration for 3D would probably work very well, at least for the X drawing API, but I don't think we had time to implement it and see if it really worked. It's not trivial to implement properly, however, since you need the buffer in system memory for fast CPU rendering but need it in video RAM for fast GPU rendering and there are always a few things that really benefit from GPU acceleration. It's definitely easier if you are only dealing with shared-memory devices and not GPUs with local fast VRAM but still not easy (easier != easy).
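As a rough illustration of the shadowfb idea described above (CPU rendering into system memory, with the result copied to video memory afterwards), here is a toy sketch. It is not the X server's shadowfb code: the class, its method names, and the single bounding-box damage tracking are all assumptions made for the example, and "VRAM" is just a second Python list.

```python
class ShadowFramebuffer:
    """Toy model: the CPU draws into a system-memory copy, and only the
    damaged rectangle is copied to a stand-in for video RAM on flush."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.shadow = [[0] * width for _ in range(height)]  # system memory
        self.vram = [[0] * width for _ in range(height)]    # stand-in for VRAM
        self.dirty = None  # (x0, y0, x1, y1) with exclusive upper bounds

    def fill_rect(self, x, y, w, h, color):
        """CPU-side 2D operation: touches only the shadow copy."""
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, self.width), min(y + h, self.height)
        if x0 >= x1 or y0 >= y1:
            return  # fully clipped, nothing to draw
        for row in range(y0, y1):
            self.shadow[row][x0:x1] = [color] * (x1 - x0)
        self._damage(x0, y0, x1, y1)

    def _damage(self, x0, y0, x1, y1):
        """Grow the dirty bounding box to include the new rectangle."""
        if self.dirty is None:
            self.dirty = (x0, y0, x1, y1)
        else:
            a, b, c, d = self.dirty
            self.dirty = (min(a, x0), min(b, y0), max(c, x1), max(d, y1))

    def flush(self):
        """Copy the dirty rectangle to 'VRAM'; return the pixel count copied."""
        if self.dirty is None:
            return 0
        x0, y0, x1, y1 = self.dirty
        for row in range(y0, y1):
            self.vram[row][x0:x1] = self.shadow[row][x0:x1]
        self.dirty = None
        return (x1 - x0) * (y1 - y0)


fb = ShadowFramebuffer(640, 480)
fb.fill_rect(10, 10, 100, 50, 0xFFFFFF)
copied = fb.flush()
print(copied, "pixels uploaded")
```

The hard part mentioned in the post is exactly what this toy leaves out: deciding when a buffer should live in system memory for the CPU and when it should live in VRAM for the GPU.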
For what it's worth, I think everyone is reading too much into the "... Architecture" name. This does not appear to be a new interface between common code and driver code or any kind of "going their own way", just driver-internal changes in the implementation of existing APIs.
Someone is probably saying "see, I told you we shouldn't have given it a name..." right now!
Last edited by bridgman; 06-05-2011 at 06:32 AM.
The foresight was done by your hardware engineers
Quote: For what it's worth, I think everyone is reading too much into the "... Architecture" name. This does not appear to be a new interface between common code and driver code or any kind of "going their own way", just driver-internal changes in the implementation of existing APIs.
Agreed. And picking your next graphics card based on what API (EXA vs UXA, etc...) they use is just plain silly.
Thanks for the info
This is true, but picking your next graphics card based on what percentage of its total usable performance is being used by the open source graphics stack is certainly not silly, especially if you're a free software / open source fanatic like me.
I've watched my Radeon HD5970 go from unusable, to slow and usable, back to unusable (regressions in mesa 7.11), and eventually it might go back to usable again -- but the only way to use more than, say, 10% of the total capability of the GPUs on it is to run Catalyst. This Intel patch was targeted specifically at using more of the hardware to get more performance. I haven't seen anything of the sort for AMD in a long, long time (6 months or so). Most of the work coming out of AMD is still just the initial hardware bring-up for a new ASIC. But I don't need to tell you that just this initial hardware bringup, while significant, is only step 1 on a list of 1000 steps to get the driver to professional quality.
My observation from following the commit logs of mesa is that Alex Deucher performs this step 1 quite reliably on new AMD ASICs, but he doesn't do very much in the way of enhancing performance or hardware utilization on existing ASICs. He gets 'em to the point where they can run compiz without constantly crashing, gets basic EXA working, maybe basic shader support, then leaves the rest to the community a la Dave Airlie and Marek Olšák. Where is the performance work coming out of AMD? Will the new open source driver developers -- once hired -- work on that? Or are we going to chase video acceleration up one side of the mountain and down the other until we even begin to look at bashing 3D apps through the 12 fps ceiling?
If anything, we should see twice as much performance work coming out of AMD as out of Intel, because AMD has a much steeper hill to climb: high-end AMD discrete GPUs have much more hardware capability available, so I'd imagine it would take a lot more effort to fight pipeline stalls and keep the GPU fed with command streams in order to achieve more than slideshow FPS in any sort of remotely complex scene. The 2D performance is fine, but there's no reason a HD5970 should perform the same (or slower!) with such basic programs as OpenArena, let alone the more complex apps with GLSL and FBOs and floating-point textures in a deferred rendering pipeline.
Hopefully the efforts to remove either Mesa IR or TGSI will reduce some of the CPU overhead and result in a broad-stroke optimization that helps get frames rendered faster. But what else can be done for AMD cards in particular, to use more of the hardware? I haven't even begun to think about multi-GPU rendering (which would be needed to support both GPUs of a HD5970 or HD6990) because that would give me only about a 25 - 35% performance increase after a single GPU is being more or less utilized to its potential. So while I'd like to see both GPUs in my dual-GPU card doing something, I don't think that will happen soon, nor will it really be that important for performance. Let's first get my card up to the point where a single GPU inside it can perform as fast as a HD5850 can perform using Catalyst.
I'm guessing this is a rhetorical question and you know that (a) our open source graphics effort is aimed at supporting the driver development community, not doing all the driver development ourselves, and (b) a lot of the work Alex did in the last 6 months *was* performance-related, but just in case...
Quote: This Intel patch was targeted specifically at using more of the hardware to get more performance.
Strictly speaking it was targeted at using *less* of the hardware, but I understand what you are saying.
Quote: I haven't seen anything of the sort for AMD in a long, long time (6 months or so).
6 months is a long long time ? You're kidding, right ?
Quote: My observation from following the commit logs of mesa is that Alex Deucher performs this step 1 quite reliably on new AMD ASICs, but he doesn't do very much in the way of enhancing performance or hardware utilization on existing ASICs. <snip> Where is the performance work coming out of AMD?
Mesa is the wrong place to be looking -- look in the -ati X driver, that's where the work you want is happening. Wasn't the Intel update in the X driver as well ?
Not sure what point you are trying to make here. Are you saying that Alex should be doing performance work *instead* of what he was doing, or do you think that in six months he should have enough time to do all the work he did accomplish...
- implement and debug support for Ontario (first Fusion part)
- implement and debug support for Barts/Turks/Caicos
- implement and debug support for Cayman (significantly different 3D engine)
- implement and debug support for Llano (different display pipe)
- bug fixing and stability improvements on 9 generations of hardware
- performance work you might not have noticed (enabling tiling etc..)
... *and* rewrite the driver stack to deal with some of the major performance bottlenecks ?
Quote: Will the new open source driver developers -- once hired -- work on that?
They've been hired for a while, they just haven't started yet.
Once Richard's replacement starts I expect we will at least get back to the level of performance work that was being done before he changed teams. Will there be more ? Hard to say.
Note that improving performance doesn't seem to have much to do with "using more of the hardware" but rather optimizing the internal architecture of the driver stack to eliminate hardware operations that are not needed (eg updating less state information), and that is something we would do in conjunction with the community rather than in isolation anyways. A lot of the performance work that Alex did on 3xx-5xx last year was being worked on this year for 6xx and higher, but that may not be obvious in the commits (particularly if you are looking in mesa rather than in the DDX).
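The "updating less state information" point can be sketched as a small state cache that drops redundant updates before they reach the command stream. This is an illustrative toy, not code from the radeon driver or Mesa; the class, the state names, and the list standing in for a command buffer are all hypothetical.

```python
class StateCache:
    """Toy redundant-state filter: only emit a state change when the value
    differs from what was last sent to the (simulated) hardware."""

    def __init__(self):
        self.current = {}   # last value emitted for each piece of state
        self.emitted = []   # stand-in for the command stream

    def set_state(self, name, value):
        """Return True if a command was emitted, False if it was skipped."""
        if name in self.current and self.current[name] == value:
            return False  # hardware already has this value, skip the update
        self.current[name] = value
        self.emitted.append((name, value))
        return True


# Three hypothetical draw calls that mostly share the same state.
draws = [
    {"blend": "off", "depth_test": True, "texture": 1},
    {"blend": "off", "depth_test": True, "texture": 2},
    {"blend": "off", "depth_test": True, "texture": 2},
]

cache = StateCache()
requested = 0
for state in draws:
    for name, value in state.items():
        requested += 1
        cache.set_state(name, value)

print(f"{len(cache.emitted)} of {requested} state updates actually emitted")
```

In this toy run only the first draw's three values and the one texture change are emitted; everything else is filtered out on the CPU side before it costs any hardware work.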
Quote: Or are we going to chase video acceleration up one side of the mountain and down the other until we even begin to look at bashing 3D apps through the 12 fps ceiling?
I have no idea what this means. Alex isn't working on video acceleration, is he ?
If you are asking "are we suddenly going to become stupid ?" I'm pretty sure the answer is "no".
Quote: high-end AMD discrete GPUs have much more hardware capability available, so I'd imagine it would take a lot more effort to fight pipeline stalls and keep the GPU fed with command streams in order to achieve more than slideshow FPS in any sort of remotely complex scene. The 2D performance is fine, but there's no reason a HD5970 should perform the same (or slower!) with such basic programs as OpenArena, let alone the more complex apps with GLSL and FBOs and floating-point textures in a deferred rendering pipeline.
If the generic driver stack is close to being CPU limited even on less powerful hardware that seems like a very good reason for faster hardware not showing the additional performance you expect. Not sure I understand what you are saying here.
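The CPU-limited argument can be shown with a little arithmetic. The sketch below uses a deliberately simplified model, assuming CPU and GPU work for a frame overlap perfectly so that the slower side sets the frame rate. All the millisecond figures are invented for illustration.

```python
def frames_per_second(cpu_ms, gpu_ms):
    """Simplified model: CPU and GPU work overlap, so the slower of the
    two determines how long each frame takes."""
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical per-frame costs in milliseconds.
cpu_ms = 80.0        # time the driver stack spends on the CPU per frame
slow_gpu_ms = 40.0   # a mid-range GPU
fast_gpu_ms = 10.0   # a high-end GPU, four times faster

# While the CPU is the bottleneck, the faster GPU gains nothing.
fps_slow = frames_per_second(cpu_ms, slow_gpu_ms)
fps_fast = frames_per_second(cpu_ms, fast_gpu_ms)
print(f"CPU-bound:    slow GPU {fps_slow:.1f} fps, fast GPU {fps_fast:.1f} fps")

# Cut the CPU overhead and the difference between the GPUs shows up.
lean_cpu_ms = 20.0
fps_slow_lean = frames_per_second(lean_cpu_ms, slow_gpu_ms)
fps_fast_lean = frames_per_second(lean_cpu_ms, fast_gpu_ms)
print(f"leaner stack: slow GPU {fps_slow_lean:.1f} fps, fast GPU {fps_fast_lean:.1f} fps")
```

Under this model a high-end card and a mid-range card report the same frame rate until the CPU cost per frame drops below the GPU cost.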
Quote: But what else can be done for AMD cards in particular, to use more of the hardware?
I'm not sure why you keep referring to "using more of the hardware" - do you mean "using the hardware more efficiently" ? If so, then the recent work on things like tiling is the most important step in that direction.
Quote: Let's first get my card up to the point where a single GPU inside it can perform as fast as a HD5850 can perform using Catalyst.
No problemo, once someone comes up with a good business case for throwing hundreds (or at least dozens) of developers at the code. In the meantime, I think performance work is happening in the right order now:
- implement support for all the performance-related features - tiling, HyperZ, pageflipping etc.. - and bug fix to the point where they can be enabled by default, so the hardware will be running at full speed at least at a micro-level
- once the above work has been done, *then* start looking at driver architecture options to improve macro-level performance
Do you think something different should be done ? If so, I'm listening.
Last edited by bridgman; 06-05-2011 at 02:53 PM.