Strange state of 3D perf for radeon....
This morning, just to see, I launched two games on an Intel X3100 laptop and an ATI X1700 (256 MB) laptop, and the results surprised me a lot:
Scorched3D: nearly the same performance
Hedgewars: X3100: ~100 FPS, X1700: ~70 FPS
The same graphical options were applied.
For reference:
X3100 laptop:
Core 2 Duo T5670 (1.8 GHz), 3 GB RAM
Gentoo 64-bit: KDE 4.3 (KWin compositing active), xorg-server 1.6.3, kernel 2.6.31-rc6, and libdrm, Mesa, xf86-video-intel from git (yesterday) - DRI2 + KMS
X1700 laptop:
Core 2 Duo T7200 (2 GHz), 2 GB RAM
Gentoo 64-bit: KDE 4.3 (KWin compositing active), kernel 2.6.31-rc6, and xorg-server, libdrm, Mesa, xf86-video-ati from git (yesterday) - DRI1 without KMS
I suppose this is not common to all 3D games, but if someone has an explanation...
Last edited by rem5; 08-18-2009 at 10:59 AM.
The free radeon driver only supports OpenGL 1.4 (Mesa master is now at OpenGL 1.5 for radeon r200(?)/r300/r400/r500).
At the time of my test, it was at 1.5. I know this can make a big difference, but from what I hear the major difference between OpenGL 1.5 and 2.0 is GLSL.
Originally Posted by Nille
So if this game doesn't use GLSL, this shouldn't be related, right?
Even if that's the case, it's surprising to see an IGP more powerful than a discrete card (and not a low-end one in its time).
Last edited by rem5; 08-18-2009 at 11:28 AM.
There are two big optimizations that aren't implemented yet in the r300 3D driver: HyperZ and texture tiling. Beyond that, it would probably be best to profile the application and see where the slow points are.
Originally Posted by agd5f
Talking about optimizations, I wonder: do any of these options and codebases include ANY current SIMD (SSE3/4, AltiVec, NEON, etc.) optimizations or glibc replacement code?
I have proven http://freevec.org/content/libfreeve...hmarks_updated that glibc, the #1 libc used on Linux, is totally unoptimized even for common platforms (such as x86 and x86_64), and there are performance gains that could and should materialize if someone took the effort to do it..."
Finally, with regard to glibc performance: even if we take into account that some common routines are optimised (like strlen(), memcpy(), memcmp(), plus some more), most string functions are NOT optimised. Not only that, glibc only includes reference implementations that perform the operations one byte at a time! How's that for inefficient? We're not talking about dummy, unused joke functions here like memfrob(), but really important string and memory functions that are used pretty much everywhere, like strcmp(), strncmp(), strncpy(), etc.
In times when power consumption has become so important, I would think that the first thing to do to save power is optimise the software, and what better place to start than the core parts of an operating system? I can't speak for the kernel (though I'm sure it's actually very optimised), but having looked at the glibc code extensively over the past years, I can say that it's grossly unoptimised, so much that it hurts. Markos"
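To make the "one byte at a time" point concrete, here is a minimal C sketch (hypothetical code, not glibc's actual implementation): a reference-style byte-at-a-time strcmp() next to a memcmp() variant that compares a 64-bit word per iteration, the same idea that SIMD versions push further with 16-byte or wider registers.

```c
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time strcmp(), reference-implementation style
 * (hypothetical sketch, not glibc's actual code). */
static int strcmp_byte(const char *a, const char *b)
{
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Word-at-a-time memcmp() sketch: compares 8 bytes per iteration and
 * falls back to a byte loop to locate the first difference. */
static int memcmp_word(const void *pa, const void *pb, size_t n)
{
    const unsigned char *a = pa, *b = pb;
    uint64_t wa, wb;

    while (n >= sizeof wa) {
        memcpy(&wa, a, sizeof wa);   /* unaligned-safe loads */
        memcpy(&wb, b, sizeof wb);
        if (wa != wb)
            break;                   /* differing byte is in this word */
        a += sizeof wa;
        b += sizeof wb;
        n -= sizeof wa;
    }
    while (n--) {
        if (*a != *b)
            return *a - *b;
        a++;
        b++;
    }
    return 0;
}
```

The byte loop pays one load, one compare, and one branch per byte; the word loop amortizes that over eight bytes, which is where most of the claimed gains come from.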
I assume at least someone has actually taken the time to profile all the current code bases you referred to, and tried to add at least some of these massive SIMD speed improvements where they can... if not, will you, and when?
Last edited by popper; 08-18-2009 at 09:45 PM.
I believe the common code in Mesa makes use of vector instructions, but I doubt the hardware drivers do. Right now 3D performance is primarily limited by how efficiently the GPU is used; it's pretty rare to be CPU limited (which is where vector CPU instructions would help).
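As a sketch of the kind of CPU-side work where those vector instructions do help (a hypothetical function, not Mesa's actual source, and x86-specific since it uses SSE intrinsics): transforming an array of vec4 vertices by a column-major 4x4 matrix, one vertex per iteration.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical sketch: out[i] = M * in[i] for `count` vec4 vertices,
 * with M stored column-major.  Each iteration handles all four
 * components of one vertex in a single SSE register. */
static void transform_verts(const float m[16], const float *in,
                            float *out, int count)
{
    __m128 c0 = _mm_loadu_ps(m + 0);
    __m128 c1 = _mm_loadu_ps(m + 4);
    __m128 c2 = _mm_loadu_ps(m + 8);
    __m128 c3 = _mm_loadu_ps(m + 12);

    for (int i = 0; i < count; i++) {
        const float *v = in + 4 * i;
        /* r = c0*v.x + c1*v.y + c2*v.z + c3*v.w */
        __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
        r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
        r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
        r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));
        _mm_storeu_ps(out + 4 * i, r);
    }
}
```

On a software transform path this runs once per vertex per frame, so it is one of the few driver-side spots that can actually become CPU limited.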
Thanks. I was under the impression that today the GPUs mostly sit around waiting for the CPU (and, by extension, its SIMD units) to actually give the GPU something to do, not the other way around, where the CPU waits on the GPU to return something, as you would in the likes of GPU/UVD-decoded frame editing and the current AMD buzzword 'OpenCL', etc.
Either way, new SIMD/vector code in both the CPU and GPU camps seems like a very good thing to consider as you write new code and extend and refactor the old: speed increases for free and larger power savings for the long term, as Markos claims above...
Last edited by popper; 08-18-2009 at 09:40 PM.
Depends on whether you are talking about 3D (as we are here) or video. In the case of video, it's very common for the decoder to be CPU limited unless the entire decode task is dumped onto the GPU, and in those cases vector instructions on the CPU can be a big help -- but AFAIK they are already heavily used in the decoder.
The drivers do render acceleration (Xv) but all the work there is done on the GPU so vector CPU instructions don't really make a difference for the drivers.
If we get to the point where drivers take on the entire decode task, handing portions off to the GPU shaders and doing the rest in the driver, only then would vector CPU instructions make a difference. Personally, that seems like reinventing the wheel to me... I would rather take an existing, well-understood (and already vectorized) decode library and add GPU hooks just for the tasks where GPUs can be effective.
GPUs only know how to do vector processing (SIMD on most vendors' hardware, SIMD+superscalar on ATI hardware), so there's really nothing to vectorize there.
Last edited by bridgman; 08-18-2009 at 09:42 PM.