
Thread: "Ask ATI" dev thread

  1. #911
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,284


    Quote Originally Posted by next9 View Post
    Many academic presentations can be found claiming that superscalar and VLIW are opposite approaches.

    http://www.haenni.info/thesis/presen...tml/sld006.htm
    http://csd.ijs.si/courses/trends/tsld008.htm
    Yeah, that's the problem. Some academic presentations say one thing, others include VLIW in the definition of superscalar:

    http://suif.stanford.edu/papers/isca90.pdf
    http://courses.ece.ubc.ca/476/www200.../Lecture29.pdf

    The second one is particularly interesting, since it distinguishes between "static superscalar" and VLIW, but using those definitions our core falls into the static superscalar bucket because instructions can use the results of the previous instruction.

    I think there is a slight trend towards reserving the "superscalar" term for dynamic extraction of instruction-level parallelism and using "VLIW" for compile-time ILP extraction, but it seems to be pretty recent (i.e. after the chips were designed). Today you can find both definitions fairly easily.
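    To make the compile-time vs run-time distinction concrete, here is a toy sketch (invented for illustration, not any real ISA or the actual ATI scheduler) of how a VLIW compiler might statically pack independent operations into fixed-width bundles; a dynamic superscalar core would make the equivalent grouping decision in hardware at issue time. The no-intra-bundle-forwarding rule below is the classic VLIW model; as noted above, the ATI core relaxes it.

    ```python
    # Toy illustration: statically pack independent ops into fixed-width
    # "VLIW" bundles, the way a compiler would at build time.

    def pack_bundles(ops, width=5):
        """ops: list of (dest, srcs) tuples in program order.
        Greedily fill each bundle; an op may not read a register
        written earlier in the same bundle (no intra-bundle forwarding)."""
        bundles = []
        current, written = [], set()
        for dest, srcs in ops:
            if len(current) == width or any(s in written for s in srcs):
                bundles.append(current)
                current, written = [], set()
            current.append((dest, srcs))
            written.add(dest)
        if current:
            bundles.append(current)
        return bundles

    program = [
        ("r0", ("a", "b")),    # independent
        ("r1", ("c", "d")),    # independent -> same bundle
        ("r2", ("r0", "r1")),  # depends on r0 and r1 -> next bundle
    ]
    print(pack_bundles(program))  # two bundles: ops 1+2 together, op 3 alone
    ```

    The point of the sketch is that the grouping is fixed in the program text before it ever reaches the hardware, which is exactly what a dynamic superscalar core avoids.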

    Quote Originally Posted by next9 View Post
    The most important thing is, Eric Demers claimed the same thing:

    http://www.rage3d.com/interviews/atichats/undertheihs/

    That's why I'm asking, because it seems most of the sites just copy and paste the same nonsense.
    If the trend I mentioned above towards defining "superscalar" to exclude VLIW is real, I imagine we will shift our usage accordingly (and Eric's comment supports that). In the meantime I think the big question is "which definition of superscalar do you subscribe to?". If you don't consider VLIW to be a subset of superscalar, then we're VLIW. If you do consider VLIW to be a subset of superscalar, then we're superscalar via VLIW. I guess I don't understand all the fuss.

    Quote Originally Posted by next9 View Post
    And what about GPGPU? What about scientific applications? Do they have to be compiled with VLIW in mind to run fast on Radeon? Or it is just a problem of driver compiler?
    The compiler usually seems to be able to optimize to the point where the algorithm is running fetch-limited, i.e. where further ALU optimization would not make a difference. Tweaking for a specific architecture (whether ours or someone else's) usually seems to focus on optimizing memory accesses more than ALU operations.

    There are probably exceptions where tweaking the code to match the ALU architecture can get a speedup, but in general it seems that optimizing I/O is what makes the biggest difference on all architectures these days.
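    The "fetch-limited" point can be made concrete with a roofline-style back-of-the-envelope check: compare a kernel's arithmetic intensity (flops per byte fetched) against the machine's balance point. All hardware numbers below are invented placeholders, not the specs of any real GPU.

    ```python
    # Back-of-envelope "fetch-limited?" check with made-up numbers.

    def limiting_factor(flops, bytes_moved, peak_flops, peak_bw):
        """Roofline-style estimate: compare the kernel's arithmetic
        intensity (flops per byte) to the machine's balance point."""
        intensity = flops / bytes_moved    # flops per byte the kernel does
        balance = peak_flops / peak_bw     # flops per byte the HW can sustain
        return "fetch-limited" if intensity < balance else "ALU-limited"

    # A kernel doing 2 flops per 8 bytes fetched, on a machine that can
    # sustain 10 flops per byte, is clearly limited by memory traffic --
    # making the ALU code faster would not help at all.
    print(limiting_factor(flops=2e9, bytes_moved=8e9,
                          peak_flops=1e12, peak_bw=1e11))  # fetch-limited
    ```

    In that regime, only reducing or restructuring the memory traffic moves the needle, which matches the observation above about tuning focusing on memory accesses.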
    Last edited by bridgman; 10-17-2009 at 11:55 AM.

  2. #912

    Quote Originally Posted by codedivine View Post
    Does Radeon 4200 support OpenCL? Does it support compute shaders in CAL? AMD has made big claims about 4200 being Stream-friendly so I am confused. Is it based on RV7xx SIMDs with shared memory and the whole enchilada?
    As Loris said, the HD4200 IGP uses a 3D engine from the RV620, so it has Stream Processors (what we call the unified shaders introduced with r600) and supports the Stream framework (CAL etc.) but does not have all the features from the RV7xx 3D engine. It does not have the per-SIMD LDS; not sure about GDS. I don't believe the OpenCL implementation supports the HD4200, since OpenCL makes heavy use of the shared memory blocks.
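    Why the missing per-SIMD LDS matters for OpenCL can be sketched by counting global-memory fetches in a tiled computation: with a local scratchpad, each tile element is fetched from global memory once and then reused by the whole work-group; without one, every work-item refetches it. This is a toy access-count model, not real OpenCL code.

    ```python
    # Toy count of global-memory fetches for a work-group in which every
    # work-item needs every element of a tile (e.g. a matrix tile or a
    # reduction). Just the access-count arithmetic, nothing more.

    def global_fetches(group_size, tile_elems, has_lds):
        if has_lds:
            # Each element is staged into local memory once, then reused
            # by all work-items in the group.
            return tile_elems
        # No scratchpad: every work-item fetches the whole tile itself.
        return group_size * tile_elems

    print(global_fetches(64, 256, has_lds=True))   # 256 fetches
    print(global_fetches(64, 256, has_lds=False))  # 16384 fetches
    ```

    A 64x difference in global traffic is why an OpenCL implementation that leans on `__local` memory is hard to support usefully on hardware without it.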

    Not sure about DX11 Compute Shaders, but I believe they will run on the HD4200 hardware. Be aware that there are different levels of Compute Shader support, however (CS 4.0, 4.1, 5.0 IIRC), and Compute Shader 5.0 requires DX11 hardware (i.e. HD5xxx).
    Last edited by bridgman; 10-17-2009 at 01:32 PM.

  3. #913
    Join Date
    Oct 2009
    Posts
    122


    Quote Originally Posted by bridgman
    If you don't consider VLIW to be a subset of superscalar, then we're VLIW. If you do consider VLIW to be a subset of superscalar, then we're superscalar via VLIW. I guess I don't understand all the fuss.
    Great. Now I understand. I prefer engineers over marketing guys, and this seemed to me like:

    "Hey, nVidia has a scalar architecture. Let's tell our customers we have a superscalar architecture" - marketing bullshit.

    nVidia started to use the term "stream processor". After that, ATI started to use the term "stream processor" too. But an ATI SP and an nVidia SP are different things. A higher number of SPs looks better in marketing material, no matter that it's apples to oranges. That's how it works every day.

    That's why I ask a developer or an engineer instead of asking a marketing guy. No matter which definition we use, it is clear how it works.


    Quote Originally Posted by bridgman View Post
    The compiler usually seems to be able to optimize to the point where the algorithm is running fetch-limited, i.e. where further ALU optimization would not make a difference. Tweaking for a specific architecture (whether ours or someone else's) usually seems to focus on optimizing memory accesses more than ALU operations.

    There are probably exceptions where tweaking the code to match the ALU architecture can get a speedup, but in general it seems that optimizing I/O is what makes the biggest difference on all architectures these days.
    I think it is clear. Let me ask another question: if VLIW is not the problem for GPGPU, what is the reason for the lower Radeon performance in a popular GPGPU application, Folding@home? I have seen graphs where a 9600/9800GT was faster than a Radeon HD4890, which does not make sense to me.

  4. #914

    Quote Originally Posted by next9 View Post
    Great. Now I understand. I prefer engineers over marketing guys, and this seemed to me like:

    "Hey, nVidia has a scalar architecture. Let's tell our customers we have a superscalar architecture" - marketing bullshit.
    Yeah, I dread the day when someone develops an architecture that can reasonably be described as "superduperscalar".

    For what it's worth, we did talk about the design as "superscalar" inside engineering, it's not just something marketing created. I suspect the tendency to exclude VLIW from the definition of superscalar mostly happened after the unified shader core was designed.

    Quote Originally Posted by next9 View Post
    nVidia started to use the term "stream processor". After that, ATI started to use the term "stream processor" too. But an ATI SP and an nVidia SP are different things. A higher number of SPs looks better in marketing material, no matter that it's apples to oranges. That's how it works every day.

    That's why I ask a developer or an engineer instead of asking a marketing guy. No matter which definition we use, it is clear how it works.
    AFAIK the SPs are relatively similar in terms of what they can do. The tradeoff is partly "a smaller number of SPs at a higher clock speed vs a larger number of SPs at a lower clock speed" and partly "scalar vs superscalar... err... VLIW". Every vendor chooses the approach they think is best, and eventually they converge on something that isn't quite what any of them had in mind at the start.
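    The clock-vs-width tradeoff is just arithmetic on peak throughput; a sketch with invented numbers (placeholders for illustration, not real specs of either vendor's parts):

    ```python
    # Peak-throughput arithmetic for the "many slow SPs vs fewer fast SPs"
    # tradeoff. All numbers below are invented for illustration.

    def peak_gflops(num_sps, clock_ghz, ops_per_sp_per_clock=2):
        """ops_per_sp_per_clock=2 assumes one fused multiply-add per clock."""
        return num_sps * clock_ghz * ops_per_sp_per_clock

    # A wide, lower-clocked design and a narrower, higher-clocked design
    # can land on the same peak number, which is why raw SP counts in
    # marketing material are apples to oranges:
    print(peak_gflops(800, 0.75))  # 1200.0 GFLOPS
    print(peak_gflops(240, 2.5))   # 1200.0 GFLOPS
    ```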

    Quote Originally Posted by next9 View Post
    I think it is clear. Let me ask another question: if VLIW is not the problem for GPGPU, what is the reason for the lower Radeon performance in a popular GPGPU application, Folding@home? I have seen graphs where a 9600/9800GT was faster than a Radeon HD4890, which does not make sense to me.
    Just going from what I have read, the core issue is that the F@H client is running basically the same code paths on 6xx and 7xx rather than taking advantage of the additional capabilities in 7xx hardware. Rather than rewriting the GPU2 client for 7xx and up I *think* the plan is to focus on OpenCL and the upcoming GPU3 client.

    The current F@H implementation on ATI hardware seems to have to do the force calculations twice rather than being able to store and re-use them -- storing and re-using is feasible on newer ATI GPUs but not on the earlier 6xx parts. BTW it appears that FLOPs for the duplicated calculations are not counted in the stats.
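    The store-and-reuse point can be illustrated with Newton's third law in a pairwise force loop: f_ji = -f_ij, so a client that can keep intermediate results around evaluates each pair once, while one that cannot evaluates every interaction twice. This is a toy evaluation count, not the actual F@H kernel.

    ```python
    # Toy count of pairwise force evaluations for n particles. The real
    # F@H GPU2 kernels are far more involved; this just shows the 2x factor.

    def force_evals(n, can_reuse):
        if can_reuse:
            # Evaluate each unordered pair once; reuse f_ji = -f_ij.
            return n * (n - 1) // 2
        # No storage for intermediates: each particle recomputes its
        # interaction with every other particle.
        return n * (n - 1)

    print(force_evals(1000, can_reuse=True))   # 499500 evaluations
    print(force_evals(1000, can_reuse=False))  # 999000 evaluations
    ```

    The doubled count is ALU work that, per the post above, does not even show up in the reported FLOP statistics.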

    There also seems to be a big variation in relative performance depending on the size of the protein, with ATI and competing hardware being quite close on large proteins even though we are doing some of the calculations twice. There have been a couple of requests from folding users to push large proteins to ATI users and small proteins to NVidia users, not sure of the status.

    There also seem to be long threads about the way points are measured. Some of the discussions (see link, around page 4) imply that the performance difference on small proteins may be a quirk of the points mechanism rather than an actual difference in throughput, but I have to admit I don't fully understand the argument there:

    http://foldingforum.org/viewtopic.php?f=50&t=8134
    Last edited by bridgman; 10-17-2009 at 09:15 PM.

  5. #915
    Join Date
    Nov 2007
    Posts
    31

    Radeon 58xx vs Fermi

    I saw that the 57xx doesn't have double-precision floating-point support, so the 57xx is out of the question for me. Will AMD's OpenCL implementation support emulating double precision using the GPU hardware?
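    For what it's worth, double precision can in principle be emulated on single-precision ALUs using error-free transformations (the "double-single" technique). A minimal Python sketch of the Fast2Sum building block, simulating float32 rounding with the struct module; this is illustrative, not a statement about what AMD's implementation does:

    ```python
    import struct

    def fl32(x):
        """Round a Python float to the nearest IEEE-754 single."""
        return struct.unpack('f', struct.pack('f', x))[0]

    def fast2sum(a, b):
        """Error-free addition (requires |a| >= |b|): returns (s, e)
        with s + e == a + b exactly, both representable as singles."""
        s = fl32(a + b)
        z = fl32(s - a)
        e = fl32(b - z)
        return s, e

    a, b = fl32(1.0), fl32(1e-8)
    s, e = fast2sum(a, b)
    # The pair (s, e) carries the low-order bits that a single float32
    # sum would have rounded away:
    print(s + e == a + b)  # True (exact when checked in double)
    ```

    Chaining transformations like this is how "double-single" libraries build ~48-bit-significand arithmetic out of single-precision adds and multiplies, at a cost of roughly 10-20 single-precision ops per emulated operation.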

    Also, what are the numbers for integer crunching?

    What about the parallel kernel execution support announced by nVidia?

    Does AMD support parallel execution of multiple compute kernels?

    Other things I've noticed that are cool about Fermi are ECC support, syscall support, and developer-configurable caching/manageable memory schemes (for SP local memory).

  6. #916
    Join Date
    Jun 2009
    Location
    Chicago, Illinois
    Posts
    36


    Quote Originally Posted by Compxpert View Post
    Question:

    You just released the most useless driver update ever...
    How does that make you feel?

    Real/other question: Are you aware of the other MORE important bugs that need to be fixed, like the memory leak?
    Hahaha. That's funny.

    You guys should put Ageia PhysX technology in the driver and the OpenGL 3.1 API already. I'm starving for that.

  7. #917
    Join Date
    Oct 2009
    Location
    USA
    Posts
    29


    Neo,

    Are you the one... just kidding

    but PhysX is Nvidia... not ATI... and most of the time I actually like what ATI does... if they release a driver that works then I'm happy... add a few features here and there and it makes it even better... But no PhysX... unless ATI can get Nvidia to release it... but Nvidia hasn't even done that for Linux yet (I think)

  8. #918
    Join Date
    Oct 2007
    Posts
    912


    I think AMD helped out with getting Bullet ready for OpenCL. And OpenGL 3.1 is already supported - not sure about OpenGL 3.2 yet.

  9. #919
    Join Date
    Jun 2008
    Posts
    197


    I'm running Ubuntu 9.10 64-bit right now, and everything worked until I installed the ATI drivers (classic, huh?)

    The drivers that came with Ubuntu were fine, but I thought I'd update to 9.10, so I installed it, rebooted, and lost acceleration.

    I discovered that a /usr/lib64 directory had been created with all the graphics libs. Since Ubuntu doesn't use /usr/lib64, I knew it had to be moved, so I moved it to /usr/lib, rebooted, and this time I didn't get X at all.

    It complained about not being able to find an amdpcsdb.default file, so I copied amdpcsdb to amdpcsdb.default and then I got X. But now I have a green watermark in the bottom right-hand corner of the screen saying:

    "Testing use
    only
    Unsupported
    Hardware"


    I'm using a VisionTek HD4870.

  10. #920


    I'm running Kubuntu 9.10 with the Catalyst 9.10 drivers on my laptop with a Radeon HD 3870 (x1).

    Works great, except my laptop now has a black screen and the only way I can use it is by plugging it into an external monitor.

    More info here:
    http://ubuntuforums.org/showthread.php?t=1315923
