Thread: Anyone with HD5870 or HD5850 using recent opensource driver and kernel?

  1. #21
    Join Date
    Jun 2012
    Posts
    289

    Default

    Quote Originally Posted by crazycheese View Post
    So, the conclusion so far:
    - VLIW5 is less efficient and more complex than VLIW4.
    ...unless it comes to computations. You see, with proprietary catalyst driver VLIW5 beats VLIW4 on massively parallel computations. In fact it's quite hard to buy HD5xxx cards these days, even used ones. Most of medium and top range cards were bought by those who are doing high-performance computing for fun and profit.

    So to me it looks like a compiler issue rather than anything else. In fact, VLIW4 seems to be a lite version of VLIW5. AMD just saved some bucks by making smaller, cheaper ICs and selling them as "new", "improved" thingies. Sure, they improved TDP, at the cost of computation speed. Yet they sell cards of the same class at the same price. Epic marketing win (for AMD).
    Last edited by 0xBADCODE; 09-17-2012 at 07:51 AM.

  2. #22
    Join Date
    Apr 2010
    Posts
    1,946

    Default

    Quote Originally Posted by 0xBADCODE View Post
    ...unless it comes to computations. You see, with proprietary catalyst driver VLIW5 beats VLIW4 on massively parallel computations. In fact it's quite hard to buy HD5xxx cards these days, even used ones. Most of medium and top range cards were bought by those who are doing high-performance computing for fun and profit.

    So to me it looks like a compiler issue rather than anything else. In fact, VLIW4 seems to be a lite version of VLIW5. AMD just saved some bucks by making smaller, cheaper ICs and selling them as "new", "improved" thingies. Sure, they improved TDP, at the cost of computation speed. Yet they sell cards of the same class at the same price. Epic marketing win (for AMD).
    Can you throw in a compute comparison between similar-class chips (5870 vs 6950, for example)?
    Because, according to bridgman, VLIW4 has 4 full-featured units and VLIW5 has 1 full and 4 simple ones.

    Also, do you know the state of opensource compute now?

  3. #23
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,386

    Default

    Quote Originally Posted by crazycheese View Post
    - VLIW5 is less efficient and more complex than VLIW4. Pre 6xxx are better recycled.
    It depends on the workload. If you're running the latest DX11 games with the proprietary driver then VLIW4 is probably the way to go, but the mix of games people run on Linux today is probably closer to what VLIW5 was optimized for. That's why I said "more future proof" rather than simply "better". It is easier to optimize an OpenCL stack for VLIW4 than for VLIW5, however, and that was the main reason for changing.

    Note that all of the NI parts except Cayman (HD69xx) are VLIW5, not VLIW4.

    Quote Originally Posted by crazycheese View Post
    - The key reason all Radeons (pre-Northern Islands GPUs) are so slow with the open-source driver is the absence of an efficient compiler, or of a whitepaper on how to write one, which AMD is not releasing.
    The compiler efficiency affects VLIW5 and VLIW4 equally -- it's more about the ability to pack multiple independent shader instructions into a single VLIW instruction. If/when the compiler picks up that capability, it will have an easier time with VLIW4 because the instruction slots all have the same capabilities.
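
    The packing problem described above can be sketched with a toy model. This is only an illustration, not how r600g or Catalyst actually schedule: `pack_bundles`, `utilization` and the two example "shaders" are hypothetical names, every slot is treated as identical, and an op may join a bundle only when all of its inputs came from an earlier bundle.

```python
from typing import Dict, List, Set


def pack_bundles(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack ops into bundles of at most `width` ops.

    deps maps each op name to the set of ops whose results it needs.
    Ops inside one bundle issue together, so an op is only eligible
    once all of its inputs were produced by an earlier bundle.
    """
    done: Set[str] = set()
    pending = list(deps)  # keeps the original program order
    bundles: List[List[str]] = []
    while pending:
        bundle = [op for op in pending if deps[op] <= done][:width]
        if not bundle:
            raise ValueError("dependency cycle")
        bundles.append(bundle)
        done.update(bundle)
        pending = [op for op in pending if op not in bundle]
    return bundles


def utilization(bundles: List[List[str]], width: int) -> float:
    """Fraction of issue slots that actually hold an op."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Hypothetical fragment with a dependency chain: e needs a and b, f needs e.
chain = {"a": set(), "b": set(), "c": set(), "d": set(),
         "e": {"a", "b"}, "f": {"e"}}
# Hypothetical fragment with ten fully independent ops.
wide = {f"op{i}": set() for i in range(10)}

for name, prog in (("chain", chain), ("wide", wide)):
    for width in (5, 4):
        b = pack_bundles(prog, width)
        print(name, width, len(b), round(utilization(b, width), 2))
```

    In this toy model the independent workload fills every slot of a 5-wide bundle, while the dependent one leaves most slots empty at either width. A compiler that cannot find independent instructions to co-issue wastes slots in the same way, whichever VLIW width it targets.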

    Quote Originally Posted by 0xBADCODE View Post
    ...unless it comes to computations. You see, with proprietary catalyst driver VLIW5 beats VLIW4 on massively parallel computations. In fact it's quite hard to buy HD5xxx cards these days, even used ones. Most of medium and top range cards were bought by those who are doing high-performance computing for fun and profit.
    Yeah, that's always been the non-obvious part -- VLIW5 was hands-down the most efficient whenever the workload could be coded or compiled to make full use of the hardware (eg bitcoin generation). The VLIW5 architecture had the cheapest ALUs (because they focused on operations required for pixel and vertex processing) and the lowest sequencer overhead. On the other hand, there has been a gradual increase in workloads which don't map efficiently onto VLIW5, and that's what drove the transition to VLIW4 and then scalar for high end parts.

    For low end parts running mostly traditional graphics workloads, VLIW5 is still probably the most efficient.

  4. #24
    Join Date
    Aug 2012
    Posts
    292

    Default

    Quote Originally Posted by 0xBADCODE View Post
    ...unless it comes to computations. You see, with proprietary catalyst driver VLIW5 beats VLIW4 on massively parallel computations. In fact it's quite hard to buy HD5xxx cards these days, even used ones. Most of medium and top range cards were bought by those who are doing high-performance computing for fun and profit.

    So to me it looks like a compiler issue rather than anything else. In fact, VLIW4 seems to be a lite version of VLIW5. AMD just saved some bucks by making smaller, cheaper ICs and selling them as "new", "improved" thingies. Sure, they improved TDP, at the cost of computation speed. Yet they sell cards of the same class at the same price. Epic marketing win (for AMD).
    That's only true for the closed-source Catalyst.

    source: https://en.bitcoin.it/wiki/Mining_hardware_comparison
    5870 -> 481
    6970 -> 433
    But we're talking about open-source drivers, right? And it's only true if the shader fits in the simple shader units.
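
    For scale, the gap between the two quoted figures can be worked out directly. A minimal sketch using only the numbers given above; the dictionary labels and the `relative_advantage` helper are made up for illustration, and the units are whatever the linked table uses.

```python
# Figures exactly as quoted in the post above (from the linked wiki table).
rates = {"HD 5870 (VLIW5)": 481, "HD 6970 (VLIW4)": 433}


def relative_advantage(a: float, b: float) -> float:
    """Return how much larger a is than b, as a fraction of b."""
    return (a - b) / b


adv = relative_advantage(rates["HD 5870 (VLIW5)"], rates["HD 6970 (VLIW4)"])
print(f"5870 is ahead by {adv:.1%}")  # prints "5870 is ahead by 11.1%"
```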

  5. #25
    Join Date
    Dec 2007
    Posts
    2,328

    Default

    Quote Originally Posted by crazycheese View Post
    So, the conclusion so far:

    - VLIW5 is less efficient and more complex than VLIW4. Pre 6xxx are better recycled.
    It depends heavily on the workload and how well the compiler can optimize. For some tasks VLIW5 can be better than VLIW4. VLIW4 is just better suited to non-graphics tasks and is somewhat easier to write a compiler for.


    Quote Originally Posted by crazycheese View Post
    - The key reason all Radeons (pre-Northern Islands GPUs) are so slow with the open-source driver is the absence of an efficient compiler, or of a whitepaper on how to write one, which AMD is not releasing.
    We've released detailed ISA documentation on all ASICs since r5xx, including information on optimization. Additionally, we provide a shader analyzer which is useful for optimizing shader code for AMD GPUs. Unfortunately, writing a good optimized compiler is a complex task regardless of how much documentation is available. Using a proper compiler framework is a good first step, and Tom has already started working on support for AMD GPUs using LLVM. There are still improvements being made to the compilers for major CPU families (x86, ARM, etc.), and those are much more common than GPU instruction sets.

  6. #26
    Join Date
    Apr 2010
    Posts
    1,946

    Default

    Thanks, so I guess VLIW5 is not that bad if the compiler gets to do its job.

    AMD jumped to an Nvidia-like scalar approach with 7xxx. Why do single-chip GCN cards like the 7970 have compute efficiency matching the 6990?!

    I thought the VLIW approach, with the (precious) compiler eliminating any misses and optimizing loads, was more efficient than scalar? Any idea?

    Quote Originally Posted by agd5f View Post
    We've released detailed ISA documentation on all ASICs since r5xx, including information on optimization. Additionally, we provide a shader analyzer which is useful for optimizing shader code for AMD GPUs. Unfortunately, writing a good optimized compiler is a complex task regardless of how much documentation is available. Using a proper compiler framework is a good first step, and Tom has already started working on support for AMD GPUs using LLVM. There are still improvements being made to the compilers for major CPU families (x86, ARM, etc.), and those are much more common than GPU instruction sets.
    Thanks Alex! I appreciate the effort a lot. I know the role the compiler plays in VLIW; in fact, it was claimed that VLIW is technically a whole lot simpler than the compiler itself.

    By the way, I have an unrelated question about LLVM and GCC: have the GCC folks expressed any intention to fork the current monolithic architecture and redesign the compiler using the LLVM approach? Or are we fated to use LLVM in the future when it comes to VLIW optimization?
    Last edited by crazycheese; 09-17-2012 at 10:16 AM.

  7. #27
    Join Date
    Dec 2007
    Posts
    2,328

    Default

    Quote Originally Posted by crazycheese View Post
    Can you throw in a compute comparison between similar-class chips (5870 vs 6950, for example)?
    Because, according to bridgman, VLIW4 has 4 full-featured units and VLIW5 has 1 full and 4 simple ones.
    Not exactly. It's more like 4 regular units and 1 special unit. The trans unit on VLIW5 only handles special instructions (FLT_TO_INT, RECIP, etc.), so it's not really simple and full per se. On VLIW4, the trans unit went away and the regular units took over the old trans-only instructions, but they still have similar limitations as far as instruction groups are concerned.
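
    A rough way to see why the slot layout matters is to count the bundles each layout needs for a given mix of regular and special ops. This is a deliberately simplified sketch: the function names are hypothetical, all ops are assumed independent, regular ops are assumed to use only the four regular slots and special ops only the T slot on VLIW5, and a special op is assumed to cost one regular slot on VLIW4. The instruction-group limitations mentioned above are not modelled.

```python
import math


def bundles_vliw5(alu_ops: int, trans_ops: int) -> int:
    """Bundles needed with 4 regular slots plus 1 T slot per bundle.

    Toy assumption: regular ops go only to X/Y/Z/W, special ops only to T,
    so whichever kind needs more bundles sets the total.
    """
    return max(math.ceil(alu_ops / 4), trans_ops)


def bundles_vliw4(alu_ops: int, trans_ops: int) -> int:
    """Bundles needed with 4 identical slots per bundle.

    Toy assumption: a special op occupies exactly one regular slot.
    """
    return math.ceil((alu_ops + trans_ops) / 4)


# Hypothetical op mixes: (regular ops, special ops)
for mix in ((8, 2), (4, 4), (8, 0)):
    print(mix, bundles_vliw5(*mix), bundles_vliw4(*mix))
```

    Under these assumptions the mix with few special ops packs tighter on the 4+1 layout, while the mix dominated by special ops packs tighter on four identical slots, which is one way to read "it depends on the workload".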

    Quote Originally Posted by crazycheese View Post
    Also, do you know the state of opensource compute now?
    Tom is working on an LLVM-based compiler for r600g. It's not enabled by default yet for r600g since it's still missing a couple of features and needs more testing on r6xx/7xx and cayman systems.

  8. #28
    Join Date
    Dec 2007
    Posts
    2,328

    Default

    Quote Originally Posted by crazycheese View Post
    By the way, I have an unrelated question about LLVM and GCC: have the GCC folks expressed any intention to fork the current monolithic architecture and redesign the compiler using the LLVM approach? Or are we fated to use LLVM in the future when it comes to VLIW optimization?
    I've heard they are moving to a more modular structure, but I don't follow gcc that closely so I'm not sure of the details.

  9. #29
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,386

    Default

    Quote Originally Posted by crazycheese View Post
    Because, according to bridgman, VLIW4 has 4 full-featured units and VLIW5 has 1 full and 4 simple ones.
    Necro-lover said that, not me. I said 4 regular ALUs (X, Y, Z, W) plus the specialized T unit.

  10. #30
    Join Date
    Apr 2010
    Posts
    1,946

    Default

    Quote Originally Posted by bridgman View Post
    Necro-lover said that, not me. I said 4 regular ALUs (X, Y, Z, W) plus the specialized T unit.
    Yes, my mistake. Thanks for your time!
