
Thread: LLVMpipe Still Is Slow At Running OpenGL On The CPU

  1. #11
    Join Date
    Nov 2008
    Location
    Germany
    Posts
    5,411

    Default

    Only a 48-core Opteron 6000 system (155 GB/s RAM bandwidth) can beat a GPU (an HD 5870 has 160 GB/s).

    A normal PC has 5-15 GB/s; compared to the HD 5870's 160 GB/s, that is very slow.

    This benchmark only shows us that divergence.

  2. #12
    Join Date
    Feb 2009
    Posts
    8

    Default

    Are there any plans to use any parts of LLVMpipe alongside the GPU drivers? For example, to move calculations to the CPU if the GPU is overloaded, or if there are things a particular CPU does more efficiently than the GPU?

    I guess more importantly, are there any benefits to such an approach?

  3. #13
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,385

    Default

    I believe parts of LLVMpipe are being used already in cases where the GPU does not have vertex shader hardware.

    Dynamic load balancing (shifting driver work between CPU and GPU depending on the relative performance of the two) is a much bigger issue because getting non-sucky performance requires that the CPU have all of its data in system memory while the GPU needs all of its data to be in video memory.
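
    As a rough illustration of that constraint (my own sketch, not anything from the actual driver): for the hardware path an application typically has to copy its vertex data into a buffer object that the driver places in video memory before the GPU can render from it, whereas a software rasterizer like LLVMpipe reads the same data straight out of system memory. The snippet below assumes an OpenGL context is already current and uses made-up vertex data.

        /* Hedged sketch: uploading vertex data into a GPU buffer object.
         * Requires an OpenGL context to be current; the triangle data is
         * made up for illustration. */
        #define GL_GLEXT_PROTOTYPES
        #include <GL/gl.h>
        #include <GL/glext.h>

        static GLuint upload_triangle(void)
        {
            static const GLfloat vertices[] = {
                -1.0f, -1.0f, 0.0f,
                 1.0f, -1.0f, 0.0f,
                 0.0f,  1.0f, 0.0f,
            };

            GLuint vbo;
            glGenBuffers(1, &vbo);
            glBindBuffer(GL_ARRAY_BUFFER, vbo);
            /* This copy across the bus is exactly what a CPU/GPU
             * load-balancing scheme would have to repeat (or keep
             * coherent) every time work moved between the two. */
            glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices,
                         GL_STATIC_DRAW);
            return vbo;
        }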

  4. #14
    Join Date
    Aug 2009
    Posts
    2,264

    Default

    Quote Originally Posted by bridgman View Post
    I believe parts of LLVMpipe are being used already in cases where the GPU does not have vertex shader hardware.

    Dynamic load balancing (shifting driver work between CPU and GPU depending on the relative performance of the two) is a much bigger issue because getting non-sucky performance requires that the CPU have all of its data in system memory while the GPU needs all of its data to be in video memory.
    I don't know if this is stupid, but it seems like a back-and-forth latency problem. Can you keep the memory in both system memory and video memory and have it sync all the time? Maybe then you would not need to do the back and forth at a crucial moment?

    *ducks and runs*

  5. #15
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,385

    Default

    Sure, you just turn on the "magic pixie dust" bit in the hardware

    Seriously, dealing with latency between different memory subsystems (with the associated need for change detection and synchronization) is probably the biggest single challenge when implementing highly parallel systems. It doesn't mean that an easy solution does not exist but it hasn't been found yet.

    Read up on "cache snooping", for example.

  6. #16
    Join Date
    Aug 2009
    Posts
    2,264

    Default

    Quote Originally Posted by bridgman View Post
    Sure, you just turn on the "magic pixie dust" bit in the hardware

    Seriously, dealing with latency between different memory subsystems (with the associated need for change detection and synchronization) is probably the biggest single challenge when implementing highly parallel systems. It doesn't mean that an easy solution does not exist but it hasn't been found yet.

    Read up on "cache snooping", for example.
    Oh yeah... that shit...

    OK, first you need two address maps which do not match, so each side has to know which address is a copy of which other address, matched by a tag.

    Then the CPU and the GPU must never write to the same thing at the same time, while they cannot coordinate with each other (great), so one of the two needs to be the dominant decision maker. That would be the CPU, since it executes the driver for the graphics card.

    So the CPU puts whatever is about to be modified into a command buffer (sort of), and then checks what the GPU can alter at that time without touching the CPU's changed address tags. Then, when both buffers are empty, the lock the CPU keeps on what the GPU can't touch is lifted and the GPU can continue.

    Either way, massive latency hell...
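
    If it helps, here is a minimal sketch of that kind of CPU-as-decision-maker scheme, written in C with entirely hypothetical names (gpu_upload_range, gpu_insert_fence and gpu_wait_fence stand in for whatever the real driver layer provides; none of this is actual Mesa/Gallium code). The CPU keeps a shadow copy in system memory, records which ranges it has dirtied, and only lets the GPU read the buffer again after the dirty ranges have been flushed and a fence has signalled, which is exactly where the latency piles up.

        /* Hypothetical sketch: CPU-side dirty tracking for a buffer that is
         * mirrored in both system memory and video memory. */
        #include <stddef.h>
        #include <string.h>
        #include <stdbool.h>

        /* Made-up stand-ins for the driver/winsys layer. */
        void     gpu_upload_range(unsigned handle, size_t offset,
                                  const void *data, size_t size);
        unsigned gpu_insert_fence(void);
        void     gpu_wait_fence(unsigned fence);

        #define MAX_DIRTY 64

        struct dirty_range { size_t offset, size; };

        struct mirrored_buffer {
            void    *cpu_copy;              /* shadow copy in system memory  */
            unsigned gpu_handle;            /* buffer object in video memory */
            struct dirty_range dirty[MAX_DIRTY];
            int      num_dirty;
            bool     gpu_may_read;          /* the "lock" the CPU holds      */
        };

        /* All CPU writes go through here so the change is recorded. */
        static void cpu_write(struct mirrored_buffer *buf, size_t offset,
                              const void *data, size_t size)
        {
            memcpy((char *)buf->cpu_copy + offset, data, size);
            if (buf->num_dirty < MAX_DIRTY)
                buf->dirty[buf->num_dirty++] =
                    (struct dirty_range){ offset, size };
            buf->gpu_may_read = false;      /* GPU must not see stale data */
        }

        /* Before the GPU may use the buffer, flush every dirty range and
         * wait for the copy to land in video memory.  This round trip is
         * the latency being complained about above. */
        static void sync_for_gpu(struct mirrored_buffer *buf)
        {
            for (int i = 0; i < buf->num_dirty; i++)
                gpu_upload_range(buf->gpu_handle, buf->dirty[i].offset,
                                 (char *)buf->cpu_copy + buf->dirty[i].offset,
                                 buf->dirty[i].size);
            buf->num_dirty = 0;

            gpu_wait_fence(gpu_insert_fence());  /* stall until coherent */
            buf->gpu_may_read = true;            /* lift the lock */
        }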

  7. #17
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,385

    Default

    Welcome to driver development

  8. #18
    Join Date
    Aug 2007
    Posts
    437

    Default

    Dynamic load balancing, we already have that, it's called SLI

  9. #19
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,385

    Default

    SLI (and Crossfire) might do dynamic load balancing between GPUs, but this is something different - dynamic load balancing between CPU and GPU.

  10. #20
    Join Date
    Jun 2010
    Posts
    219

    Default

    Quote Originally Posted by bridgman View Post
    SLI (and Crossfire) might do dynamic load balancing between GPUs, but this is something different - dynamic load balancing between CPU and GPU.
    Not worth it unless you have a CPU that is -really- good at graphics rendering. A good example of this is the PS3: even the RSX, with its weak fill rate, can deliver stunning visuals with the Cell's SPUs working on parts of the graphics load. This is largely thanks to the combination of XDR and GDDR3. Low-latency XDR doesn't have the peak raw bandwidth of GDDR3, but it can be randomly accessed by the GPU without taking too many memory cycles away from the CPU, allowing communication and load distribution to work really smoothly (if done right, of course).

    x86(_64), on the other hand, is not really good at this, and chipset hardware was never designed for ultra-low-latency or high-bandwidth communication between the CPU and GPU. The CPU would be better put to work on physics in most 4-8 core scenarios.
