Thread: LLVMpipe Scaling With Intel's Core i7 Gulftown

  1. #1
    Join Date
    Jan 2007
    Posts
    14,787

    Phoronix: LLVMpipe Scaling With Intel's Core i7 Gulftown

    On learning that an Intel Core i7 970 "Gulftown" CPU was on the way, with six physical cores plus another six logical cores via Hyper-Threading, the first thing that came to mind was to try this latest Intel 32nm processor with the Gallium3D LLVMpipe driver. There's a lot to love about Gallium3D when it comes to open-source Linux graphics drivers: the different state trackers open up new possibilities (such as native Direct3D 11 support on Linux), and the hardware drivers themselves are more advanced, easier to write, and should eventually be much faster than the classic Mesa drivers for Linux. One driver of particular interest is LLVMpipe, an attempt to finally make a useful CPU-based software rasterizer for Linux by leveraging the Low-Level Virtual Machine infrastructure. Here is our introductory article to LLVMpipe; even with a Core i7 "Bloomfield" processor the driver is very demanding, but with Intel's Gulftown the results are somewhat surprising as we experiment with how this CPU-based driver scales up to twelve threads.

    http://www.phoronix.com/vr.php?view=15407

  2. #2
    Join Date
    Sep 2009
    Posts
    20

    Given that graphics is an "embarrassingly parallel" problem, shouldn't it be possible -- theoretically -- to achieve very nearly linear scaling with the number of CPU cores? I'm not saying it would be easy, or that llvmpipe is flawed if it doesn't -- just asking whether, theoretically, it's within the realm of possibility to achieve.

    Though I guess one complicating factor here is that it's not just the graphics, but also the normal game logic itself which is running on the CPU at the same time. Have you guys considered trying some kind of purely-graphics benchmark to try and isolate that factor?

  3. #3
    Join Date
    Jul 2009
    Location
    Torrington, Ct. USA
    Posts
    166

    So going by these test results it seems that adding the 6 logical (HT) cores to the physical cores is actually a hindrance to performance at low resolutions and only becomes at all beneficial to performance at high resolutions and only minimally so, at least as far as LLVM Pipe is concerned.

  4. #4

    Is this a joke? A $1K CPU to use as a soft renderer being able to play games only @800x600.

    I don't understand the meaning of this article. To show that LLVMpipe scales well? But who's gonna use it anyway?

  5. #5
    Join Date
    Aug 2008
    Location
    Finland
    Posts
    1,595

    Quote: Originally Posted by sirdilznik
    So going by these test results it seems that adding the 6 logical (HT) cores to the physical cores is actually a hindrance to performance at low resolutions and only becomes at all beneficial to performance at high resolutions and only minimally so, at least as far as LLVM Pipe is concerned.
    "The performance improvement seen is very application-dependent, however when running two programs that require full attention of the processor it can actually seem like one or both of the programs slows down slightly when Hyper Threading Technology is turned on."
    http://en.wikipedia.org/wiki/Hyper-threading

  6. #6
    Join Date
    Jan 2008
    Posts
    772

    Quote: Originally Posted by illissius
    Given that graphics is an "embarrassingly parallel" problem, shouldn't it be possible -- theoretically -- to achieve very nearly linear scaling with the number of CPU cores?
    Is it known that current mainstream rendering techniques are embarrassingly parallel? I haven't studied the algorithms to any real detail, but it would surprise me if they are (I'd expect some issues with Z-sort and overlapping fragments, at least). Surely some important parts of it are, but that's different from the whole pipeline scaling ideally.
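
    One way to put numbers on the "some parts are parallel, the whole pipeline is not" point is Amdahl's law: if a fraction p of the work parallelizes perfectly over n workers, the speedup is bounded by 1 / ((1 - p) + p / n). The sketch below is only an illustration; the 90% parallel fraction is a made-up value, not a measurement of LLVMpipe, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # 0.9 is an assumed, illustrative parallel fraction (not measured).
    for n in (1, 2, 6, 12):
        print(n, "workers:", round(amdahl_speedup(0.9, n), 2))
```

    Under that assumed 90% figure, six workers give at most about a 4x speedup, so even a small serial portion keeps scaling well short of linear.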

  7. #7
    Join Date
    Jan 2008
    Location
    South Africa
    Posts
    233

    In the last year, ATI got nearly double the performance going from 160 to 320 execution cores, so yes, 3D rendering is very definitely embarrassingly parallel.

    With currently accepted rendering algorithms, a Z-sort isn't always needed. You only need to sort for transparent rendering, and then only what falls within the tile frustum.
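
    As a rough sketch of that ordering rule (not taken from any particular renderer): opaque objects can be drawn in any order when a depth buffer is used, though front to back helps early rejection, while transparent objects need a back-to-front sort. `Drawable` and `draw_order` are hypothetical names, and for brevity this version ignores the per-tile restriction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drawable:
    name: str
    depth: float              # distance from the camera; larger means farther away
    transparent: bool = False


def draw_order(objects):
    """Opaque objects front to back, then transparent objects back to front."""
    opaque = sorted((o for o in objects if not o.transparent),
                    key=lambda o: o.depth)
    # Only the transparent objects strictly need sorting for correct blending.
    transparent = sorted((o for o in objects if o.transparent),
                         key=lambda o: o.depth, reverse=True)
    return opaque + transparent


if __name__ == "__main__":
    scene = [
        Drawable("wall", 10.0),
        Drawable("glass", 5.0, transparent=True),
        Drawable("crate", 2.0),
        Drawable("smoke", 8.0, transparent=True),
    ]
    print([o.name for o in draw_order(scene)])
```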

  8. #8
    Join Date
    Apr 2010
    Posts
    271

    Quote: Originally Posted by Ex-Cyber
    Is it known that current mainstream rendering techniques are embarrassingly parallel? I haven't studied the algorithms to any real detail, but it would surprise me if they are (I'd expect some issues with Z-sort and overlapping fragments, at least). Surely some important parts of it are, but that's different from the whole pipeline scaling ideally.
    Not to mention that CPUs themselves do not scale linearly either as each core is going to be sharing L2 cache and main memory bandwidth.

  9. #9
    Join Date
    Jan 2008
    Location
    South Africa
    Posts
    233

    A summary of fairly typical 3D rendering (without considering the actual game logic), in big-O notation:

    1. Determine view frustum - O(1) - serial
    2. Determine objects in frustum - O(log n) - somewhat parallel, but not great

    3. Roughly sort opaque objects from front to back - O(log n) - mostly serial
    4. Emit every object - O(n) - serial
    4.1 Where surface is split into tiles - almost O(n) parallelization (reasonable gain here)
    4.1.1 Throw away if unneeded in tile - cheap, early exit point
    4.1.2 Emit each part of object - O(n) - serial
    4.1.2.1 Compute render region - O(1) - serial
    4.1.2.2 For each pixel under region - stupidly parallel (most of gain here)
    4.1.2.2.1 Test if visible - O(1) - cheap, early exit point
    4.1.2.2.2 Render - O(1)

    5 & 6. More or less the same as 3 & 4, but with transparent objects sorted back to front. Sorting here can be more expensive, and early exit points are used much less.

    7. For each post-processing pass - O(n) - serial
    7.1 For each pixel - stupidly parallel (most of gain here)
    7.1.1 Do something

    Um, I think that is about it?
    Of course, limits such as cache misses, bandwidth, unbalanced workload, etc. all contribute to slowing it down.
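
    The tile and per-pixel steps in the list above could be sketched roughly as follows. This is a toy, assuming axis-aligned rectangles instead of triangles and a hypothetical 4-pixel tile size; it is not how LLVMpipe is implemented. Each tile is an independent job, objects that miss a tile are discarded early, and a per-pixel depth test decides visibility. In CPython the threads will not actually speed up this pure-Python loop; the point is only the structure of the work.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

TILE = 4  # hypothetical tile edge length, in pixels


@dataclass(frozen=True)
class Rect:
    x0: int
    y0: int
    x1: int       # exclusive
    y1: int       # exclusive
    depth: float  # smaller means closer to the camera
    color: str


def render_tile(tile_x, tile_y, rects, width, height):
    """Render one tile independently; returns {(x, y): color} for covered pixels."""
    tile_x1 = min(tile_x + TILE, width)
    tile_y1 = min(tile_y + TILE, height)
    depth, color = {}, {}
    for r in rects:
        # Compute the render region: the object clipped to this tile.
        x0, x1 = max(r.x0, tile_x), min(r.x1, tile_x1)
        y0, y1 = max(r.y0, tile_y), min(r.y1, tile_y1)
        if x0 >= x1 or y0 >= y1:
            continue  # early exit: object does not touch this tile
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Per-pixel visibility test against the depth buffer.
                if r.depth < depth.get((x, y), float("inf")):
                    depth[(x, y)] = r.depth
                    color[(x, y)] = r.color
    return color


def render(rects, width, height, workers=4):
    """Sort roughly front to back (serial), then render tiles as parallel jobs."""
    ordered = sorted(rects, key=lambda r: r.depth)
    tiles = [(tx, ty) for ty in range(0, height, TILE)
             for tx in range(0, width, TILE)]
    frame = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = pool.map(
            lambda t: render_tile(t[0], t[1], ordered, width, height), tiles)
        for tile_pixels in jobs:
            frame.update(tile_pixels)  # tiles never overlap, so no conflicts
    return frame
```

    Because tiles never share pixels, no locking is needed when merging results, which is where the "almost O(n) parallelization" comes from; the sort and tile setup remain serial.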

  10. #10
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,458

    I think the issue here is that while graphics still has a big chunk of embarrassingly parallel work, the individual tasks are extremely small, so for real scalability you either need some hardware scheduling (like a GPU has) or you need to design the software renderer from day one around the idea of having a very large number of cores/threads (as was attempted with the Larrabee renderer).

    AFAIK the LLVMpipe renderer was designed for "one to a small number" of threads... I'm pretty impressed with how well it scales.

    I'm only looking at the results from 1 core to 6 cores, since the jump from 6 to 12 isn't really bringing more cores onstream, just more threads per core.
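
    To illustrate why very small tasks hurt without hardware scheduling, here is a toy cost model (all numbers are assumptions, not measurements): each dispatched job pays a fixed software scheduling cost, so batching several tiny tasks into one job amortizes that cost. `parallel_efficiency` and `batches` are hypothetical helper names.

```python
def parallel_efficiency(task_cost, dispatch_cost, batch_size=1):
    """Fraction of worker time spent on useful work in a toy cost model.

    Each dispatched job carries `batch_size` tasks and pays `dispatch_cost` once.
    """
    useful = task_cost * batch_size
    return useful / (useful + dispatch_cost)


def batches(items, batch_size):
    """Group tiny tasks into coarser jobs so each dispatch carries more work."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    # Assumed costs in arbitrary units: a tiny task and a much larger dispatch cost.
    for size in (1, 9, 99):
        print(size, parallel_efficiency(1.0, 9.0, size))
```

    The trade-off is that larger batches mean fewer jobs to spread across cores, which can leave the workload unbalanced.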
