
Thread: Intel's Knights Corner Turns Into The Xeon Phi

  1. #1
    Join Date
    Jan 2007
    Posts
    14,378

    Default Intel's Knights Corner Turns Into The Xeon Phi

    Phoronix: Intel's Knights Corner Turns Into The Xeon Phi

    For those who haven't heard yet, Intel is getting ready to ship the Larrabee-derived "Knights Corner" co-processors, and they will be marketed under the name Xeon Phi...

    http://www.phoronix.com/vr.php?view=MTEyMjY

  2. #2
    Join Date
    Oct 2009
    Posts
    2,072

    Default

    This would be Intel's answer to GPGPU?

    Edit: Roughly equivalent to a Radeon HD 3870 X2, or a middle-of-the-road NI or SI part.
    Hell, my laptop has an otherwise useless discrete GPU that isn't much slower than this thing...

    Edit 2: 576 gigaflops. That's a "cheap" laptop with dual GPUs.

    Edit 3: Holy hell, that's a lot more than my desktop has... 56 GFlops (Radeon HD 4290).
    Last edited by droidhacker; 06-19-2012 at 10:13 AM.

  3. #3
    Join Date
    Apr 2010
    Posts
    1,946

    Default

    This is exactly what I was thinking several years ago:
    a central CPU manages tasks, while pluggable CPU modules provide the actual performance.
    Multiply these modules, drive them with OpenCL, and you have real-time ray-traced graphics.

  4. #4
    Join Date
    Jan 2007
    Posts
    459

    Default

    Quote Originally Posted by crazycheese View Post
    This is exactly what I was thinking several years ago:
    a central CPU manages tasks, while pluggable CPU modules provide the actual performance.
    Multiply these modules, drive them with OpenCL, and you have real-time ray-traced graphics.
    Don't forget your history: that already happened many, many years ago, originally with the BBC and Amiga add-in many-core Transporter boards. It's taken all this time to become popular again, but it may finally go somewhere en masse with Intel behind this new old UK Transporter concept now that it's come to the x86 landscape....
    Last edited by popper; 06-19-2012 at 01:39 PM.

  5. #5
    Join Date
    Jan 2007
    Posts
    459

    Default

    Quote Originally Posted by popper View Post
    Don't forget your history: that already happened many, many years ago, originally with the BBC and Amiga add-in many-core Transporter boards. It's taken all this time to become popular again, but it may finally go somewhere en masse with Intel behind this new old UK Transporter concept now that it's come to the x86 landscape....
    That should, of course, read Transputer:

    http://en.wikipedia.org/wiki/Transputer

    http://www.classiccmp.org/transputer/

  6. #6
    Join Date
    Nov 2008
    Location
    Madison, WI, USA
    Posts
    862

    Default

    Quote Originally Posted by droidhacker View Post
    This would be Intel's answer to GPGPU?

    Edit: Roughly equivalent to a Radeon HD 3870 X2, or a middle-of-the-road NI or SI part.
    Hell, my laptop has an otherwise useless discrete GPU that isn't much slower than this thing...

    Edit 2: 576 gigaflops. That's a "cheap" laptop with dual GPUs.

    Edit 3: Holy hell, that's a lot more than my desktop has... 56 GFlops (Radeon HD 4290).
    From what I've heard, the Larrabee/KC/Phi design should scale better with branchy code, which is something that GPUs really suck at.

    For certain problem sets GPUs are really good and you can get something approximating their maximum stated performance, but the moment you start adding branches that send threads in different directions, your performance takes a nosedive.
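
    For a concrete (purely illustrative) picture, here's a minimal C sketch of the kind of data-dependent branch in question. The function and names are hypothetical, not from the article: with one GPU work-item per element, lanes whose conditions disagree force the hardware to walk both paths under a mask, while a conventional in-order x86 core just takes one path per iteration.

    Code:
    /* Illustrative only: a data-dependent branch of this shape makes a GPU's
     * SIMD lanes (one work-item per element) execute BOTH paths with masking
     * whenever neighbouring elements disagree; a normal CPU core simply
     * takes one path per iteration. */
    void branchy_kernel(const float *in, float *out, int n)
    {
        for (int i = 0; i < n; ++i) {
            if (in[i] > 0.0f)             /* divergence point            */
                out[i] = in[i] * in[i];   /* "expensive" path            */
            else
                out[i] = 0.0f;            /* cheap path                  */
        }
    }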

  7. #7
    Join Date
    Oct 2009
    Posts
    2,072

    Default

    Quote Originally Posted by Veerappan View Post
    From what I've heard, the Larrabee/KC/Phi design should scale better with branchy code, which is something that GPUs really suck at.

    For certain problem sets GPUs are really good and you can get something approximating their maximum stated performance, but the moment you start adding branches that send threads in different directions, your performance takes a nosedive.
    So the trick is.... to write good code. Plus, high-end GPUs (which certainly cost a lot less than these Intel boards...) are already up in the 8 TFlop range.
    Last edited by droidhacker; 06-19-2012 at 10:26 AM.

  8. #8
    Join Date
    Nov 2008
    Location
    Madison, WI, USA
    Posts
    862

    Default

    Quote Originally Posted by droidhacker View Post
    So the trick is.... to write good code. Plus, high-end GPUs (which certainly cost a lot less than these Intel boards...) are already up in the 8 TFlop range.
    But what if you're stuck with a problem set which is inherently branchy? Not every algorithm is perfectly suited to execution on GPUs.

    And yes, I agree that 1TFlop/s is a bit low for an absolute performance number.

    My big questions now are:
    1) What's the power consumption for that 1 TFlop? Does it require a multi-slot cooler, or is it a single-slot passive cooler?
    2) How's the latency?
    3) How quickly can they scale that performance up?

    Actually, #1 is partially answered in the linked blog post. The image of the card shows a dual-slot cooler with a blower, similar to most mid/high-end graphics cards today.

    The other big advantage of this co-processor that is mentioned in the blog post is compatibility. This card will execute x86 (or maybe x86-64) instructions natively, which means that any multi-threaded program that runs on an Intel CPU is a candidate for running on this card. No porting to OpenCL/CUDA/etc required.

    I'm curious how long it will be until someone gets llvmpipe working on this.
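
    To make the compatibility point concrete, here's a minimal sketch (my own illustration, not something from the blog post) of the sort of ordinary OpenMP code the claim is about: it's already multi-threaded for a regular Xeon and contains nothing GPU-specific, so it would be a candidate for the card's many x86 cores without an OpenCL/CUDA port.

    Code:
    #include <omp.h>
    #include <stdio.h>

    /* Plain OpenMP: no OpenCL/CUDA, no device-specific code at all. */
    int main(void)
    {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (long i = 1; i <= 100000000L; ++i)
            sum += 1.0 / ((double)i * (double)i);   /* converges to pi^2/6 */
        printf("threads=%d  sum=%f\n", omp_get_max_threads(), sum);
        return 0;
    }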

  9. #9
    Join Date
    Oct 2009
    Posts
    2,072

    Default

    Quote Originally Posted by Veerappan View Post
    But what if you're stuck with a problem set which is inherently branchy? Not every algorithm is perfectly suited to execution on GPUs.

    And yes, I agree that 1TFlop/s is a bit low for an absolute performance number.

    My big questions now are:
    1) What's the power consumption for that 1 TFlop? Does it require a multi-slot cooler, or is it a single-slot passive cooler?
    2) How's the latency?
    3) How quickly can they scale that performance up?

    Actually, #1 is partially answered in the linked blog post. The image of the card shows a dual-slot cooler with a blower, similar to most mid/high-end graphics cards today.

    The other big advantage of this co-processor that is mentioned in the blog post is compatibility. This card will execute x86 (or maybe x86-64) instructions natively, which means that any multi-threaded program that runs on an Intel CPU is a candidate for running on this card. No porting to OpenCL/CUDA/etc required.

    I'm curious how long it will be until someone gets llvmpipe working on this.
    I doubt that it will be that simple. If it's just a multi-core x86, it's going to be a tank and isn't going to differ significantly from a multi-core CPU. If it's not a multi-core x86, then there must be some kind of VM to run x86 code, which again makes it a tank for running x86 code.

    Now the million-dollar question: if the thing is actually able to run x86 natively (without a VM) and "massively parallel", why are they building it into a co-processor board and not adding this directly into CPUs?

    Edit: the blog post doesn't actually say that it's x86, just that it *can* run x86 code. Well, an ARM chip CAN run x86 code (in a VM), just not well. I'm guessing that this is just marketing crap. They also use the buzzword "Intel architecture", which is not the same as saying Intel x86 architecture. Intel is responsible for various different architectures, some experimental, some downright failures (IA-64). In the end, I will state that you can't take advantage of a massively parallel processor without coding FOR that massively parallel processor. This is similar to trying to take advantage of a multi-core CPU with a single-threaded application: it will run, it just won't benefit from it.
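
    As a hypothetical illustration of that last point (mine, not from the blog post): a serial routine like the one below runs on exactly one core whether the machine has 4 cores or 50+; without restructuring it for threads or wide vectors, the extra hardware sits idle.

    Code:
    #include <stddef.h>

    /* Serial dot product: uses one core, no matter how many are available.
     * Taking advantage of a many-core part means rewriting this with
     * threads/OpenMP and/or wide SIMD. */
    double dot_serial(const double *a, const double *b, size_t n)
    {
        double acc = 0.0;
        for (size_t i = 0; i < n; ++i)
            acc += a[i] * b[i];
        return acc;
    }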

    I'm skeptical about what this thing will do and how.
    Last edited by droidhacker; 06-19-2012 at 10:58 AM.

  10. #10
    Join Date
    Jan 2007
    Posts
    459

    Default

    Quote Originally Posted by Veerappan View Post
    From what I've heard, the Larrabee/KC/Phi design should scale better with branchy code, which is something that GPUs really suck at.

    For certain problem sets GPUs are really good and you can get something approximating their maximum stated performance, but the moment you start adding branches that send threads in different directions, your performance takes a nosedive.
    Veerappan, will the chemical company you now work for be buying some of these Knights Corner cards (they should have kept that name, it's more memorable than Phi) for you to play with and try? Perhaps you should ask them to; then you could have a go at writing some 512-bit AVX(?) SIMD patches for your beta OpenCL code base.... and post results....

    And if they do get you some boards, it might be nice to see you also write a few assembly and C patches for x264, to see the checkasm results of a large 1080p upscale to 4K/8K rescale and encode on these too, and/or put one online and give the x264 devs remote access to write and run HD and super-HD tests on it.

    Assuming, that is, their new many-core 512-bit SIMD is up to scratch for generic use and not just some oddball university niche and Top500 shops with money to burn on overpriced industrial co-processor cards.
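
    For what it's worth, here's a minimal sketch of the flavour of 512-bit SIMD kernel such patches would contain, written against AVX-512-style _mm512 intrinsics purely for illustration. Knights Corner's actual 512-bit instruction set and how x264 would integrate it are assumptions on my part.

    Code:
    #include <immintrin.h>   /* 512-bit vector intrinsics */

    /* Hypothetical example: add two float arrays 16 elements at a time.
     * Assumes n is a multiple of 16 and 64-byte-aligned pointers; real
     * x264 patches would target its checkasm-tested kernels instead. */
    void add_f32x16(const float *a, const float *b, float *dst, int n)
    {
        for (int i = 0; i < n; i += 16) {
            __m512 va = _mm512_load_ps(a + i);
            __m512 vb = _mm512_load_ps(b + i);
            _mm512_store_ps(dst + i, _mm512_add_ps(va, vb));
        }
    }
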
    Last edited by popper; 06-19-2012 at 02:19 PM.
