Thread: LLVM Working On Intel AVX-512 Support

  1. #11
    Join Date
    May 2012
    Posts
    4

    Originally Posted by HeavensRevenge:
    "We need OpenMP support WAY more urgently than new vector instructions..."

    Neither is more urgent than the other. An enthusiast GPU like the GeForce GTX 680 has 8 cores, each with 6 vector units of 32 elements, so it can run 8 × 6 × 32 = 1536 strands simultaneously. But it suffers from the overhead of heterogeneous computing and has a tiny consumer install base. AVX-512 should make it feasible, in the not too distant future, to have mainstream CPUs with 8 cores, each with 2 vector units of 16 elements, running 8 × 2 × 16 = 256 strands at 3-4 times the clock frequency.
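
    To make that arithmetic concrete, here is a rough sketch in C++. The function names are made up for illustration, and it assumes throughput scales simply with lanes times clock, which ignores memory bandwidth, occupancy and everything else that matters in practice.

```cpp
// Lanes ("strands") that can execute at once: cores x vector units x elements.
constexpr int strands(int cores, int units_per_core, int elements_per_unit) {
    return cores * units_per_core * elements_per_unit;
}

constexpr int gpu_strands = strands(8, 6, 32);  // GTX 680 figures from the post
constexpr int cpu_strands = strands(8, 2, 16);  // the AVX-512 CPU imagined in the post

static_assert(gpu_strands == 1536, "8 * 6 * 32");
static_assert(cpu_strands == 256, "8 * 2 * 16");

// Lane throughput of the CPU relative to the GPU when the CPU clock is
// clock_ratio times the GPU clock (the post estimates 3 to 4).
// Assumption: throughput is proportional to lanes * clock, nothing else.
constexpr double cpu_vs_gpu(double clock_ratio) {
    return cpu_strands * clock_ratio / gpu_strands;
}
```

    Under that simplification, cpu_vs_gpu(3.0) comes out at 0.5 and cpu_vs_gpu(4.0) at about 0.67, so the CPU would land at roughly half to two thirds of the GPU's raw lane throughput, without the heterogeneous overhead.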

    AVX-512 is the first x86 instruction set extension that is specifically targeted at the SPMD programming model also used by GPUs (e.g. it supports predication through dedicated mask registers). To execute 16 loop iterations in parallel, you need compilers to vectorize your code in the SPMD fashion. That is what LLVM is working on, so you don't want to miss out on it.
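
    As an illustration of what predication means here, below is a portable C++ sketch that emulates one 16-lane step of the loop "if (x[i] > 0) y[i] = x[i] * 2;". It is plain scalar code, not AVX-512, and the names (Vec, Mask, compare_gt_zero, masked_double) are invented for the example. On real hardware a compiler would be expected to emit a compare that writes a mask register followed by a masked multiply, roughly what the _mm512_cmp_ps_mask and _mm512_mask_mul_ps intrinsics expose.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // 16 x 32-bit floats fill one 512-bit register
using Vec  = std::array<float, kLanes>;
using Mask = std::uint16_t;         // one bit per lane, like a mask register

// Compare step: set bit i of the mask where x[i] > 0.
Mask compare_gt_zero(const Vec& x) {
    Mask m = 0;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (x[i] > 0.0f) {
            m = static_cast<Mask>(m | (1u << i));
        }
    }
    return m;
}

// Masked operation: lanes whose mask bit is set get x * 2,
// all other lanes keep their old value from y.
Vec masked_double(const Vec& y, Mask m, const Vec& x) {
    Vec out = y;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if ((m >> i) & 1u) {
            out[i] = x[i] * 2.0f;
        }
    }
    return out;
}
```

    The point is that the branch disappears from the loop body: all 16 iterations run the same instructions, and the mask decides which lanes actually commit a result.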

    That said, multi-threading is an equally important aspect of maximizing the CPU's performance. Fortunately Intel recently added the TSX extensions to greatly facilitate and optimize thread synchronization.
