Thread: LLVM May Expand Its Use Of The Loop Vectorizer

  1. #1
    Join Date
    Jan 2007
    Posts
    14,787

    Default LLVM May Expand Its Use Of The Loop Vectorizer

    Phoronix: LLVM May Expand Its Use Of The Loop Vectorizer

    LLVM's Loop Vectorizer, which is able to automatically vectorize code loops for performance benefits in many scenarios, may find its use expanded for other optimization levels in future LLVM releases...

    http://www.phoronix.com/vr.php?view=MTM4NDk

  2. #2
    Join Date
    Oct 2007
    Posts
    1,274

    Default

    Wasn't LLVM 3.3 supposed to be released today?

    EDIT: Just looked at their site and saw: "Random wiggle room for further bug fixing and testing" on the schedule.

  3. #3
    Join Date
    Sep 2007
    Posts
    312

    Default

    [...] and benchmarking the loop vectorizer showed it to provide performance benefits for many scenarios. [...]
    @Michael: Did you read your article, which you linked, once more? The loop vectorizer decreased the performance in most scenarios!

    The most common case, however, was actually a performance drop when the LLVM auto loop vectorizer was enabled. As mentioned, there isn't yet any cost-model for LLVM to determine when to vectorize a loop or not, plus other performance tuning of this newly-committed code is still needed.
    This is what you wrote in the linked article.

  4. #4

    Default

Does LLVM have well-defined rules for what is enabled at the various optimisation levels?

    for GCC it is
-O1: optimisations that don't massively increase compile time
-O2: O1 plus all optimisations that don't increase binary size
-O3: O2 plus all safe optimisations, even if they bloat binary size (though heuristics should stop it bloating to the point of slowing things down)
-Os: O2 plus optimisations to make the code smaller
-Ofast: O3 plus some unsafe math options
-Og: all optimisations that don't affect debugging
    http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

So auto-vectorisation ought to go in at O2, unless it involves adding extra code (alignment checks and fallbacks, http://locklessinc.com/articles/vectorize/) and assuming that it's safe. There should also be heuristics so that it's only enabled where it actually speeds things up.
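To make the discussion concrete, here is a minimal sketch of the kind of loop the vectorizer targets. The function name and the example are mine, not from the article; the Clang flags in the comment are the documented ones for inspecting vectorizer decisions, though behaviour varies across versions.

```c
/* A typical auto-vectorization candidate: a saxpy loop with no
 * loop-carried dependences. With Clang, the vectorizer's decision can be
 * observed via optimization remarks, e.g.:
 *   clang -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize saxpy.c
 */
#include <stddef.h>

void saxpy(float a, const float *x, float *y, size_t n)
{
    /* every iteration is independent, so the loop is vectorizable */
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```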

  5. #5
    Join Date
    Nov 2012
    Posts
    164

    Default

All common cases of automatic vectorization will increase code size. It will need a version of the original loop for the remainder of the vectorization factor, and will often need versions of the vectorized loop for different detected architectures, and possibly for alignment issues.
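The code-size growth described above can be illustrated by writing out by hand the shape of code a vectorizer typically emits: a main loop processing VF elements per iteration, plus a scalar epilogue for the remainder. The function and the VF of 4 are assumptions for illustration; real generated code is machine-specific.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor */

void add_arrays(const int *a, const int *b, int *out, size_t n)
{
    size_t i = 0;
    /* vector body: in real output these four adds become one SIMD add */
    for (; i + VF <= n; i += VF) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    /* scalar epilogue: a duplicate of the original loop handles n % VF
     * leftover elements -- this duplication is where code size grows */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```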

  6. #6
    Join Date
    Oct 2012
    Location
    Washington State
    Posts
    458

    Default

    Go read the cfe-dev list off of clang.llvm.org. You'll get your answers.

  7. #7

    Default

    Quote Originally Posted by carewolf View Post
All common cases of automatic vectorization will increase code size. It will need a version of the original loop for the remainder of the vectorization factor, and will often need versions of the vectorized loop for different detected architectures, and possibly for alignment issues.
I would have expected (though I may be wrong) a few cases where vectorisation could shrink the binary size, by condensing several instructions into one. It would need to be a case where the alignment was fixed and the number of iterations was guaranteed to be a multiple of 4 (or 8, or whatever).
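The case described above can be sketched as follows. With a compile-time trip count that is a multiple of any common vector width, and pointers declared non-aliasing, the vectorizer needs no runtime checks and no remainder loop, so several scalar instructions can collapse into one SIMD instruction with no extra code. The function is a hypothetical example of mine.

```c
/* Trip count of 16 is a multiple of any common vectorization factor,
 * and restrict rules out aliasing, so no epilogue or runtime checks
 * are needed -- the rare case where vectorization can shrink code. */
void add16(const float *restrict a, const float *restrict b,
           float *restrict out)
{
    for (int i = 0; i < 16; ++i)
        out[i] = a[i] + b[i];
}
```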

  8. #8
    Join Date
    Jun 2010
    Location
    ฿ 16LDJ6Hrd1oN3nCoFL7BypHSEYL84ca1JR
    Posts
    1,052

    Default

    Quote Originally Posted by carewolf View Post
All common cases of automatic vectorization will increase code size. It will need a version of the original loop for the remainder of the vectorization factor,
Or you could fill the remaining space with 0 for addition and 1 for multiplication.
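The padding idea can be sketched like this: pad the data out to a multiple of the vector width with the operation's identity element (0 for addition, 1 for multiplication), so a single full-width loop handles everything and no scalar epilogue is needed. The function is my own illustration, and it assumes the caller provides buffer capacity rounded up to a multiple of VF.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor */

/* Requires: x has capacity for n rounded up to a multiple of VF. */
float sum_padded(float *x, size_t n)
{
    size_t padded = (n + VF - 1) / VF * VF;

    /* pad with the identity element for addition, so the extra
     * elements cannot change the result */
    for (size_t i = n; i < padded; ++i)
        x[i] = 0.0f;

    /* the loop now always runs a whole number of VF-wide chunks,
     * so no remainder loop is needed */
    float acc = 0.0f;
    for (size_t i = 0; i < padded; ++i)
        acc += x[i];
    return acc;
}
```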

  9. #9
    Join Date
    Mar 2013
    Posts
    47

    Default

    Quote Originally Posted by oleid View Post
    @Michael: Did you read your article, which you linked, once more? The loop vectorizer decreased the performance in most scenarios!
    Did YOU read the article?

    There are two aspects to vectorization (actually to any optimization, but it's most obvious for vectorization):
    - there is generating optimal code (given the ISA, the targeted µarchitecture, the source code) AND
    - there is a cost model (this time very dependent on the targeted µarchitecture) which determines whether to use the vectorized code or not.

For vectorization the cost model is especially important, precisely because it is easy to screw things up and end up with slower code --- there can be a whole lot of extra overhead to making the vectorized code work compared to scalar code.

The first round of LLVM attempts was specifically targeted at generating optimal code --- this was publicly stated, and that's precisely why vectorization was on by default. There have since been attempts at getting the cost model correct, but no-one's promising it's perfect yet.

    In other words, yes, blind vectorization can definitely screw things up. But this is not incompetence, nor does it show that vectorization is a foolish idea. It is a reflection of the specific order in which tasks are being performed.
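The cost-model trade-off described above can be reduced to a toy decision function: estimate the cost of covering one vector-width's worth of elements in scalar and vector form and vectorize only when the vector version wins. All names and cycle counts here are made-up placeholders of mine, not LLVM's actual model.

```c
#include <stdbool.h>

/* Toy cost model: the vector loop covers vf elements per iteration but
 * pays fixed overhead (runtime checks, epilogue), so compare the total
 * cost of handling vf elements each way. */
bool should_vectorize(unsigned scalar_cost, unsigned vector_cost,
                      unsigned vf, unsigned overhead)
{
    unsigned scalar_total = scalar_cost * vf;      /* vf scalar iterations */
    unsigned vector_total = vector_cost + overhead; /* one vector iteration */
    return vector_total < scalar_total;
}
```

With a bad estimate of `overhead`, this returns true for loops that actually slow down when vectorized, which is exactly the regression pattern the early benchmarks showed.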
