
Thread: Is Assembly Still Relevant To Most Linux Software?

  1. #71
    Join Date
    Jul 2009
    Location
    Germany
    Posts
    480

    Default

    Quote Originally Posted by frign View Post
    Thanks for this insufficient example!

    First off, your C-code stinks. Not only can't you judge efficiency by line-numbers, you also wouldn't ever construct a for-loop this way.
    Here's the correction for you and all the others not yet having understood how to efficiently construct loops, by the means of actually _allowing_ the compilers to optimise it properly:

    Code:
    for(i=66; i; --i){
            stuff;
    };
    1. Code Formatting: Never brag with one-liners when you can't read them once they get more complex.
    2. Count down!: (If possible), it will be much easier for the compiler, because it doesn't need to check an unary-condition and can fire off a jump if zero (JZ in x86) where needed. You would never know that if you didn't learn ASM some day
    3. Pre-Decrementing: I hope you know what that is, because there is good reason to do so: The compiler has no way to efficiently place the Post-Decrement in this loop, whereas it is really simple for him to do this with a Pre-Decrement.
    4. Semi-Colons: Quite small point, but it serves the readability to put a semi-colon at the end of a for-loop.
    5. I'm open for additions...
    at least gcc has the same result for both loops (incrementing / decrementing) if you enable optimizations (-O2). It produces a decrement loop with jne.
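
A minimal sketch for reproducing this comparison yourself. The names `count_up`, `count_down`, `stuff` and `sink` are made up for illustration, and the `volatile` store is only there so that the compiler cannot delete the loop body. Compile with `gcc -O2 -S` and compare the assembly of the two functions; whether both end up as the same decrementing loop depends on the compiler version and target.

```c
/* volatile: every store must really happen, so the loops cannot be removed */
volatile int sink;

/* Hypothetical stand-in for "stuff": it does not use the loop counter */
void stuff(void) { sink = sink + 1; }

/* Counting up, the way most people write it */
int count_up(void) {
    sink = 0;
    for (int i = 0; i < 66; i++) {
        stuff();
    }
    return sink;
}

/* Counting down to zero, as suggested in the quoted post */
int count_down(void) {
    sink = 0;
    for (int i = 66; i; --i) {
        stuff();
    }
    return sink;
}
```

Both functions run the body 66 times, so whichever form reads better can be used as long as the optimizer treats them alike on your target.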

  2. #72
    Join Date
    Oct 2012
    Location
    Cologne, Germany
    Posts
    308

    Lightbulb Proper C

    Quote Originally Posted by gens View Post
    im open to critic
    but that example wasnt an example of how to do things in C, it was just the simplest one liner that came to mind

    ofc normally i write it structured like

    Code:
    for( i=0; i<66; i++) {
          things
          }
    this looks readable in longer code to me, and i didnt have one teacher to teach me a specific coding stile
    as i dont do it for money i dont care rly

    and it gets compiled like that shorter example in asm
    thing is C was made for humans too, even thou it was made so programers dont have to write assembly
    so C maps well to assembly but also to human logic and compiler knows you just want to make that loop run 66 times

    but what C dosent tell you is how many registers that cpu has
    it comes natural to use as many variables in a loop as needed to reduce calls and double calculations
    but if you use more variables then you have registers then the compiler has to store them to ram and read back from cpu cache when needed

    thats one case where you dont know what your doing cuz you never learned and the compiler didnt tell you
    probably wont help much in performance thou as it gets loaded quite fast from the cache
    Yes, I agree on most points. There is no definitive coding style, but I guess efficiency is an ideal everyone should work towards.

    With regard to loop and increment/decrement efficiency, there is a great paper about it by the folks from IAR here.

    Even though the bottlenecks of inefficient code might seem negligible, they are still a factor to consider, because small issues can add up to bigger ones once you scale up. And if you learn how to do it properly, you won't even be slower doing it the right way!
    Compilers are a great way of dealing with many architectures. Since the same optimisation paradigms work on most of them, writing proper C doesn't even require you to know much about each architecture, but that knowledge can ultimately help you _know_ what proper C is in the first place.

    Best regards

    FRIGN

  3. #73
    Join Date
    Oct 2012
    Location
    Cologne, Germany
    Posts
    308

    Exclamation Use your brain

    Quote Originally Posted by droste View Post
    at least gcc has the same result for both loops (incrementing / decrementing) if you enable optimizations (-O2). It produces a decrement loop with jne.
    In this trivial case this was to be expected, but in case of more complex algorithms this case is not clear. Once it turns out to be more complex, you will have to do the task yourself and must _not_ rely on the compiler-optimisations.

  4. #74
    Join Date
    Nov 2009
    Location
    Madrid, Spain
    Posts
    398

    Default

    Quote Originally Posted by frign View Post
    In this trivial case this was to be expected, but in case of more complex algorithms this case is not clear. Once it turns out to be more complex, you will have to do the task yourself and must _not_ rely on the compiler-optimisations.
    In fact, the initial trivial example served the guy really badly, not only because the compiler will write this code anyway, but also because the do_stuff() function or code makes no guarantees, for example that RCX is preserved. So if do_stuff() changes RCX and sets it to 10, his trivial optimization would break the code and the loop would never finish.

    Also, because the loop would be written in assembly, his optimization rules out the possibility of the compiler inlining do_stuff() (if it is a function at all).

    If it is inlined, or stuff is plain code, the compiler will do other things: it will see computations that don't depend on the variable i and move them outside of the loop (loop-invariant code motion). After that it may find that the loop reduces to a constant expression that can be computed at compile time, so it will take 0 ms (no CPU time) at run time. And sometimes, when the iteration count is large enough, as with 66 here, the compiler may decide to split the loop into sequences of 4, like this:
    Code:
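    int i = 0;            /* declared outside the for so the remainder loop can still see it */
    for(; i<63; )         /* unrolled by 4: covers i = 0..63 */
    {
     stuff(); i++;
     stuff(); i++;
     stuff(); i++;
     stuff(); i++;
    }
    while(i<66)           /* remainder: i = 64 and 65 */
    {
      stuff();
      i++;
    }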
    and maybe the 4 stuff calls can be auto-vectorized, or even if they can't, it removes 3 out of every 4 branches (which are costly even on out-of-order CPUs).

    What I'm talking about is what any compiler does today at the -O3 level (Visual Studio, Clang or GCC), so it is a stupid decision to write assembly (as shown).
    Last edited by ciplogic; 04-07-2013 at 02:23 PM.
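
The loop-invariant code motion mentioned above can be sketched by hand as follows. This is only an illustration: `sum_naive` and `sum_hoisted` are hypothetical names, and an optimizing compiler would normally turn the first form into something like the second without help.

```c
#include <stddef.h>

/* Naive form: (base * base + offset) does not depend on i,
   yet it is written inside the loop body. */
long sum_naive(const int *a, size_t n, int base, int offset) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)a[i] * (base * base + offset);
    }
    return total;
}

/* Hand-hoisted form: the invariant is computed once before the loop.
   This is roughly the transformation the optimizer applies on its own. */
long sum_hoisted(const int *a, size_t n, int base, int offset) {
    const int k = base * base + offset;   /* loop-invariant value */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (long)a[i] * k;
    }
    return total;
}
```

Both versions return the same value; the point is that there is usually no need to obscure the source with manual hoisting when optimizations are enabled.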

  5. #75
    Join Date
    May 2012
    Posts
    431

    Default

    Quote Originally Posted by frign View Post
    In this trivial case this was to be expected, but in case of more complex algorithms this case is not clear. Once it turns out to be more complex, you will have to do the task yourself and must _not_ rely on the compiler-optimisations.
    i just tested the for loop
    the compiler didnt optimize it, so ye its better to go --
    makes me wonder now why as it should be the same logic
    hmm


    anyway every language has a lot to learn
    in asm its instruction scheduling and things like that
    in C it is to be good to the compiler
    theres hundreds of pages of text for both topics
    i personally find asm ones easier to understand as optimizing asm is based on simple logic, but thats just me

  6. #76
    Join Date
    May 2012
    Posts
    431

    Default

    Quote Originally Posted by ciplogic View Post
    and maybe the 4 stuff calls can be auto-vectorized, or even if they can't, it removes 3 out of every 4 branches (which are costly even on out-of-order CPUs).

    What I'm talking about is what any compiler does today at the -O3 level (Visual Studio, Clang or GCC), so it is a stupid decision to write assembly (as shown).
    auto vectorizing is natural in asm

    another thing a compiler cant do is what it cant know
    like when you know a loop will only execute 3-7 times
    a compiler dosent know that so it will put out a speed optimized version, one that is bloated for what it does

    also i never said you have to write whole programs in assembly too get performance
    on the contrary i said you are best to write only few tightest loops in assembly

    also can you show me a program more optimized then x264 or glibc ?
    benchmark the pure C version of musl against glibc then you can say for sure how good a compiler is
    musl from what i see is good, optimized, C so perfect for benchmarks
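
On the point about a loop that only ever runs 3-7 times: standard C indeed has no way to state that. As a hedged sketch, GCC and Clang offer the `__builtin_unreachable()` extension, which can be used to promise a range to the optimizer. The function name `sum_small` is hypothetical, whether the hint changes the generated code at all depends on the compiler, and calling the function outside the promised range is undefined behaviour.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints, where the caller promises 3 <= n <= 7. */
int sum_small(const int *a, size_t n) {
#if defined(__GNUC__)
    /* Compiler extension (GCC, Clang): tells the optimizer that this
       branch is never taken, i.e. n is always within 3..7. */
    if (n < 3 || n > 7) {
        __builtin_unreachable();
    }
#endif
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Compiling with and without the hint at `-O2` or `-O3` and comparing the assembly shows whether it makes a difference for a given compiler.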

  7. #77
    Join Date
    Oct 2012
    Location
    Cologne, Germany
    Posts
    308

    Cool I agree

    Quote Originally Posted by gens View Post
    i just tested the for loop
    the compiler didnt optimize it, so ye its better to go --
    makes me wonder now why as it should be the same logic
    hmm


    anyway every language has a lot to learn
    in asm its instruction scheduling and things like that
    in C it is to be good to the compiler
    theres hundreds of pages of text for both topics
    i personally find asm ones easier to understand as optimizing asm is based on simple logic, but thats just me
    Agreed, at least judging from the experience I have gathered so far, even though I am still a beginner when it comes to ASM.

  8. #78
    Join Date
    May 2012
    Posts
    431

    Default

    i found out why the compiler did that

    i put the "i" variable in global(or what you call it) instead of main
    quick code for testing purposes, dont start bashing my C again

    with variable not global the compiler optimizes the increments into decrements
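
A small sketch of why a global loop counter ties the compiler's hands, under the assumption that the loop body is a call the compiler cannot see through. All names here (`g_i`, `calls`, `stuff_plain`, `stuff_skipping`, `run_global`, `run_local`) are made up for illustration. Any called function may legally modify a global, so the counter has to be reloaded after each call; a local counter is invisible to everyone else, so the compiler is free to keep it in a register or count down instead.

```c
int g_i;     /* global loop counter, visible to every function */
int calls;   /* counts how often the loop body ran */

/* Hypothetical body that leaves the counter alone */
void stuff_plain(void) { calls++; }

/* Hypothetical body that (legally) changes the global counter */
void stuff_skipping(void) { calls++; g_i++; }

/* Global counter: the callee can change g_i behind the loop's back,
   so the iteration count depends on what the body does. */
int run_global(void (*body)(void)) {
    calls = 0;
    for (g_i = 0; g_i < 66; g_i++) {
        body();
    }
    return calls;
}

/* Local counter: no callee can reach i, so the loop always runs 66 times
   and the compiler may transform it freely. */
int run_local(void (*body)(void)) {
    calls = 0;
    for (int i = 0; i < 66; i++) {
        body();
    }
    return calls;
}
```

With `stuff_skipping` the global version runs the body only 33 times, because the counter advances by two per pass, while the local version still runs it 66 times. Whether this is exactly what happened in the test described above is an assumption, since the tested code was not posted.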

  9. #79
    Join Date
    Oct 2012
    Location
    Cologne, Germany
    Posts
    308

    Cool No wonder

    Quote Originally Posted by gens View Post
    i found out why the compiler did that

    i put the "i" variable in global(or what you call it) instead of main
    quick code for testing purposes, dont start bashing my C again

    with variable not global the compiler optimizes the increments into decrements
    It was global? Hell, no wonder it was so bad.
    FYI, setting the volatile flag on global vars will disable this optimisation again.

    Always remember to keep the scope as small as possible and try to completely get rid of global variables or even structs, that's the best practice.

    And again: Stop trying to trick the compiler, start writing good code! (As you don't know how other architectures behave).

  10. #80
    Join Date
    Nov 2009
    Location
    Madrid, Spain
    Posts
    398

    Default

    Quote Originally Posted by gens View Post
    i found out why the compiler did that

    i put the "i" variable in global(or what you call it) instead of main
    quick code for testing purposes, dont start bashing my C again

    with variable not global the compiler optimizes the increments into decrements
    this is a n00b mistake!

    It seems that you understand all the intriguing parts of asm, but you don't know how to write fast C code. Sorry, but in your particular case you should still write assembly. As I wrote before, you sometimes have to help the compiler, not make it impossible for it to optimize. And what about the case where stuff changes RCX? Wouldn't that create an infinite loop?
