
Thread: Is Assembly Still Relevant To Most Linux Software?

  1. #41
    Join Date
    Oct 2012
    Location
    Cologne, Germany
    Posts
    303

    More improvements

    Originally Posted by Obscene_CNN
    You can do several things in assembler faster than you can in C, for one main reason: you are not constrained by C's rules. For example, in x86 assembler you can determine which of 4 values you have with only one compare. On just about every processor architecture you can return more than one value without using a pointer. In assembly you can also return a result without using a single register or memory location, by using the condition flags. Try doing any of these in C.

    You don't see assembly much in programs today. In CS today, the emphasis is on one single operation per function. With everything split up into tiny functions, you can't truly leverage assembly's power, and whatever gains you do get are overshadowed by function-call overhead. One reason people don't bother to optimize their code today is that the performance bottleneck is this very coding philosophy of one operation per function. In other words, if a program spends less than 1% of its time in any one function, the most you can gain by optimizing that function is less than 1%.

    This problem is so bad that people don't even try to write fast, tight code, even in video drivers.

    Example of video driver code (changed to protect the identity of the guilty):

    Code:
    static int somechip_interp_flat(struct somechip_shader_ctx *ctx, int input)
    {
            int i, r;
            struct some_gpu_bytecode_alu alu;
    
            for (i = 0; i < 4; i++) {
                    memset(&alu, 0, sizeof(struct some_gpu_bytecode_alu));
    
                    alu.inst = SOME_ALU_INSTRUCTION_INTERP_LOAD_P0;
    
                    alu.dst.sel = ctx->shader->input[input].gpr;
                    alu.dst.write = 1;
    
                    alu.dst.chan = i;
    
                    alu.src[0].sel = SOME_ALU_SRC_PARAM_BASE + ctx->shader->input[input].lds_pos;
                    alu.src[0].chan = i;
    
                    if (i == 3)
                            alu.last = 1;
                    r = some_alu_bytecode_add_alu(ctx->bc, &alu);
                    if (r)
                            return r;
            }
            return 0;
    }
    How it should be written for performance:

    Code:
    static int somechip_interp_flat(struct somechip_shader_ctx *ctx, int input)
    {
            int i, r;
            struct some_gpu_bytecode_alu alu;
    
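            /* Set the loop-invariant fields once. This assumes
             * some_alu_bytecode_add_alu() copies *alu and does not
             * modify it between iterations. */
            memset(&alu, 0, sizeof(struct some_gpu_bytecode_alu));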
    
            alu.inst = SOME_ALU_INSTRUCTION_INTERP_LOAD_P0;
    
            alu.dst.sel = ctx->shader->input[input].gpr;
            alu.dst.write = 1;
    
            alu.src[0].sel = SOME_ALU_SRC_PARAM_BASE + ctx->shader->input[input].lds_pos;
    
    
            for (i = 0; i < 4; i++) {
    
                    alu.dst.chan = i;
    
                    alu.src[0].chan = i;
    
                    if (i == 3)
                            alu.last = 1;
                    r = some_alu_bytecode_add_alu(ctx->bc, &alu);
                    if (unlikely(r))
                            break;
            }
            return r; 
    }
    One might argue that it's the generated shader code that is the important part. However, slow CPU code and unneeded memory writes do delay issuing the shader code to the GPU.
    I am itching to give my 2 cents here, as there is even more room for improvement, though it is quite minor compared to your initial proposal (note: the CODE tag can be handy):

    Code:
    static (u)int_fastQ_t somechip_interp_flat(struct somechip_shader_ctx *ctx, int input)
    {
         (u)int_fastQ_t r; // I don't know the range of r,
                         // but it could be determined here
                         // and limited to 16 or even 8 bits
    
         struct some_gpu_bytecode_alu alu;
    
         memset(&alu, 0, sizeof(struct some_gpu_bytecode_alu));
    
         alu.inst = SOME_ALU_INSTRUCTION_INTERP_LOAD_P0;
    
         alu.dst.sel = ctx->shader->input[input].gpr;
         alu.dst.write = 1;
    
         alu.src[0].sel = SOME_ALU_SRC_PARAM_BASE + ctx->shader->input[input].lds_pos;
    
         for (uint_fast8_t i = 0; i < 4; i++) {
              alu.dst.chan = i;
              alu.src[0].chan = i;
    
              if (i == 3)
                   alu.last = 1;
              r = some_alu_bytecode_add_alu(ctx->bc, &alu);
              if (unlikely(r))
                   break;
         }
         return r; 
    }
    Sadly, the iteration order of the for-loop is fixed. Looking at "int input" without knowing its range, one might also be able to reduce its size accordingly.

    Further improvements would require knowing more about the specific project (I would be happy if you could PM me its name; I would really like to know, and I hope it's not Intel).

    Did you commit your changes?

    Best regards

    FRIGN
    Last edited by frign; 04-02-2013 at 07:33 PM.

  2. #42
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    131

    Originally Posted by frign
    Did you commit your changes?

    Best regards

    FRIGN

    I'm working on a patch, but it needs more testing and verification before I release it.

    Thanks for the tip.

    Obscene_CNN

  3. #43
    Join Date
    Apr 2013
    Posts
    1

    Originally Posted by lesterchester
    (and all crackers use Linux because they would have learned that Linux is the safest)...

    BSDs are so insecure that you probably only need to write your code in C, C++ or even shell to be successful.
    BSDs are insecure? Really? Surely this is flamebait. If not, please explain to me how BSD is insecure, and, if I'm correct in inferring that you meant Linux is more secure than BSD, please tell me why that's the case.

    EDITED TO ADD RESPONSE RELEVANT TO ARTICLE:
    I don't think it's a secret that very little Linux software, including drivers, is developed in straight assembly. Isn't this why C was created years ago, to replace assembly for systems programming? And yes, properly coded assembly should run faster than code written in any high-level language, but portability tends to be a higher-priority goal than speed, and readability and ease of maintenance are usually higher priorities too.
    Last edited by kkaos; 04-02-2013 at 11:37 PM.

  4. #44
    Join Date
    Sep 2012
    Posts
    566

    Originally Posted by Obscene_CNN
    Your end users may differ in opinion, especially when they are the ones who have to wait for it and pay for the hardware to store and run it. Take a look at Microsoft's Surface, where half of the flash was eaten up by the base software.

    Also, a commonly overlooked factor that matters more and more today is power consumption. Memory reads/writes and instruction cycles take energy, and the more you have to do to accomplish a task, the shorter your battery lasts. Power management is a race to get to sleep.
    Are you suggesting that they should have written Windows 8 in assembly?
    Really?
    That's ridiculous. With a project of that size (actually, any size above a couple of functions), there is no guarantee that the code would be smaller or faster, while it would certainly be a million times buggier and, more importantly, released in 2100, which end users might not like (and by that time, the 16GB of memory won't matter much).

    ASM is OK for fine-tuning a couple of bottleneck computation steps (emphasis on bottleneck). It's not OK in any other situation.

  5. #45
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    4,718

    @frign

    It's eerily similar to Radeon code. Bears a striking resemblance, I could even say.

  6. #46
    Join Date
    Oct 2012
    Location
    Cologne, Germany
    Posts
    303

    Nice README

    Originally Posted by curaga
    @frign

    It's eerily similar to Radeon code. Bears a striking resemblance, I could even say.
    The README of the Radeon driver states "Abandon hope all ye who enter here".

  7. #47
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    131

    Originally Posted by erendorn
    Are you suggesting that they should have written Windows 8 in assembly?
    Really?
    That's ridiculous. With a project of that size (actually, any size above a couple of functions), there is no guarantee that the code would be smaller or faster, while it would certainly be a million times buggier and, more importantly, released in 2100, which end users might not like (and by that time, the 16GB of memory won't matter much).

    ASM is OK for fine-tuning a couple of bottleneck computation steps (emphasis on bottleneck). It's not OK in any other situation.
    I didn't suggest that Windows 8 be written in assembly, just that they abandon the crappy philosophy that ease of development far outweighs all other considerations. Hell, if they would just abandon the use of C++ templates, it would cut the size to about a quarter of what it is now.

    And on the contrary, large projects are quite possible in assembly, in a timely manor, with very few bugs.

    http://en.wikipedia.org/wiki/Roller_Coaster_Tycoon

  8. #48
    Join Date
    Dec 2009
    Location
    Greece
    Posts
    351

    Originally Posted by Obscene_CNN
    I didn't suggest that Windows 8 be written in assembly, just that they abandon the crappy philosophy that ease of development far outweighs all other considerations. Hell, if they would just abandon the use of C++ templates, it would cut the size to about a quarter of what it is now.

    And on the contrary, large projects are quite possible in assembly, in a timely manor, with very few bugs.

    http://en.wikipedia.org/wiki/Roller_Coaster_Tycoon
    This made my day...

    BTW, since when were Roller Coaster Tycoon written in assembly? Any proof?

  9. #49
    Join Date
    Sep 2008
    Location
    Vilnius, Lithuania
    Posts
    2,393

    Originally Posted by TemplarGR
    This made my day...
    A Timely Manor indeed.

  10. #50
    Join Date
    May 2012
    Posts
    333

    Originally Posted by TemplarGR
    This made my day...

    BTW, since when were Roller Coaster Tycoon written in assembly? Any proof?
    "They" were not; the first one was.

    Also, FASM is written entirely in FASM. It is cross-platform, very fast and small.

    Big projects can be made in assembly if they are planned well.

    With C++ you can make a working program way faster than in FASM, and then you can spend years debugging it.

    And no, debugging assembly is not as hard as everybody who doesn't write assembly says. In some cases it is, but in most it is easier than debugging C (I don't know C++ well enough to talk about debugging it; all I know is that there are at least 5 ways to write one thing).
