Fedora 37 Weighing Change To Improve Profiling/Debugging But With Possible Performance Cost

    Phoronix: Fedora 37 Weighing Change To Improve Profiling/Debugging But With Possible Performance Cost

    Fedora developers are weighing adding an option to the default compilation flags for Fedora 37 that can enhance the performance profiling and debuggability of generated packages, but with possible performance overhead implications -- possibly a few percent based on prior figures...

  • #2
    To what extent is this mitigated by function inlining? Is the frame pointer only updated across true function-call boundaries, or might generated code (from GCC, to be specific) update it more frequently than that?

    Obviously, the loss of RBP will tend to be felt on an ISA with relatively few general-purpose registers.

    Overall, I'm in favor of their policy. IMO, the carve-out for packages where a significant impact is observed is pretty key.

    • #3
      Datum: As I understand it, Apple already requires this for their OSes on 64-bit ARM.

      (Also, am I the only one who has been seeing a "vBulletin sometimes selects and erases all preceding text when you Ctrl+V a URL and the last thing you did was focus the browser window" bug for as long as I can remember?)
      Last edited by ssokolow; 23 June 2022, 04:47 PM.

      • #4
        Why not then create a new repository mirror with not only this but all profiling and debugging that makes sense enabled? Then whoever wants to profile something could just transparently replace the package with the same one from the other repository.

        • #5
          Originally posted by SofS:
          Why not then create a new repository mirror with not only this but all profiling and debugging that makes sense enabled? Then whoever wants to profile something could just transparently replace the package with the same one from the other repository.
          So, you're basically saying they should maintain a completely parallel set of binaries, just to make debugging and profiling a little easier?

          For profiling, it makes some degree of sense. However, it's of less help with debugging, because backtraces will often be submitted without someone going through the trouble (or necessarily having the ability, for non-deterministic bugs) of reproducing the problem using the frame pointer build.

          • #6
            I really hope they stay away from this for 32-bit x86. It will without question impact performance. On 64-bit x86 the register pressure is far lower, so I don't think we will necessarily see that much impact there.

            Originally posted by ssokolow:
            Datum: As I understand it, Apple already requires this for their OSes on 64-bit ARM.

            (Also, am I the only one who has been seeing a "vBulletin sometimes selects and erases all preceding text when you Ctrl+V a URL and the last thing you did was focus the browser window" bug for as long as I can remember?)
            That's not an Apple thing, it's an ARM thingy. It's literally mandated by the AArch64 ABI's default calling convention. You'll bump into issues if you don't make your compiler and linker aware that you're deviating from the standard function call practice.

            • #7
              -fno-omit-frame-pointer is already the default for AArch64, and so on most distributions frame pointers are available. This is from my point of view the most important advantage of AArch64 compared to other architectures, and it's sad that RISC-V didn't also make that decision. With 31 registers available, losing one doesn't cost much for AArch64, and stp/ldp means that the cost of maintaining the frame pointer can be as little as one instruction per function.

              It's so nice having 100% reliable stack traces, even through proprietary libraries. The -fomit-frame-pointer option should never have been invented.

              But it appears that enabling this on x86_64 would not give backtraces that are as accurate, because x86_64 has no link register?
              Last edited by archsway; 24 June 2022, 04:06 AM.
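              A rough model of the frame-record chain described above, as a sketch only: on AArch64 a function that keeps a frame pointer typically begins with `stp x29, x30, [sp, #-16]!` followed by `mov x29, sp`, so x29 points at a two-word record holding the caller's x29 and the return address taken from x30. Unwinding is then just following that linked list. The `memory` dict, the function names and every address below are hypothetical stand-ins for a real stack, and the outermost record is assumed to hold a zero frame pointer.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each frame record lives at the address held in x29 and
# holds (caller's x29, saved x30 / return address), as stored by `stp x29, x30`.
FrameRecord = Tuple[int, int]


def walk_frame_records(fp: int, pc: int, memory: Dict[int, FrameRecord],
                       max_depth: int = 64) -> List[int]:
    """Return program counters from the innermost frame outwards."""
    trace = [pc]
    while fp != 0 and len(trace) < max_depth:
        saved_fp, saved_lr = memory[fp]
        trace.append(saved_lr)
        # The stack grows downwards, so a caller's record must sit at a higher
        # address; anything else indicates a corrupt chain, so stop there.
        if saved_fp != 0 and saved_fp <= fp:
            break
        fp = saved_fp
    return trace


# Hypothetical three-deep call chain: leaf() <- mid() <- main().
memory = {
    0x7F00: (0x7F40, 0x400120),  # leaf's record: return into mid()
    0x7F40: (0x7F80, 0x400200),  # mid's record: return into main()
    0x7F80: (0x0, 0x400300),     # main's record: zero fp ends the chain
}

if __name__ == "__main__":
    print([hex(a) for a in walk_frame_records(0x7F00, 0x400050, memory)])
```

              Each step is a single record lookup, with no per-function tables involved, which is why this style of unwinding is so cheap and robust.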

              • #8
                I don't see the point of this change. On the one hand, it makes stack unwinding slightly easier (the claimed "high overhead" can be fixed in long-running profiling sessions by just reducing the sampling frequency a bit), but DWARF is needed for symbolization anyway. On the other hand, it makes Fedora 5-10% slower for all use cases - even the large majority that does not need unwinding at all. All of that just because Facebook wants to run a shitty profiler all the time.
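                archkde's point about the sampling frequency is simple arithmetic: per CPU, the time spent unwinding is roughly samples per second multiplied by the cost of one unwind, so it scales linearly with the sampling rate. The sketch below assumes a fixed per-sample cost; the 50-microsecond figure is purely hypothetical, not a measurement. The frame-pointer overhead, by contrast, is paid on every function call whether or not a profiler is running.

```python
# Back-of-the-envelope model of sampling-profiler overhead.
# Assumption: each sample costs a fixed, hypothetical amount of CPU time to
# unwind; real costs depend on stack depth, the unwinder and the kernel.


def profiling_overhead(sample_hz: float, unwind_cost_us: float) -> float:
    """Fraction of one CPU's time spent taking and unwinding samples."""
    return sample_hz * unwind_cost_us * 1e-6


if __name__ == "__main__":
    cost_us = 50.0  # hypothetical per-sample unwinding cost in microseconds
    for hz in (1000, 250, 100):
        print(f"{hz:>5} Hz -> {profiling_overhead(hz, cost_us):.2%} of a CPU")
```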

                • #9
                  Originally posted by archkde:
                  I don't see the point of this change. On the one hand, it makes stack unwinding slightly easier (the claimed "high overhead" can be fixed in long-running profiling sessions by just reducing the sampling frequency a bit), but DWARF is needed for symbolization anyway. On the other hand, it makes Fedora 5-10% slower for all use cases - even the large majority that does not need unwinding at all. All of that just because Facebook wants to run a shitty profiler all the time.
                  It does not make stack unwinding slightly easier, it makes it much easier.
                  Without a frame pointer on the stack, the unwinder has to make a guess, which can easily be wrong and output a long list of invalid frames.

                  And where does the 5-10% slower for all use cases come from?
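                  To illustrate the difference described here, the sketch below models the kind of guess a stack-scanning fallback makes when neither frame pointers nor unwind tables are available: it reports every stack word that falls inside the code range, so a stale return address or a function pointer kept in a local shows up as a bogus frame. The frame-pointer walk follows the x86_64 layout produced by `push rbp; mov rbp, rsp`, where `[rbp]` holds the caller's rbp and `[rbp + 8]` the return address. The code range, function names and stack contents are hypothetical.

```python
from typing import Dict, List

# Hypothetical range of executable code in the process.
TEXT_START, TEXT_END = 0x400000, 0x500000


def scan_stack(stack: Dict[int, int]) -> List[int]:
    """Heuristic 'guess': every word that looks like a code address is a frame."""
    return [word for _, word in sorted(stack.items())
            if TEXT_START <= word < TEXT_END]


def walk_rbp_chain(rbp: int, stack: Dict[int, int], max_depth: int = 64) -> List[int]:
    """Follow saved rbp values; [rbp] = caller's rbp, [rbp + 8] = return address."""
    trace = []
    while rbp != 0 and len(trace) < max_depth:
        trace.append(stack[rbp + 8])
        rbp = stack[rbp]
    return trace


# Hypothetical stack contents, keyed by address (8-byte words).
stack = {
    0x7F00: 0x7F20,    # saved rbp of the caller
    0x7F08: 0x400120,  # real return address into mid()
    0x7F10: 0x4003A0,  # local variable holding a function pointer
    0x7F18: 0x400310,  # stale return address left over from an earlier call
    0x7F20: 0x0,       # saved rbp of zero marks the end of the chain
    0x7F28: 0x400200,  # real return address into main()
}

if __name__ == "__main__":
    print("scan:", [hex(a) for a in scan_stack(stack)])
    print("rbp :", [hex(a) for a in walk_rbp_chain(0x7F00, stack)])
```

                  The scan reports four "frames", two of them spurious, while the rbp chain reports exactly the two real return addresses.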

                  • #10
                    Originally posted by NobodyXu:

                    It does not make stack unwinding slightly easier, it makes it much easier.
                    Without a frame pointer on the stack, the unwinder has to make a guess, which can easily be wrong and output a long list of invalid frames.

                    And where does the 5-10% slower for all use cases come from?
                    No, the unwinder doesn't have to make a guess, it can unwind via DWARF (unless the program is playing weird games with the stack pointer or corrupting its stack, but in this case, all bets are off even with a frame pointer). And as said, the tool containing the unwinder needs to parse DWARF anyway for symbolization.

                    The 5-10% comes from the cited benchmark and the fact that the cost of the frame pointer always has to be paid; it's not something you can turn on when running the unwinder and turn off later.
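                    The DWARF alternative can be sketched too: with call-frame information the unwinder does not need a frame pointer, because a table tells it, for each PC, where the Canonical Frame Address (CFA) lies relative to the stack pointer, and on x86_64 the return address sits at CFA - 8. This is a simplified model, assuming one fixed CFA offset per function; real `.eh_frame`/`.debug_frame` rules can change from instruction to instruction, and the function ranges, offsets and stack contents here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical unwind table: (start_pc, end_pc, cfa_offset_from_rsp).
# Simplification: one rule per function instead of per-instruction CFI rows.
UNWIND_TABLE: List[Tuple[int, int, int]] = [
    (0x400000, 0x400100, 0x28),  # leaf(): 0x20 bytes of locals + return address
    (0x400100, 0x400200, 0x18),  # mid():  0x10 bytes of locals + return address
    (0x400200, 0x400300, 0x10),  # main(): 0x08 bytes of locals + return address
]


def find_cfa_offset(pc: int) -> Optional[int]:
    """Look up the CFA rule covering pc, or None if there is no unwind info."""
    for start, end, offset in UNWIND_TABLE:
        if start <= pc < end:
            return offset
    return None


def dwarf_style_unwind(pc: int, rsp: int, stack: Dict[int, int],
                       max_depth: int = 64) -> List[int]:
    """Unwind using only rsp and the table; no frame pointer is consulted."""
    trace = [pc]
    while len(trace) < max_depth:
        offset = find_cfa_offset(pc)
        if offset is None:
            break  # no unwind info for this PC: stop rather than guess
        cfa = rsp + offset
        return_address = stack[cfa - 8]  # x86_64: RA is stored just below the CFA
        trace.append(return_address)
        pc, rsp = return_address, cfa  # caller's rsp equals the callee's CFA
    return trace


# Hypothetical stack: only the return-address slots matter for this walk.
stack = {
    0x7E20: 0x400150,  # leaf() returns into mid()
    0x7E38: 0x400250,  # mid() returns into main()
    0x7E48: 0x500000,  # main() returns into runtime code with no table entry
}

if __name__ == "__main__":
    print([hex(a) for a in dwarf_style_unwind(0x400050, 0x7E00, stack)])
```

                    This is the trade-off the thread is arguing about: the table-driven walk needs a lookup per frame but costs nothing at run time when nobody is unwinding, while the frame-pointer walk is simpler but relies on every function maintaining rbp.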
