The Speculative Execution Impact For A 4-Core POWER9

  • The Speculative Execution Impact For A 4-Core POWER9

    Phoronix: The Speculative Execution Impact For A 4-Core POWER9

    Last year we looked at the Spectre mitigation cost on POWER9 using the high-end Talos II server. Now, several kernel releases later and with the desktop Blackbird system in our lab, here is a look at the Spectre/Meltdown mitigation impact for an IBM POWER9 4-core processor running Ubuntu 19.04.
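
    For readers who want to check which mitigations their own kernel reports before comparing numbers, here is a minimal sketch. It assumes a Linux kernel that exposes the standard sysfs directory `/sys/devices/system/cpu/vulnerabilities`; the function name `read_mitigations` is made up for this illustration and is not part of the article or the Phoronix Test Suite.

```python
from pathlib import Path

# Standard sysfs location where Linux reports per-vulnerability status.
# Assumption: the running kernel is new enough to provide this directory.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(vuln_dir=VULN_DIR):
    """Return {vulnerability_name: status_string} for each file in vuln_dir.

    Returns an empty dict when the directory does not exist
    (non-Linux system or a kernel without this interface).
    """
    vuln_dir = Path(vuln_dir)
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


if __name__ == "__main__":
    for name, state in read_mitigations().items():
        print(f"{name}: {state}")
```

    The directory is passed as a parameter so the same function can be pointed at a copy of the files taken from another machine.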

  • #2
    Kernel-only seems to be the sanest option by far.

    • #3
      Am I reading the ImageMagick-6 compilation times correctly (quickest with the default protection, slowest with no protection) ?

      • #4
        Originally posted by zerothruster
        Am I reading the ImageMagick-6 compilation times correctly (quickest with the default protection, slowest with no protection) ?
        I noticed that as well. I think you're reading that correctly. Just not sure what would cause that result.

        • #5
          I wish AMD and Intel would take notice. Why are we still using two-way SMT when SMT4 and SMT8 exist? Imagine if either AMD or Intel released a 4C/16T or 4C/32T processor; everyone would go nuts. SMT can be added to a CPU relatively cheaply in terms of added circuitry, and tests of a 12C/96T CPU vs a 24C/96T CPU show that under some workloads the former is faster than the latter.

          • #6
            Originally posted by sophisticles
            I wish AMD and Intel would take notice, why are we still using SMT when SMT4 and SMT8 exist? Imagine if either AMD or Intel released a 4C/16T or 4C/32T processor, everyone would go nuts. SMT can be added to a cpu for relatively cheap from an added circuitry standpoint and tests with 12C/96T cpu vs a 24C/96T cpu show that under some workloads the former is faster than the latter.
            SMT is great for servers where bulk processing is run without any interactivity. Going with SMT4/8 would be even better. But SMT interferes with the kernel's ability to schedule processing time fairly.

            The original purpose of SMT was to reduce stalled pipelines on the CPU, keeping it as close to 100% busy as possible. Synthetic benchmarks show SMT-enabled processors scoring higher in heavily multithreaded workloads. But search online for Intel 9700K vs 9900K gaming benchmarks. It turns out that in latency-sensitive applications, like video games, SMT lowers the 0.1% and 1% low frame rates when there are enough cores to process all active threads in the game engine.

            This translates somewhat to desktop interactivity as well, but most process schedulers (CFS, MuQSS) are smart enough not to schedule unrelated tasks on the same core through SMT when they can avoid it. Interestingly, for SCHED_IDLEPRIO tasks, MuQSS avoids scheduling work on thread siblings entirely. This is the only sure way to guarantee that background tasks cannot interfere with foreground tasks.
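
            To see which logical CPUs are thread siblings on a given machine (the pairs or groups a scheduler would want to keep unrelated tasks away from), a rough sketch follows. It assumes the Linux sysfs file `cpuN/topology/thread_siblings_list`; the helper names `parse_cpu_list` and `smt_siblings` are invented for this example and say nothing about how CFS or MuQSS work internally.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3" or "0,4,8-9" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def smt_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs that share a physical core with `cpu`.

    Assumption: sysfs_root follows the usual Linux layout, where each
    cpuN directory has a topology/thread_siblings_list file.
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


if __name__ == "__main__":
    print(smt_siblings(0))
```

            On an SMT4 part such as the POWER9 in the article, the list for one CPU would be expected to contain four entries; on a typical two-way SMT desktop chip, two.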

            TL;DR: SMT was designed to keep processors busy doing work when stalled, boosting throughput. Gaming benchmarks show SMT gives lower FPS in the 0.1% and 1% lows (jank/stutter) when there are enough cores to run the game without waiting. Going beyond 2 threads per core with SMT will only benefit servers and non-interactive applications.
            Last edited by damentz; 12 June 2019, 10:12 PM. Reason: Add TLDR

            • #7
              Originally posted by sophisticles
              I wish AMD and Intel would take notice, why are we still using SMT when SMT4 and SMT8 exist? Imagine if either AMD or Intel released a 4C/16T or 4C/32T processor, everyone would go nuts. SMT can be added to a cpu for relatively cheap from an added circuitry standpoint and tests with 12C/96T cpu vs a 24C/96T cpu show that under some workloads the former is faster than the latter.
              There is little need for SMT to begin with, unless your CPU, application, or compiler is constructed like a pile of crap.
              The purpose of SMT is to fill up UNUSED execution slots. Unused execution slots are side effects of a lot of different things.
              SMT was added because it is easier than doing the "other things" right.
              Adding more threads creates more context switches, more contention for CPU resources, and more OS administrative work.
              More synthetic execution threads do not make your CPU execute faster.
              Unless you have an extremely wide CPU and a rubbish dynamic execution model, you're unlikely to get any gains by tacking a bigger number onto SMT.
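
              The "unused slots" argument can be put into a toy calculation. This is only a back-of-the-envelope sketch under a loud assumption: each hardware thread independently fills a given issue slot with probability `per_thread_use`, and there is no contention, cache pressure, or scheduling overhead. Real cores do not behave this way, and the function name `slot_utilization` is invented for the illustration.

```python
def slot_utilization(per_thread_use, threads):
    """Fraction of issue slots filled when `threads` SMT threads share a core.

    Toy model (assumption): each thread independently has something to
    issue in a slot with probability `per_thread_use`, so a slot stays
    empty only if every thread has nothing to issue.
    """
    if not 0.0 <= per_thread_use <= 1.0:
        raise ValueError("per_thread_use must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 - (1.0 - per_thread_use) ** threads


if __name__ == "__main__":
    # Compare a core that one thread keeps half busy with one that
    # a single thread already keeps 90% busy.
    for use in (0.5, 0.9):
        for n in (1, 2, 4, 8):
            print(f"use={use} SMT{n}: {slot_utilization(use, n):.4f}")
```

              Under these assumptions, a core that a single thread already keeps mostly busy has very few empty slots left for extra threads to fill, while a poorly utilized core has many, which is the trade-off being argued about here.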

              Trust me. If there were a lot of magic gains to be had from adding SMT4 and SMT8, Intel and AMD would have done it.
              I'd much rather get performance up by other means and be done with the unpredictability of SMT once and for all.

              • #8
                Originally posted by milkylainen

                There is little need for SMT to begin with. Unless your CPU, application, compiler is constructed like a pile of crap.
                The purpose of SMT is to fill up UNUSED execution slots. Unused execution slots are side effects of a lot of different things.
                SMT was added because it is easier than doing the "other things" right.
                Adding more threads create more context switches and contention for CPU resources and OS administrative tasks.
                More synthetic execution threads does not make your CPU execute faster.
                Unless you have an extremely wide CPU and rubbish dynamic execution model you're unlikely to get any gains by tacking a new number to SMT.

                Trust me. If there were a lot of magic gains to be had from adding SMT4 and SMT8, Intel and AMD would have done it.
                I'd much rather get performance up by other means and be done with the unpredictability of SMT once and for all.
                Well, I partially agree with you, but I have my reservations about latency.

                If we are talking about Windows, I do agree with you, especially since it is common knowledge that Microsoft has never gotten its scheduler right; even the Windows Server scheduler is atrocious and falls over more often than not.

                If we are talking about Linux, I don't completely agree with you, because in my experience Linux handles SMP/NUMA and locality-aware allocation pretty damn well by default, and even better with a custom scheduler.

                As for gaming, it is true that SMP/NUMA/SMT will have a bit more latency compared to real cores, but I recently removed my Windows partition and migrated all my games to Wine/esync/DXVK, and I have noticed that even though the FPS is indeed a bit lower, the smoothness is impressive, as are the load times. For example:

                Windows, SATA3 SSD (NTFS) + 2TB NTFS HDD for games and mods:

                1.) Skyrim SE: main game on the SSD + 250 mods (MO2) on the 2TB disk

                Startup time: around 2-5 min (depending on warm vs cold)
                FPS: around 80
                Constant stutter, because Windows either randomly starts thrashing my I/O with random process spawns or chokes on streaming my 4K textures, especially when moving through the wilderness as the scenery changes
                CPU-dependent work, like mod actions under heavy scripts, sometimes stalls for no good reason (CPU usage is low overall), and a second later my character is suddenly trying to do 10 things at once
                Loading screens: in short, time to get a coffee, especially in outdoor zones (huge amount of 4K+ textures)

                2.) Linux/Wine: added the SSD alongside my existing Linux one (both 256GB) in a ZFS RAID1 and converted the game drive to ZFS as well

                Startup: between 10 s and 1 min (depending on warm vs cold)
                FPS: around 65-70
                Smooth AF, I couldn't believe my eyes, even when my VRAM usage is around 7GB and I move to a completely different place
                Loading screens: 20-30 seconds tops (I don't fully understand yet why the difference is so huge, though)
                All my scripts run when they are supposed to (outside of known broken mods and such) and don't affect the smoothness of the gameplay

                I also noticed that Linux never pegs one or two specific cores but keeps the load distributed as much as possible most of the time, while Windows always punishes core 0 heavily.

                A friend of mine reproduced something similar on a Threadripper 1920X/Vega 56 system (also converted from Windows to Linux, since he was pissed off about the game feeling like crap when modded on that beast of a CPU/GPU). His CPU is in NUMA mode, BTW, not gaming mode, and he is using 2x 1TB M.2 SSDs in a ZFS RAID1 (according to him, Skyrim SE doesn't have load screens anymore, LOL).

                My system is a Xeon E3-1231 v3/16GB/RX 470 8GB (under Linux I underclocked the RAM and undervolted the core to be safe; on Windows it was overclocked with the AMD software).

                I have had similar experiences with Witcher 3, DOA6, DMC5 and others.

                Note: I haven't found a way to use ENB with DXVK yet; I use ReShade for now, before anyone asks.

                • #9
                  Originally posted by damentz

                  SMT is great for servers where bulk processing is run without any interactivity. Going with SMT4/8 would be even better. But SMT interferes with the kernel's ability to schedule processing time fairly.

                  The original purpose of SMT was to reduce stalled pipelines on the CPU, keeping it busy as close to 100% as possible. Synthetic benchmarks show SMT enabled processors scoring higher in high multithreaded workloads. But make a search online for the Intel 9700K vs 9900K in gaming benchmarks. It turns out in latency sensitive applications, like video games, SMT reduces 0.1% and 1% lows when there's enough cores to process all active threads in the game engine.

                  This translates somewhat to desktop interactivity as well, but most process schedulers (CFS, MuQSS), are somewhat smart enough to not schedule unrelated tasks on the same core through SMT when it can avoid it. And interestingly, MuQSS avoids scheduling work on thread siblings for SCHED_IDLEPRIO tasks entirely. This is the only sure way to guarantee that background tasks cannot interfere with foreground tasks.

                  TLDR; SMT was designed to keep processors busy doing work when stalled, boosting throughput. Gaming benchmarks show SMT gives lower FPS in 0.1% and 1% lows (jank/stutter), when there's enough cores to run the game without waiting. Going for more than 2 threads in SMT will only benefit servers and non-interactive applications.
                  I see the potential of more aggressive SMT in workstation loads, where you want your compile job to finish as quickly as possible. I think it would be nice if AMD released a Threadripper with SMT4; it could be a performance gain for certain uses.

                  • #10
                    Originally posted by DoMiNeLa10

                    I see the potential of more aggressive SMT in workstation loads, where you want your compile job to finish as quickly as possible. I think it would be nice if AMD released a Threadripper with SMT4, it could be a performance gain for certain uses.

                    AMD is going to release a 64-core/128-thread TR in Q4 2019.

                    I guess it will suffice.
