AMD Cooking Up A "PAN" Feature That Can Help Boost Linux Performance


  • #1

    Phoronix: AMD Cooking Up A "PAN" Feature That Can Help Boost Linux Performance

    AMD open-source engineers sent out a request for comments on a new kernel feature called "PAN", or Process Adaptive autoNUMA. Early numbers shown by AMD indicate that PAN can help with performance in some workloads on their latest server hardware by a measurable amount...
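
    PAN's name, Process Adaptive autoNUMA, suggests it builds on the kernel's existing automatic NUMA balancing, which is exposed through kernel.numa_balancing. As a rough illustration (mine, not part of the RFC), a small C program can check whether that mechanism is active on a given box:

        /* Rough sketch (not from the PAN RFC): check whether the kernel's
         * automatic NUMA balancing (autoNUMA) is enabled.  A non-zero
         * value in /proc/sys/kernel/numa_balancing means it is active. */
        #include <stdio.h>

        int main(void)
        {
            FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
            int mode = -1;

            if (!f || fscanf(f, "%d", &mode) != 1)
                puts("automatic NUMA balancing not available on this kernel");
            else
                printf("kernel.numa_balancing = %d (%s)\n",
                       mode, mode ? "enabled" : "disabled");

            if (f)
                fclose(f);
            return 0;
        }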


  • #2
    Enabling "L3 as NUMA" gives you 16 NUMA domains on a dual-socket Milan system, so yeah, something like this PAN is very much needed on regular servers. It's a bit different if your HPC scheduler is aware of those NUMA domains and your jobs are small enough to fit inside them without crossing NUMA boundaries ... then all is fine.
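
    As a minimal sketch of what "keeping a job inside one domain" means in practice (my own example using libnuma, nothing from the PAN patches; node 0 is just an arbitrary index), roughly equivalent to numactl --cpunodebind=0 --membind=0:

        /* Minimal sketch using libnuma (link with -lnuma), not part of the
         * PAN patches: list the exposed NUMA nodes and confine the current
         * process, CPUs and memory, to node 0. */
        #include <stdio.h>
        #include <numa.h>

        int main(void)
        {
            if (numa_available() < 0) {
                fprintf(stderr, "NUMA not supported on this system\n");
                return 1;
            }

            /* With "L3 as NUMA" enabled this count is much higher than the
             * socket count, e.g. 16 on a dual-socket Milan box. */
            printf("configured NUMA nodes: %d\n", numa_num_configured_nodes());

            struct bitmask *nodes = numa_allocate_nodemask();
            numa_bitmask_setbit(nodes, 0);   /* node 0: example only */
            numa_bind(nodes);                /* bind CPUs and memory */
            numa_free_nodemask(nodes);

            /* ... run the actual job here, staying inside one domain ... */
            return 0;
        }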

    • #3
      Running OpenMP code on a dual-Zen3 machine can be terrible, sometimes even slower than single-threaded code. So this feature is more than welcome!
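
      A common culprit on multi-socket Zen is memory landing on the wrong node. A minimal first-touch sketch (my own, assuming a plain C/OpenMP loop rather than the actual workload), usually combined with OMP_PROC_BIND=close and OMP_PLACES=cores:

          /* Sketch of the NUMA "first touch" pattern with OpenMP:
           * initialize arrays with the same static schedule that later
           * computes on them, so each page is allocated on the node of
           * the thread that will use it.
           * Build: gcc -O2 -fopenmp first_touch.c
           * Run:   OMP_PROC_BIND=close OMP_PLACES=cores ./a.out */
          #include <stdio.h>
          #include <stdlib.h>
          #include <omp.h>

          #define N (1L << 26)

          int main(void)
          {
              double *a = malloc(N * sizeof *a);
              double *b = malloc(N * sizeof *b);

              /* Parallel first touch: pages end up near their threads. */
              #pragma omp parallel for schedule(static)
              for (long i = 0; i < N; i++) {
                  a[i] = 1.0;
                  b[i] = 2.0;
              }

              double sum = 0.0;
              /* Same schedule: threads mostly reuse local memory. */
              #pragma omp parallel for schedule(static) reduction(+:sum)
              for (long i = 0; i < N; i++)
                  sum += a[i] * b[i];

              printf("threads=%d sum=%g\n", omp_get_max_threads(), sum);
              free(a);
              free(b);
              return 0;
          }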

      • #4
        Yes, reminds me a bit of big Sun iron from some 20 years ago ...

        • #5
          Some tasks will benefit a lot from this!
          Think about VM hypervisors: some of them handle the NUMA domains of an EPYC system really poorly.

          • #6
            Originally posted by kieffer:
            Running OpenMP code on a dual-Zen3 machine can be terrible, sometimes even slower than single-threaded code. So this feature is more than welcome!
            And the same code delivers a good speedup on single-CPU/multi-core systems? Because it seems to me that OpenMP requires fairly judicious use by the programmer.

            It's not obvious to me this feature will solve your problem. That would depend a lot on the particulars of what you're doing and the bottleneck you're hitting.

            BTW, I'm a big advocate of setting the environment variable OMP_WAIT_POLICY=passive, as it defaults to active on many systems, and that just burns obscene amounts of CPU cycles doing a lot of nothing. I've seen cases where it slowed down my program by starving out its non-OpenMP parts.
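
            For anyone who wants to see the effect, a tiny sketch of mine (not code from this thread): start a thread team, then sit in a serial section and compare per-thread CPU usage under each policy.

                /* Tiny experiment (my own sketch): after the first parallel
                 * region the worker threads are idle; with OMP_WAIT_POLICY=active
                 * (or a spinning default) they still burn CPU during the serial
                 * part below, while with passive they sleep.
                 * Build: gcc -O2 -fopenmp wait_policy.c -o wait_policy
                 * Run:   OMP_WAIT_POLICY=active  ./wait_policy
                 *        OMP_WAIT_POLICY=passive ./wait_policy */
                #include <stdio.h>
                #include <unistd.h>
                #include <omp.h>

                int main(void)
                {
                    /* Short parallel region just to start the thread team. */
                    #pragma omp parallel
                    {
                        #pragma omp single
                        printf("team of %d threads started\n", omp_get_num_threads());
                    }

                    /* Long serial section: check per-thread CPU in `top -H`. */
                    puts("serial section for ~30 s, watch the idle workers");
                    sleep(30);
                    return 0;
                }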

            • #7
              Originally posted by coder:
              And the same code delivers a good speedup on single-CPU/multi-core systems? Because it seems to me that OpenMP requires fairly judicious use by the programmer.

              It's not obvious to me this feature will solve your problem. That would depend a lot on the particulars of what you're doing and the bottleneck you're hitting.

              BTW, I'm a big advocate of setting the environment variable OMP_WAIT_POLICY=passive, as it defaults to active on many systems, and that just burns obscene amounts of CPU cycles doing a lot of nothing. I've seen cases where it slowed down my program by starving out its non-OpenMP parts.

              I'll let you judge ...
              (benchmark code and results shared as a GitHub Gist)

              The test code is the same; this benchmark was made on a single Epyc Rome. I just recently tested on a dual Epyc Milan and the best performance was observed by using only cores sharing the same L3, i.e. running 8 processes with OpenMP on 4 to 8 threads each. On the Zen3 system, 1 core or 2 full sockets gave the same disappointing results.

              • #8
                Originally posted by kieffer
                Thanks for sharing. Sadly, I'm not familiar with most of the Python modules you're using, nor do I know anything about how they use OpenMP.

                I'm afraid my only tip remains setting the environment variable OMP_WAIT_POLICY=passive. Not sure if that'll help you, but it's generally a good idea. At the very least, it'll save some energy.

                Good luck!
