Is The Linux Kernel Scheduler Worse Than People Realize?

  • Is The Linux Kernel Scheduler Worse Than People Realize?

    Phoronix: Is The Linux Kernel Scheduler Worse Than People Realize?

    A number of Phoronix readers have been pointing out material to indicate that the Linux kernel scheduler isn't as good as most people would assume...


  • #2
    Well, it's not only that; it's also SMT. That is a much bigger problem for multicore scalability than is generally recognized. If a load has two threads running, it will run faster when each thread runs on a different physical core. The logic is simple: x86 pipelines average between 2 and 3 instructions per cycle, so if both threads share one core, each thread only gets about half of that. SMT basically boils down to wasting 1-2 IPC per core.
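
    If you want to test that yourself, here's a minimal C sketch (Linux-specific; it assumes logical CPUs 0 and 1 sit on different physical cores, which you should verify under /sys/devices/system/cpu/cpu0/topology/ first) that pins two busy threads with pthread_attr_setaffinity_np. Time a run with the threads on separate cores, then on an SMT sibling pair, and compare:

    Code:
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Busy-loop worker that keeps the pipeline saturated. */
    static void *worker(void *arg) {
        (void)arg;
        volatile unsigned long n = 0;
        for (unsigned long i = 0; i < 1000000000UL; i++)
            n += i;
        return NULL;
    }

    /* Spawn a thread confined to one logical CPU, pinned at creation. */
    static pthread_t spawn_pinned(int cpu) {
        pthread_t t;
        pthread_attr_t attr;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&t, &attr, worker, NULL);
        pthread_attr_destroy(&attr);
        return t;
    }

    int main(void) {
        /* CPUs 0 and 1 are an assumption: on many boxes an SMT sibling
           pair is 0 and N/2 instead; adjust after checking the topology. */
        pthread_t a = spawn_pinned(0);
        pthread_t b = spawn_pinned(1);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }

    Compile with gcc -pthread; the gap between the two timings (or the lack of one) is what SMT costs your workload.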

    • #3
      Originally posted by duby229
      Well, it's not only that; it's also SMT. That is a much bigger problem for multicore scalability than is generally recognized. If a load has two threads running, it will run faster when each thread runs on a different physical core. The logic is simple: x86 pipelines average between 2 and 3 instructions per cycle, so if both threads share one core, each thread only gets about half of that. SMT basically boils down to wasting 1-2 IPC per core.

      It's not that simple, unfortunately. If the CPU demand is below the core capacity, there will be a dramatic performance improvement by keeping the threads on the same core due to caching. Of course, thread / process demands are continuously variable, so predictions have to be made about the best way to run them. Workload type (interactive, non-interactive, I/O bound, CPU bound) changes the prediction and also has to be determined and taken into account. That's why different schedulers work better on different workloads; their predictions are right more of the time. Creating a scheduler that is optimal for all workloads is very non-trivial.
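
      For what it's worth, the kernel already tells you which logical CPUs share a core (and therefore its caches); a quick sketch that walks the standard sysfs topology files, as an illustration:

      Code:
      #include <stdio.h>

      int main(void) {
          char path[128], buf[64];
          /* Walk logical CPUs until the sysfs node stops existing. */
          for (int cpu = 0; ; cpu++) {
              snprintf(path, sizeof(path),
                       "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                       cpu);
              FILE *f = fopen(path, "r");
              if (!f)
                  break;
              if (fgets(buf, sizeof(buf), f))
                  printf("cpu%d shares a core with: %s", cpu, buf);
              fclose(f);
          }
          return 0;
      }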

      • #4
        That's why (plus the optimizations a uniprocessor, non-SMP kernel allows) it's better to have 4x single-core machines than 1x quad-core: no contention problems, no balancing problems. At least in the server and VM world, that's not an issue.
        Developer of Ultracopier/CatchChallenger and CEO of Confiared
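
        If you only have the one quad-core box, you can at least approximate that layout by confining each service or VM to its own CPU. A rough sketch of doing it from inside a program (taskset does the same from the shell; CPU 0 is just an example choice):

        Code:
        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(0, &set);  /* example: dedicate this process to CPU 0 */
            /* pid 0 = the caller; threads created later inherit the mask. */
            if (sched_setaffinity(0, sizeof(set), &set) != 0) {
                perror("sched_setaffinity");
                return 1;
            }
            printf("pid %d confined to CPU 0\n", (int)getpid());
            /* ... run the actual workload here ... */
            return 0;
        }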

        • #5
          Glad to see attention brought to this matter, it really is a problem... And it might not be that schedulers on other systems are better, it's just that ours isn't good enough.

          You know there's something wrong when I get better results with noop than cfq, yet cfq is always the recommended default everywhere because it is theoretically better for some things. I wouldn't exactly call noop satisfying either. People generally praise BFQ but again that is geared towards unicores...

          I mean I don't know the technical details of any of this, I just know it's a mess...

          • #6
            It would be nice if the kernel maintainers saw the article, so that the problem becomes known. Would it be worth writing to an IRC channel or a mailing list? Or even filing a bug? What does everyone think? I should say, I haven't read the paper yet.

            • #7
              Well, I noticed quite a while ago on my old AMD 6000+, and now on my Phenom II X4 925, that setting the thread count in LuxRender and Blender Cycles higher than the actual core count results in a speed improvement of 1 or 2%. However, I usually don't do it, because the operating system seems to be more responsive otherwise.
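
              For reference, renderers usually derive their default thread count from the online CPU count; a trivial sketch of that query (the +1 just mimics the oversubscription described above):

              Code:
              #include <stdio.h>
              #include <unistd.h>

              int main(void) {
                  /* Logical CPUs currently online, as the scheduler sees them. */
                  long cpus = sysconf(_SC_NPROCESSORS_ONLN);
                  printf("online CPUs: %ld\n", cpus);
                  printf("oversubscribed count: %ld\n", cpus + 1);
                  return 0;
              }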

              • #8
                Originally posted by rudl
                Well, I noticed quite a while ago on my old AMD 6000+, and now on my Phenom II X4 925, that setting the thread count in LuxRender and Blender Cycles higher than the actual core count results in a speed improvement of 1 or 2%. However, I usually don't do it, because the operating system seems to be more responsive otherwise.
                Yeah, that is due to context switching: the pipeline flushes and refills, and that's where those few percent come from. On SMT systems it's worse, because both threads on the core get flushed.
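
                You can actually count how often a process gets switched out; a small sketch using getrusage (wrap it around a real workload instead of the dummy loop):

                Code:
                #include <stdio.h>
                #include <sys/resource.h>

                int main(void) {
                    /* Dummy workload; substitute the real one. */
                    volatile unsigned long n = 0;
                    for (unsigned long i = 0; i < 100000000UL; i++)
                        n += i;

                    struct rusage ru;
                    if (getrusage(RUSAGE_SELF, &ru) == 0) {
                        /* voluntary = blocked/yielded; involuntary = preempted */
                        printf("voluntary ctx switches:   %ld\n", ru.ru_nvcsw);
                        printf("involuntary ctx switches: %ld\n", ru.ru_nivcsw);
                    }
                    return 0;
                }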

                • #9
                  Originally posted by macemoneta
                  It's not that simple, unfortunately. If the CPU demand is below the core capacity, there will be a dramatic performance improvement by keeping the threads on the same core due to caching. Of course, thread / process demands are continuously variable, so predictions have to be made about the best way to run them. Workload type (interactive, non-interactive, I/O bound, CPU bound) changes the prediction and also has to be determined and taken into account. That's why different schedulers work better on different workloads; their predictions are right more of the time. Creating a scheduler that is optimal for all workloads is very non-trivial.
                  And yet, noop outperforms other schedulers on almost every single load. There are plenty of indications that you are wrong.

                  • #10
                    Originally posted by rabcor
                    Glad to see attention brought to this matter, it really is a problem... And it might not be that schedulers on other systems are better, it's just that ours isn't good enough.

                    You know there's something wrong when I get better results with noop than cfq, yet cfq is always the recommended default everywhere because it is theoretically better for some things. I wouldn't exactly call noop satisfying either. People generally praise BFQ but again that is geared towards unicores...

                    I mean I don't know the technical details of any of this, I just know it's a mess...
                    I think you're confusing CPU schedulers with I/O schedulers (elevators)...
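
                    They really are two separate knobs; a sketch that prints both for the current process and one disk (it assumes a device named sda, adjust for your hardware):

                    Code:
                    #include <sched.h>
                    #include <stdio.h>

                    int main(void) {
                        /* CPU scheduling policy -- this is the kernel scheduler's domain. */
                        int policy = sched_getscheduler(0);
                        printf("CPU policy: %s\n",
                               policy == SCHED_OTHER ? "SCHED_OTHER (default)" :
                               policy == SCHED_FIFO  ? "SCHED_FIFO" :
                               policy == SCHED_RR    ? "SCHED_RR" : "other");

                        /* I/O scheduler (elevator) of a block device -- a separate thing. */
                        char buf[128];
                        FILE *f = fopen("/sys/block/sda/queue/scheduler", "r");
                        if (f) {
                            if (fgets(buf, sizeof(buf), f))
                                printf("I/O elevator for sda: %s", buf);  /* active one in [brackets] */
                            fclose(f);
                        }
                        return 0;
                    }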
