Linux 4.7 CPUFreq Schedutil Testing vs. P-State

  • Linux 4.7 CPUFreq Schedutil Testing vs. P-State

    Phoronix: Linux 4.7 CPUFreq Schedutil Testing vs. P-State

    With the in-development Linux 4.7 kernel there is a new CPUFreq governor that leverages the kernel's scheduler utilization data in an attempt to make better decisions about adjusting the CPU's frequency / performance state. Here are some benchmarks of that new CPUFreq governor, Schedutil, compared to the other CPUFreq governors as well as the Intel P-State CPU frequency scaling driver.
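
    Schedutil is selected like any other CPUFreq governor through sysfs. As a minimal sketch (mine, not from the article, assuming a kernel built with CONFIG_CPU_FREQ_GOV_SCHEDUTIL and the usual cpufreq sysfs layout), the following lists the governors the running kernel offers and switches every core to schedutil:

import glob

CPUFREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq"

def read(path):
    with open(path) as f:
        return f.read().strip()

# What the running kernel offers on the first core. Note that with the
# intel_pstate driver in charge, only its own "performance" and
# "powersave" policies are listed here rather than the CPUFreq governors.
first = sorted(glob.glob(CPUFREQ_GLOB))[0]
print("available governors:", read(first + "/scaling_available_governors"))

# Switch every core to schedutil (needs root, and the governor must be
# built into the kernel or loaded as a module).
for cpu in glob.glob(CPUFREQ_GLOB):
    with open(cpu + "/scaling_governor", "w") as f:
        f.write("schedutil")

    Writing an unavailable governor name fails with an error, so the script errors out rather than silently doing nothing on kernels without schedutil.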


  • #2
    In the early days of intel_pstate, Colin King and Canonical disabled it in the initial release of Ubuntu 12.04 LTS, and it was only enabled later with an HWE kernel. With Haswell Refresh CPUs it has been a story of the CPU constantly running at full frequency, even at idle, leading to increased power consumption and heat. I guess it's time to bring CPUFreq back in light of these results. Time and again intel_pstate has shown itself no different in performance from the trusted cpufreq, despite what it promised. I think it's time we move back to cpufreq. I was forced to do that on my Arch installation due to unusually high idle frequencies and higher power consumption on my i7-4790.
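
    For anyone in the same situation: going back to acpi-cpufreq normally means booting with intel_pstate=disable on the kernel command line. Here is a small, hypothetical check (mine, not from the poster) to confirm which scaling driver is active and whether cores really sit near their maximum frequency while the machine is idle:

import glob
import time

def read(path):
    with open(path) as f:
        return f.read().strip()

cpus = sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq"))
print("scaling driver:", read(cpus[0] + "/scaling_driver"))

# Sample the reported frequency of each core a few times while idle;
# persistently high values match the behaviour described above.
for _ in range(5):
    mhz = [int(read(c + "/scaling_cur_freq")) // 1000 for c in cpus]
    print("per-core MHz:", mhz)
    time.sleep(1)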

    Comment


    • #3
      Something seems to be going wrong with intel_pstate in 4.7.

      Comment


      • #4
        Originally posted by linuxforall
        In the early days of intel_pstate, Colin King and Canonical disabled it in the initial release of Ubuntu 12.04 LTS, and it was only enabled later with an HWE kernel. With Haswell Refresh CPUs it has been a story of the CPU constantly running at full frequency, even at idle, leading to increased power consumption and heat. I guess it's time to bring CPUFreq back in light of these results. Time and again intel_pstate has shown itself no different in performance from the trusted cpufreq, despite what it promised. I think it's time we move back to cpufreq. I was forced to do that on my Arch installation due to unusually high idle frequencies and higher power consumption on my i7-4790.
        The intel_idle driver should take care of that, though. The scheduler-based approach is supposedly not meant to be used on Intel because it adds latency. I guess the intel_idle driver is not doing its job, then.

        Comment


        • #5
          Comparing power consumption would be nice, just to see whether it introduces much overhead. Or actual/average frequencies...
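
          A rough way to get both numbers is to sample the Intel RAPL energy counter and the per-core frequencies while a workload runs. A sketch under some assumptions (mine: an Intel CPU exposing the powercap interface at /sys/class/powercap/intel-rapl:0, and one-second sampling being representative enough):

import glob
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"   # package energy in microjoules
CPUS = sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq"))

def read_int(path):
    with open(path) as f:
        return int(f.read())

samples, seconds = [], 30
e0, t0 = read_int(RAPL), time.time()
for _ in range(seconds):
    time.sleep(1)
    samples.append(sum(read_int(c + "/scaling_cur_freq") for c in CPUS) / len(CPUS))
e1, t1 = read_int(RAPL), time.time()

# The energy counter wraps eventually; for a short run the simple delta is fine.
print("average package power: %.1f W" % ((e1 - e0) / 1e6 / (t1 - t0)))
print("average frequency:     %.0f MHz" % (sum(samples) / len(samples) / 1000))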

          Comment


          • #6
            Sorry for the double post.
            I don't know much about schedulers, but I find them interesting, and I hope to learn more about them in my studies. So my question is: would it make sense, or is it done anywhere, for the scheduler to remember which core (cache) worked on what data, so one can maximise the probability of reusing the cache and minimise accesses to main memory? And would it make sense to have different schedulers for Intel and AMD, especially with AMD's HSA approach?

            Comment


            • #7
              Originally posted by jakubo
              Comparing power consumption would be nice, just to see whether it introduces much overhead. Or actual/average frequencies...
              Comparing power consumption during benchmarks is fairly pointless, since normally we want computers to complete their jobs as soon as possible, after which the machine is free to idle as much as the current load permits. Since benchmarks intentionally generate the highest possible load, low power consumption during a benchmark isn't an advantage; it more likely points to a bug in frequency scaling or the like that causes performance degradation. However, power consumption does matter when the system is idle or only partially busy (e.g. web browsing, listening to music, watching videos, ...). Yet these cases aren't covered by these measurements.

              Comment


              • #8
                Originally posted by jakubo
                Sorry for the double post.
                I don't know much about schedulers, but I find them interesting, and I hope to learn more about them in my studies. So my question is: would it make sense, or is it done anywhere, for the scheduler to remember which core (cache) worked on what data, so one can maximise the probability of reusing the cache and minimise accesses to main memory? And would it make sense to have different schedulers for Intel and AMD, especially with AMD's HSA approach?
                Cache locality is something the kernel developers pay pretty close attention to. The CFS scheduler that came to Linux around 2.6.23 uses an rbtree with tasks sorted by their virtual run time to determine who runs next. That tree is stored per-cpu, and only occasionally does a separate load balancing operation take place to move tasks to idle cpus. The O(1) scheduler before that also had per-cpu runqueues. Other than keeping tasks on their original cpu, the scheduler doesn't concern itself with what data a task was working on (that's something the virtual memory subsystem handles). Doing so would overly complicate the scheduler and slow it down -- something you wouldn't want on a function that could run thousands of times a second.
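
                To make the vruntime idea concrete, here is a toy sketch (mine, not kernel code): runnable tasks are kept ordered by how much virtual runtime they have accumulated, and the scheduler always picks the one with the least. A heap stands in for the kernel's per-CPU red-black tree:

import heapq

class Task:
    def __init__(self, name, weight=1.0):
        self.name = name
        self.weight = weight      # stand-in for the nice-level load weight
        self.vruntime = 0.0       # accumulated virtual runtime, in "nanoseconds"

class ToyCFSRunqueue:
    """Per-CPU runqueue: always run the task with the smallest vruntime."""
    def __init__(self):
        self._heap = []           # stands in for the kernel's rbtree

    def enqueue(self, task):
        heapq.heappush(self._heap, (task.vruntime, id(task), task))

    def pick_next(self):
        return heapq.heappop(self._heap)[2]

    def run_for(self, task, delta_ns):
        # Heavier (higher-weight) tasks accumulate vruntime more slowly,
        # so they end up with a proportionally larger share of the CPU.
        task.vruntime += delta_ns / task.weight
        self.enqueue(task)

rq = ToyCFSRunqueue()
for t in (Task("mp3-player", weight=1.0), Task("video-encode", weight=2.0)):
    rq.enqueue(t)

for _ in range(6):
    t = rq.pick_next()
    print("running", t.name)
    rq.run_for(t, delta_ns=1_000_000)  # pretend it ran for 1 ms

                The real kernel uses an rbtree so the leftmost node can be cached cheaply, and the weights come from each task's nice level, but the picking rule is the same.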

                Regarding scheduling for different architectures, I think it's generally agreed we should NOT start writing different code for different companies or different processors. Instead, the kernel is aware of what type of machine it's on, so it self-tunes for things like the number of CPUs, whether memory is non-uniform (NUMA), whether some CPUs are faster (big.LITTLE), etc. The scheduler is also partly modular and has different scheduling classes for different types of tasks, like realtime (e.g. google SCHED_FIFO and SCHED_RR).

                HSA is a whole other beast. GPUs don't run x86 instructions (or PPC, or ARM, or SPARC, etc.), so they are not scheduled like other tasks. Application code needs to be compiled with HSA support, using specialized libraries (OpenMP, OpenCL, CUDA, etc.) to generate the relevant bytecode. Then, when the kernel runs (schedules) a task that calls into those libraries, the HSA kernel driver runs alongside to execute the HSA bytecode on the GPU.

                If you're learning, here are some resources I'd recommend:
                IBM DeveloperWorks: Inside the Linux 2.6 CFS Scheduler
                "A Decade of Wasted Cores" (a pretty good background on SMP and NUMA load balancing)
                The Linux kernel's own source documentation
                Linux Kernel Summit coverage at LWN (clues to how scheduling is becoming more power aware, as we're seeing in 4.7)


                Comment


                • #9
                  Originally posted by SystemCrasher
                  Comparing power consumption during benchmarks is fairly pointless, since normally we want computers to complete their jobs as soon as possible, after which the machine is free to idle as much as the current load permits. Since benchmarks intentionally generate the highest possible load, low power consumption during a benchmark isn't an advantage; it more likely points to a bug in frequency scaling or the like that causes performance degradation. However, power consumption does matter when the system is idle or only partially busy (e.g. web browsing, listening to music, watching videos, ...). Yet these cases aren't covered by these measurements.
                  PTS has an idle power monitoring mode. Also, performance per watt is very important. This article is quite incomplete without that...

                  Comment


                  • #10
                    Performance and power usage are pretty much aligned: finishing the job fast allows the system to go into sleep states and save power, and sharper signal edges usually allow lower voltages. (Not sure about the last point, but since the current will increase and P = RI²... well, you know; still, if you can lower the voltage you can also lower the current.)
                    So the important question is this: how much energy has been consumed per task?
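
                    As a rough first-order model (my numbers and symbols, not the article's): energy per task is the average power over the run times the time the task keeps the package busy, and the dynamic part of that power scales with the square of the supply voltage:

E_{\text{task}} \approx P_{\text{avg}} \cdot t_{\text{task}},
\qquad
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f

                    So finishing in 2 s at 15 W costs 30 J, while taking 4 s at 9 W costs 36 J; the faster run wins on energy as long as the package can drop into a deep idle state afterwards, which is the race-to-idle argument above.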

                    Comment
