New Daily, Per-Commit Testing Of Mesa, Kernel, Compilers

  • New Daily, Per-Commit Testing Of Mesa, Kernel, Compilers

    Phoronix: New Daily, Per-Commit Testing Of Mesa, Kernel, Compilers

    New daily and per-commit testing will commence for the mainline Linux kernel, the Intel Mesa driver, and LLVM/Clang (and possibly GCC) compilers. Similar to past performance "trackers", this will lead to more timely spotting of performance regressions affecting real-world workloads...


  • #2
    This is amazing. Good job Michael.
    I always wondered why the Linux kernel and some other projects don't have this capability themselves.
    I mean, it's very important to check for regressions.



    • #3
      Originally posted by Danny3 View Post
      I always wondered why the Linux kernel and some other projects don't have this capability themselves.
      Most of them have functional test suites. PTS is a performance test suite. Unfortunately, running it on the final git repository is a little late; usually, programmers or subsystem maintainers are expected to check for such things before committing.

      As for the Linux kernel, it'd be more useful if the DRI maintainer ran piglit and the filesystem maintainer ran iozone before pushing to Linus, instead of Michael running PTS on Linus' tree.
      Also remember that no amount of PTS can replace the one guy who noticed that his scientific workload inside a VM on a 390509-core cluster went 0.0135% slower after upgrading. Kernel performance just has so many corner cases.

      Mesa has piglit. Developers usually run it before submitting invasive patches.
      PTS contains several useful real-world tests, but they have their problems: any time Mesa starts supporting a new set of extensions, there's some app that will switch render paths and hit a performance regression.

      GCC has a test suite for functional correctness. New optimization passes are usually hidden behind a compiler flag at first, giving users ample time to compare their project with and without the pass (a rough sketch of such an A/B comparison is at the end of this post). An automated framework that just blindly compiles from git wouldn't catch those performance changes until the passes have been properly tested and enabled by default. And again, I doubt that PTS covers enough cases.


      I don't mean to sound overly negative; more data certainly cannot hurt if interpreted correctly. But currently PTS cannot help with functional regressions, and it lacks the variety and robustness to replace end-user performance testing.
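
      To make the with/without comparison concrete, here's a minimal sketch in Python; the flag name, source file, and benchmark binary are placeholders for whatever pass and workload you actually care about, not anything PTS or GCC ships under these names:

          #!/usr/bin/env python3
          # Hedged sketch: time a workload built with and without a new GCC
          # optimization pass. NEW_PASS_FLAG and bench.c are hypothetical
          # placeholders; substitute the real flag and your own benchmark.
          import subprocess
          import time

          NEW_PASS_FLAG = "-fnew-pass"   # hypothetical; use the flag of the pass under test
          SOURCE = "bench.c"             # your own benchmark program

          def build_and_time(extra_flags):
              # Build the benchmark, then time one run of it.
              subprocess.run(["gcc", "-O2", *extra_flags, SOURCE, "-o", "bench"], check=True)
              start = time.perf_counter()
              subprocess.run(["./bench"], check=True)
              return time.perf_counter() - start

          baseline = build_and_time([])
          candidate = build_and_time([NEW_PASS_FLAG])
          print(f"without pass: {baseline:.3f}s")
          print(f"with pass   : {candidate:.3f}s ({(candidate / baseline - 1) * 100:+.1f}%)")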



      • #4
        Originally posted by rohcQaH View Post
        Most of them have functional test suites. PTS is a performance test suite. Unfortunately, running it on the final git repository is a little late; usually, programmers or subsystem maintainers are expected to check for such things before committing.
        From my understanding, most commits to the kernel are tested solely by "does it compile?"
        Even a "does it boot?" test is rarely tried before a patch is submitted. I think this could provide some much needed regression info, even if most of it is, as you said, expected. I know at least the kernel power regression was largely tracked and motivated by PTS benchmarking.



        • #5
          Originally posted by tga.d View Post
          From my understanding, most commits to the kernel are tested solely by "does it compile?"
          Even a "does it boot?" test is rarely tried before a patch is submitted. I think this could provide some much needed regression info, even if most of it is, as you said, expected. I know at least the kernel power regression was largely tracked and motivated by PTS benchmarking.
          I would think there would be more thorough testing done than 'does it compile and does it boot'? Because when it comes to the kernel you want to compile it, install it, boot it, and try what you were just doing / trying to fix. The power regressions are a unique case where it's subjective, use-specific, and non-obvious. People don't run computers with power meters attached, watching for the draw to spike after an update. Anyone doing "power subsystem" work has to be ACTIVELY conscious about the state of things and go out of their way to find regressions.

          A regression that kills network performance across the board is obvious (my downloads are taking longer), and a regression that renders a filesystem unbootable tends to be obvious too (ignoring niche cases; I'm talking "Oops, I broke the whole kernel" regressions). A regression that raises power usage by a watt? Not so obvious, especially if it doesn't lead to a dramatic drop in battery life. If your battery lasts 2 hours before an update and 1 hour after it... you're going to notice that. An update that makes it last 1 hour 55 minutes instead of 2 hours? Not as noticeable.
          All opinions are my own, not those of my employer, if you know who they are.



          • #6
            Originally posted by Ericg View Post
            I would think there would be more thorough testing done than 'does it compile and does it boot'? Because when it comes to the kernel you want to compile it, install it, boot it, and try what you were just doing / trying to fix.
            They are tested collectively during the RC phase of a release, but are almost always NOT tested on a commit-by-commit basis for anything other than a successful compile. To quote Greg Kroah-Hartman:
            So let's just build that one [commit]. Builds the file... Hey, it built. Ship it. Seriously! So I maintain the stable kernel releases - I do all the releases for the stable kernel - and people ask me, "Well how do you test it all?" I said, "It built, right?" ... But I'm serious; the Linux kernel does not have a test suite. It's very, very hard to have a test suite for hardware interactions. ... So the best thing you can ever do for us is to build the kernel, and tell us if you have a problem. That is our QA cycle. We rely on the community to do that, and that's the only way we could physically do this.

            Originally posted by Ericg View Post
            The power regressions are a unique case where it's subjective, use-specific, and non-obvious. People don't run computers with power meters attached, watching for the draw to spike after an update. Anyone doing "power subsystem" work has to be ACTIVELY conscious about the state of things and go out of their way to find regressions.

            A regression that kills network performance across the board is obvious (my downloads are taking longer), and a regression that renders a filesystem unbootable tends to be obvious too (ignoring niche cases; I'm talking "Oops, I broke the whole kernel" regressions). A regression that raises power usage by a watt? Not so obvious, especially if it doesn't lead to a dramatic drop in battery life. If your battery lasts 2 hours before an update and 1 hour after it... you're going to notice that. An update that makes it last 1 hour 55 minutes instead of 2 hours? Not as noticeable.
            I'd be willing to bet most people won't notice regressions in daily use unless they're well above, say, 5%. The problem is that these regressions can compound quickly: while each individual commit that costs 5% goes by unnoticed, a few dozen of those commits leave you with a considerably slower kernel without anyone realizing it, particularly if they land over the course of several releases. It's sort of like the "frog in boiling water" myth. If there's some way of easily tracking where performance hits are coming in, it gives a great way of knowing where performance can be improved. (A quick back-of-the-envelope calculation follows below.)
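
            A back-of-the-envelope illustration, treating each hit as an independent multiplicative 5% slowdown (a simplification, but it shows how fast it adds up):

                # How a series of individually "unnoticeable" 5% slowdowns compounds.
                hit = 0.95  # each commit keeps 95% of the previous performance
                for commits in (1, 5, 12, 24):
                    remaining = hit ** commits
                    print(f"{commits:2d} commits -> {remaining:6.1%} of original performance")
                # Roughly: a dozen such commits already cut performance nearly in half (~54%).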



            • #7
              Personally I think testing is underrated. It's a joy to maintain something that was written with testing in mind from the start; tests glued on afterwards just don't cut it. And code with no tests at all is just sad: you never know whether you've broken a rarely used feature until it's too late.

              If I could choose, I would only work on projects that have the resources to do true 100% TDD.

