
Thread: Linux 2.6.24 Through Linux 2.6.33 Benchmarks

  1. #21
    Join Date
    Aug 2009
    Location
    south east
    Posts
    338

if only Linus used the Phoronix Test Suite

I read posts from kernel devs who state they want test results to see improvements, failures, etc. Well, here is real data.

It would be nice when you don't have to recompile in order to change most of the kernel parameters: number of CPUs, high memory, timer frequency, dynamic ticks, CPU architecture.

    Check out amd64. I regress

  2. #22
    Join Date
    Aug 2007
    Location
    Europe
    Posts
    401


    Quote Originally Posted by mtippett View Post
    The numbers are there, the tests are there and the kernels are there. If anyone is willing to dig deep to understand the difference, I would be very interested to know how far they get.
    Do what you are doing, publish the numbers.

    One thing though. If it is only one app which sees a regression it might be that that particular app is doing something wrong. If you have a number of apps which regress on the same kernel, then it may well be a kernel regression.

As benchmark time is limited, I would use as many PTS benchmarks as possible, but not run each for a long time. Instead of five-minute runs for three applications, one could use 30-second runs for 30 applications; both would be 900 seconds of benchmarking.

    Is this feasible?
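The time-budget arithmetic above can be sanity-checked in a couple of lines of shell (note that five-minute runs for five applications would actually total 1500 s; three five-minute runs match the 900 s budget of thirty 30-second runs):

```shell
# Total wall time for two benchmark plans with the same 900 s budget.
short_plan=$((30 * 30))    # 30 applications x 30-second runs
long_plan=$((3 * 5 * 60))  # 3 applications x 5-minute runs
echo "short plan: ${short_plan}s, long plan: ${long_plan}s"
```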

  3. #23


    Quote Originally Posted by mtippett View Post
Skip to the next response in that thread:

I turned on apache, and played with ab a bit, and yup, ab is a hog, so
any fairness hurts it badly. Ergo, running ab on the same box as
apache suffers with CFS when NEW_FAIR_SLEEPERS is turned on. Issuing
ab bandwidth to match its 1:N pig nature brings throughput right back.


    http://lkml.indiana.edu/hypermail/li...9.1/02861.html

Remember that you can't test everything, and testing the obvious path will usually result in flat lines, since it represents the 95% path.

As indicated above, what has been identified is that in some scenarios CFS completely tanks. ab is just a tool to make this visible.
In this scenario with fair sleepers enabled, yes. However, this scenario is removed from reality unless someone runs apache and ab on the same machine, which is not recommended.* It seems only natural that the scheduler is tuned to perform well in real situations. So, what's the point of this benchmark?

As per usual, if there is any benchmark which you believe provides a suitable equivalent scenario but is more "correct", please tell us.
Maybe replace "correct" with "more meaningful". The problem is that I'm not sure what an equivalent scenario to this benchmark would be, and if there is no such scenario, this benchmark only means: we got different results in an Apache benchmark run on the same machine, which isn't recommended. By the way, which scenario were you talking about? The one marked with *?
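For reference, the scenario in the quoted LKML mail could be reproduced roughly like this on a kernel of that era (a sketch, not a recipe: it assumes apache is serving on localhost, that debugfs is mounted at /sys/kernel/debug, and that the kernel still exposes the NEW_FAIR_SLEEPERS feature flag; toggling it requires root). The commands are only composed into strings here, not executed:

```shell
# Hypothetical reproduction of the quoted scenario: benchmark a local
# apache with ab, with the CFS NEW_FAIR_SLEEPERS feature on and off.
# Composed as strings so the sketch is inspectable without root/apache.
toggle_on='echo NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features'
toggle_off='echo NO_NEW_FAIR_SLEEPERS > /sys/kernel/debug/sched_features'
bench='ab -n 100000 -c 100 http://localhost/'  # 100k requests, 100 concurrent
printf '%s\n' "$toggle_on" "$bench" "$toggle_off" "$bench"
```

Comparing requests-per-second between the two runs would show the throughput collapse the mail describes.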

  4. #24


    Quote Originally Posted by mtippett View Post
A regression is an unexpected change in behavior. If the kernel developers make a change in one area, and they are not expecting the behavior change in other areas, those areas have regressed.
If a developer decides to change the default file system mode to some other mode, it's not a regression, because it is an expected change in the file system's behavior (it is also known that it will affect some benchmarks). Michael isn't a dev, is he?

    I'd like you to expand on your "not done properly" if you could.
The recommended way is to run ab on a different machine; that's why I consider it wasn't done properly, or, if you like, why this benchmark seems strange to me.

  5. #25


    Quote Originally Posted by sabriah View Post
    This is standard practice.

    All scientific journals require this - tell the readers in words what the graphs and tables say anyway.

    The benefit is also that the results become searchable through search engines.


    No they don't.

Scientific journals require authors to describe in words what figures and tables show AND to draw a valuable conclusion from those numbers (something that isn't done here, obviously). If you don't, your paper gets rejected.

  6. #26
    Join Date
    Aug 2007
    Location
    Europe
    Posts
    401


    Quote Originally Posted by Xheyther View Post
    No they don't.

Scientific journals require authors to describe in words what figures and tables show AND to draw a valuable conclusion from those numbers (something that isn't done here, obviously). If you don't, your paper gets rejected.
    I agree with what you say about the AND.

BUT, and the but is big, here we are talking about Phoronix's role as a whistleblower. They didn't write the code, and debugging someone else's code is a nightmare, even for Freddy on Elm Street.

I never expect them to identify the pivotal change in the code. Heck, even deciding which of several possible layers (e.g. app or kernel) is responsible can be worse than difficult.

However, I do think the ones who should draw the valuable conclusions you mention from the numbers presented at Phoronix are the developers. Who else can interpret them with comparatively minimal effort, and fix the problems?

Showing the world system-based regressions is one of several important ways to catch bugs, and I applaud Phoronix for taking on this task.

I also realize that their use of default settings is a pragmatic choice, not suitable for all practices. But tweaked settings rapidly lead into an inescapable permutation hell; in how many ways can you fine-tune web servers and databases?! Which is the least silly setting? Well, the default, because that is the one everyone has access to.



  7. #27
    Join Date
    Jun 2006
    Posts
    311


    Quote Originally Posted by kraftman View Post
    If developer decided to change default file system mode to some other it's not a regression, because it is expected change in the file system behavior (it is also known it will affect some benchmarks). Michael isn't dev is he?
It's a game of whack-a-mole. You make a change expecting one particular mole to be whacked; once the change is made, three unexpected moles pop up.

Industry metrics come from a formal (testing, QA, etc.) environment.

    Recommended way is to run ab on a different machine, so that's why I consider it wasn't done properly or this benchmark is strange in my opinion if you like.
For determining the expected performance of apache, yes, I agree that you should have ab and the server on different machines. But remember that we are not testing the apache installation. The component under test in this instance is the kernel, or at the very least the kernel on particular hardware.

    What we are showing is that there is a synthetic load that is strongly affected by the kernel changes. If we called it "pig-test" the results would still be the same.

  8. #28


    Quote Originally Posted by mtippett View Post
    But remember that we are not testing the apache installation. The component under test is the kernel in this instance, or at the very least different hardware.
Right. This makes some things clearer.

  9. #29
    Join Date
    Jun 2006
    Posts
    311


    Quote Originally Posted by mtippett View Post
    I'll let Michael make comments on the reporting.

My view is that the impact of different subsystems depends heavily on the interactions between different parts of the system. In a lot of cases the changelogs may give an indication, but it would usually take domain expertise in that subsystem to be able to correlate the two.
    Here's a good example

    http://airlied.livejournal.com/69074.html

Dave, a veritable graphics guru, had to ponder and run further benchmarks. And even then he still has concerns about what and where the real trade-offs will be. Understanding the reason for a regression is absolutely a specialty. Making sure the tests allow for easy analysis is probably the primary area where we can add value.

    All in all, what a good regression benchmark needs to have is sensitivity to different areas of the system under test. A targeted benchmark for making a purchase decision is a whole different ball game. I am sure Michael is open to targeting some runs to particular areas, if they are of general interest.

Unfortunately, choosing your kernel and filesystem for peak web server performance isn't really what the general populace is interested in.

  10. #30
    Join Date
    Nov 2009
    Location
    Italy
    Posts
    872


    Quote Originally Posted by mtippett View Post
    But remember that we are not testing the apache installation.
    [...]
    What we are showing is that there is a synthetic load that is strongly affected by the kernel changes. If we called it "pig-test" the results would still be the same.
I do not agree. What is interesting is to see how the kernel behaves under *REAL* loads; everything else is useless, because developing a kernel is a continuous trade-off between different load scenarios. Who cares if the kernel performs badly in an unrealistic load scenario?
