
Originally Posted by
tytso
So my impression is that you have a framework which will iterate over a number of file systems, reformat the partition to file system $foo, mount the partition as $foo, and then run the suite of benchmarks. Do you not? Or are you doing this manually, by hand? If you do have such a framework, then the file system aging function needs to be inserted after the mkfs step. I'm not sure what is formally considered part of the PTS, and what is considered part of the test framework. Presumably whatever uploads the results to the open benchmarking web site is also part of the test framework, which I thought was part of PTS ---- so I had assumed the mkfs was part of the PTS subsystem.
Those steps are not part of the framework itself. My involvement is primarily Phoronix Test Suite & OpenBenchmarking, not Phoronix.com. I believe that Michael does the system prep manually. We do have the concept of a "context" for a test, which covers preparing either the system or the configuration under test, but that isn't fully fleshed out yet.
It's impractical for PTS to include detailed system preparation steps within the suite itself. The preparation depends entirely on what the configuration, or the variant part of the test run, is. Just for filesystems, the variant could be mount options only, new vs. old (aged), an alternate FS, or different kernels and their impact on a FS. Obviously none of this is meaningful for, say, a compiler comparison. So it comes down to a routine similar to...
1. Prepare System Under Test
2. Prepare Configuration Under Test (optional if the variant part is the System Under Test itself)
3. Invoke "phoronix-test-suite benchmark <test>"
4. Upload to OpenBenchmarking for further discussion
5. Go to 1 for as many variants as you want.
6. Upload full comparison to OpenBenchmarking
7. If Michael is running the test, then generate an article.
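As a rough illustration of steps 1-5 above, here is a minimal dry-run sketch. It only builds the list of commands and does not execute anything. The script name "prepare-system.sh" and the variant names are hypothetical placeholders, not part of PTS; the only real command is the "phoronix-test-suite benchmark <test>" invocation from step 3. The upload steps are left out on purpose.

```python
# Dry-run sketch of the routine above: builds commands, runs nothing.
# "./prepare-system.sh" and the variant names are hypothetical placeholders.

def plan_comparison(variants, test):
    """Return the ordered commands for steps 1-3, repeated once per variant."""
    plan = []
    for variant in variants:
        # Steps 1-2: put the system / configuration under test into this variant.
        plan.append(["./prepare-system.sh", variant])
        # Step 3: run the benchmark as written in the list above.
        plan.append(["phoronix-test-suite", "benchmark", test])
        # Steps 4 and 6 (uploading) are intentionally omitted from this sketch.
    return plan


if __name__ == "__main__":
    for cmd in plan_comparison(["variant-a", "variant-b"], "<test>"):
        print(" ".join(cmd))
```

Keeping it as a plan rather than executing directly makes it easy to review what would happen before touching a real partition.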
We have talked about a way to take a collection of contexts and call out to a locally configured script that puts the system into each context before running the tests. That would effectively automate steps 2-6. It won't be considered part of PTS, but it will lower the manual effort for people doing broader comparisons (or for use in software development). My mental picture for the context file that might be useful for benchmarking is something like
Code:
<context-name> <context-information>
You would then have a script that can take you to the particular context. So for filesystems you might have a file such as
Code:
ext3-nobarrier 100GB-70%Cap-3%Frag-opts=nobarrier
ext3-barrier 100GB-70%Cap-3%Frag-opts=barrier
ext3-discard 100GB-70%Cap-3%Frag-opts=discard
The person executing the comparison would need to write a script that is invoked as "set-context.sh 100GB-70%Cap-3%Frag-opts=nobarrier" which would then do the system preparation (100GB, 70% capacity, 3% fragmentation, mount opts=xxx).
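To make the format concrete, here is a minimal sketch of how a helper might read one line of such a context file and split the descriptor into its parts. This is only my assumption about how the example strings above would be interpreted; the field names, the regular expression, and the `Context` type are made up for illustration and nothing like this exists in PTS today.

```python
import re
from typing import NamedTuple


class Context(NamedTuple):
    """Hypothetical parsed form of one context-file line."""
    name: str
    size: str
    capacity_pct: int
    fragmentation_pct: int
    mount_opts: str


# Assumed descriptor layout: <size>-<N>%Cap-<N>%Frag-opts=<mount options>
_DESCRIPTOR = re.compile(
    r"^(?P<size>\d+[KMGT]B)-(?P<cap>\d+)%Cap-(?P<frag>\d+)%Frag-opts=(?P<opts>.+)$"
)


def parse_context_line(line):
    """Split '<context-name> <context-information>' and decode the descriptor."""
    name, info = line.split(None, 1)
    match = _DESCRIPTOR.match(info.strip())
    if match is None:
        raise ValueError("unrecognised context descriptor: " + info.strip())
    return Context(
        name=name,
        size=match["size"],
        capacity_pct=int(match["cap"]),
        fragmentation_pct=int(match["frag"]),
        mount_opts=match["opts"],
    )


if __name__ == "__main__":
    print(parse_context_line("ext3-nobarrier 100GB-70%Cap-3%Frag-opts=nobarrier"))
```

The actual preparation (creating the filesystem, filling and aging it, mounting with the given options) would still be up to whoever writes the script.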
I assume that you can see that the same structure could easily be extended to do bisection across an ordered set of kernels or git commits.
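A minimal sketch of what that bisection could look like, assuming the contexts are ordered, the first one is known good, the last one is known bad, and the result flips from good to bad exactly once. The `is_bad` callback is hypothetical: in practice it would put the system into the context, run the benchmark, and compare against whatever threshold you consider a regression. The kernel labels in the example are placeholders.

```python
def first_bad(ordered, is_bad):
    """Binary search for the first bad entry.

    Assumes ordered[0] is known good, ordered[-1] is known bad,
    and every entry after the first bad one is also bad.
    """
    good, bad = 0, len(ordered) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(ordered[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return ordered[bad]


if __name__ == "__main__":
    kernels = ["k1", "k2", "k3", "k4", "k5"]  # placeholder labels
    known_bad = {"k4", "k5"}                  # stand-in for real benchmark runs
    print(first_bad(kernels, lambda k: k in known_bad))
```

Each call to `is_bad` is a full prepare-and-benchmark cycle, so halving the search space matters a lot more here than it would for a cheap test.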
I can't speak for others, but I haven't really taken too much advantage of the Phoronix test suite because the signal to noise ratio has been too low. The focus on competition between file systems, as opposed to watching for regressions, isn't really useful for developers.
Phoronix Test Suite is effectively an independent project that grew out of the personal discussions that Michael and I would have regarding the results being presented on Phoronix.com.
Phoronix Test Suite itself is merely a test execution environment. The results that it generates, and the feeding of that information into articles on Phoronix.com, are independent of it. I'm sure there is actually a lot of value that you could get out of the suite itself, from making simplified, repeatable test cases available to monitoring updates to your code as you make them.
...
Now, I don't blame you for that --- in the end, your primary responsibility is to the continued success of the commercial enterprise of this web site, which means if sensationalism drives web hits, then sensationalism it shall be.
Again, for the record, I am not involved in any direct way with Phoronix.com. My involvement is tangential, through Phoronix Test Suite and OpenBenchmarking. My day job is driving teams of engineers; I just have a bent for seeing good engineering done, and Phoronix Test Suite is a way that I can help the industry.
But the fact remains that developers are also extremely busy folks, and if they have to spend a huge amount of time figuring out what the results might mean, they're likely not going to bother. Developers also tend to prefer benchmarks which test specific parts of the file system, one at a time. This is why we tend to use benchmarks such as FFSB, with different profiles such as "large file create", "random writes", "random reads", "large sequential reads", etc. Another favorite benchmark is fs_mark, which tests efficiency of fsync() and journaling subsystems. I don't mind looking at the application centric benchmarks, but I'm not likely to try to set them up. But if you give me lock_stat and oprofile runs, I'll very happily look at them, and discuss what the results might mean, and then work to improve those workloads as part of my future development efforts.
This part is really hard. Each developer has their own sub-component or subsystem that they care about, and for each of those there is a set of metrics that directly affects it. But there are a hell of a lot of subsystems that represent vastly different areas. So the middle ground is finding benchmarks and tests that serve as a canary in a coal mine to trigger the deeper digging. Digging deeper into one particular sub-domain marginalizes the other domains.
That said, neither Michael nor I would shy away from deep-diving when the canary indicates that something is wrong. It is a two-way street: integrating the tools and methodology for a domain of expertise needs leadership from the outside, and that integration is where the biggest win is.
The bottom line is that benchmarking for the sake of improving the file system requires close cooperation with the developers. I'm not sure whether that's compatible with Phoronix's mission. If so, I'd be happy to work with you more closely. And if it's not Phoronix's cup of tea, that's OK. There is room for multiple different approaches to benchmarking. All I ask is that they not be too misleading, but that's more for the sake of not leading naive users down the primrose path.....
So long as the developers are engaged in looking at the problem, rather than blaming the tool, we've got no concerns working with any developer (be it under the OpenBenchmarking or the Phoronix banner).
I don't do articles on Phoronix.com, but do blog postings on OpenBenchmarking.org, so there are ways of getting messages out through that too.
From this thread, some areas where PTS can immediately add value are:
1. Distributed end-user testing - you can get people to run a single command to get consistent results from a broad set of users.
2. Regression Management - We have trackers at http://phoromatic.com/kernel-tracker.php, and setting one up is _very_ easy. Currently that one watches the ubuntu-upstream-kernel builds, but it could easily do a "git pull; make" sort of cycle. This is very interesting since you can have distributed systems that are used for testing.
3. Reproducing scenarios - If an end user sees an issue with a particular behaviour, capturing a test case allows developers to reproduce it internally more easily.
Some concrete things that we'd like to see are suggestions for improvements in the test cases or benchmarks themselves. If there are suites of tests that characterize a filesystem's behaviour, integrating them isn't much of a problem.