The Importance Of Benchmark Automation & Why I Hate Running Linux Games Manually

Written by Michael Larabel in Standards on 4 June 2016 at 05:00 PM EDT.
Yet again with today's GeForce GTX 1080 Linux review, multiple people asked why "XYZ Linux game" wasn't tested, a recurring topic now for several years.

Whenever the question comes up in the forums, the explanation is usually the same: the game wasn't tested in that review, or in any other article, because it can't be properly automated. Either the game/engine wasn't designed to be automated-benchmark friendly, the developers disabled that functionality in the debug build, or, as is often the case when a game is ported to Linux by companies like Feral Interactive, they simply didn't bother porting that functionality over.

Contrary to the belief of some, the test automation requirement isn't out of laziness, far from it. I've explained this dozens of times already in the forums and elsewhere, but since people keep asking why I'm not testing whatever the new Linux game of the week is, here's my explanation. Of course, all proper benchmarking on Phoronix is done via the Phoronix Test Suite, and that's not just self-promotion: it meets requirements I've come to appreciate and expect over the years -- as have many other IHVs and ISVs testing on Linux.

Time - Yes, being able to automate a benchmark saves a lot of time, especially when you are the only one running such a site full-time and haven't taken a day off in over three years. It also saves time for the thousands of other companies running the Phoronix Test Suite for their benchmarking needs, but this is actually not even close to being the most important reason for test/benchmark automation...

Accuracy - For nearly all of the hundreds of test profiles run via the Phoronix Test Suite, each benchmark is set to run three times by default (higher-variance tests usually 5~10 times; longer, generally consistent tests 1~2 times). Every graphics test I can think of is set to at least three runs. After the Phoronix Test Suite executes said benchmark three times, if the deviation in the results is too high, it automatically runs the test additional times (up to twice the original run count, by default) until the result is acceptable. Test profiles can also tell the Phoronix Test Suite to discard the data from, say, the very first run. All of this per-run data is archived and recorded automatically, handled transparently. That's much better than running a game test only once, or running it three times and keeping just the average regardless of any deviation in the results.
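To make that dynamic run-count behavior concrete, here is a minimal sketch in Python of the general idea; the real implementation lives inside the Phoronix Test Suite itself (written in PHP) and the variance threshold below is a made-up placeholder:

    import statistics

    def run_adaptively(run_once, base_runs=3, max_rel_dev=0.035):
        # run_once: callable executing the benchmark once, returning e.g. average FPS.
        results = [run_once() for _ in range(base_runs)]
        # Keep adding runs, up to twice the original count, while deviation stays high.
        while len(results) < base_runs * 2:
            mean = statistics.mean(results)
            rel_dev = statistics.stdev(results) / mean if mean else 0.0
            if rel_dev <= max_rel_dev:
                break
            results.append(run_once())
        return results  # every individual run is kept and archived, not just the average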

Heck, with the complete Phoronix Test Suite to OpenBenchmarking.org workflow, I never touch the result data manually... there's no worrying about copy-paste mistakes or typos when entering data into a spreadsheet or the like. It's all automated and the original data is archived.

System Monitoring - The Phoronix Test Suite can record the system's hardware/software sensors in real-time during the benchmarking process, to see how GPU utilization, system temperature, or dozens of other sensors are affected by a particular test. It's all recorded per-test and for the whole test queue. Doing any of that manually would be very inaccurate over the course of many tests.
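Conceptually, the per-test sensor logging amounts to polling in the background while the benchmark runs; a simplified Python sketch, with read_sensor() standing in as a hypothetical placeholder for the real sensor backends:

    import threading
    import time

    def monitor(read_sensor, samples, stop, interval=1.0):
        # Poll one sensor (GPU utilization, temperature, power, ...) on a fixed interval.
        while not stop.is_set():
            samples.append((time.time(), read_sensor()))
            stop.wait(interval)

    def run_with_monitoring(run_test, read_sensor):
        samples, stop = [], threading.Event()
        thread = threading.Thread(target=monitor, args=(read_sensor, samples, stop))
        thread.start()
        try:
            result = run_test()      # the actual benchmark execution
        finally:
            stop.set()
            thread.join()
        return result, samples       # sensor data captured per-test, automatically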

Performance-Per-XXX - Thanks to the aforementioned system monitoring functionality, it's possible to reliably offer features like performance-per-Watt (featured in many of our articles), calculated precisely from the system power consumption monitored during the test execution phase. Doing any of that manually would guarantee higher variance.
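The calculation itself is simple once the power samples from the test execution phase are in hand; a small illustration (the numbers are made up, and the real module averages the monitored power draw over the run):

    def performance_per_watt(result, power_samples_watts):
        # Divide the benchmark result (e.g. frames per second) by the average
        # power draw recorded while the test was actually executing.
        avg_watts = sum(power_samples_watts) / len(power_samples_watts)
        return result / avg_watts

    # e.g. 120 FPS at an average of 185 Watts during the run:
    print(performance_per_watt(120.0, [181, 184, 188, 187]))   # ~0.65 FPS per Watt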

Reproducibility - With the Phoronix Test Suite / OpenBenchmarking.org design, everything is standardized and reproducible. That's why anyone across the world can simply run a command (e.g. phoronix-test-suite benchmark xxx) to download the exact same test version we ran, with the exact same test configuration, and run it in the exact same manner. That's the whole beauty of OpenBenchmarking.org and our approach: anyone can reproduce our tests.

Transparency - This really ties into the previous item about reproducibility, but with all of the test profiles being open-source and our benchmarking execution framework being GPLv3-licensed, you can be assured of exactly what was tested, down to all of the details.

Extensibility - The Phoronix Test Suite provides a module framework for adding new features (the system monitoring and performance-per-Watt capabilities mentioned above are themselves modules). There are also modules to ensure your screensaver is properly suspended, to push notifications about ongoing benchmark status to your phone/tablet, and to implement other unique features like easy performance-per-dollar benchmarking.

Adaptability - The very same test cases used for benchmarking on phoronix.com can be carried over when bisecting a performance regression in a particular piece of software, or when monitoring the performance of a given piece of software on a daily or per-commit basis, as can easily be done via the tie-in Phoromatic component and is demonstrated on LinuxBenchmarking.com.
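As one example of that re-use, the same test profile can drive a git bisect of a performance regression. This is a hedged sketch: the batch-benchmark invocation is normal Phoronix Test Suite usage, but the "Average:" output parsing and the 5% pass/fail threshold are assumptions for illustration; a real setup would read the exported result file or lean on Phoromatic instead.

    import re
    import subprocess
    import sys

    def benchmark_score(test_profile):
        # Run one test profile non-interactively and pull a single number from its output.
        out = subprocess.run(["phoronix-test-suite", "batch-benchmark", test_profile],
                             capture_output=True, text=True).stdout
        match = re.search(r"Average:\s*([0-9.]+)", out)    # assumed output format
        return float(match.group(1)) if match else 0.0

    if __name__ == "__main__":
        # Usage: git bisect run python bisect_check.py <test-profile> <known-good-score>
        # Assumes a higher-is-better result such as FPS.
        test, good = sys.argv[1], float(sys.argv[2])
        sys.exit(0 if benchmark_score(test) >= good * 0.95 else 1)   # 0 = good, 1 = regressed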

Analytics - With all of the test results stored in a standardized format (XML), there are awesome analytic possibilities now and in the future. This is part of what OpenBenchmarking.org is about: being able to analyze broader trends over time, mine data sets for high-performance Linux systems, and much more.
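A hedged sketch of what that looks like in practice: because the results are plain XML, mining them needs nothing beyond standard-library tooling. The element names used below (Result, Title, Entry, Identifier, Value) are illustrative stand-ins rather than a guarantee of the exact schema.

    import xml.etree.ElementTree as ET

    def extract_results(path):
        # Walk a saved result file and yield (test title, system identifier, value) tuples.
        root = ET.parse(path).getroot()
        for result in root.iter("Result"):
            title = result.findtext("Title", default="unknown test")
            for entry in result.iter("Entry"):
                yield title, entry.findtext("Identifier"), entry.findtext("Value")

    for test, system, value in extract_results("composite.xml"):
        print(f"{test}: {system} = {value}")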

Hopefully this article was enlightening if you are new to Phoronix or haven't seen my previous rants on the matter of automated benchmarking requirements. From time to time I will fire up an interesting game and run a test manually, but aside from that, these are the reasons why I won't randomly test some new Linux game.

Want change? Share this article and encourage more game developers to ensure their games are benchmark-friendly, and, in large part, that the studios porting those games to Linux keep such functionality in place. Feral Interactive is just one company that comes to mind: they have ported many high-profile games to Linux but do not always expose automated benchmark functionality we can interface with via the Phoronix Test Suite, whereas the original Windows version of the game often supports such capabilities via command-line switches and the like. My whole rant comes down to being able to run a benchmark in an automated manner from the command-line, from start to finish, and being able to dump the result data to standard output or a log file!
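From the test-profile side, that ask boils down to something this simple; every name in the sketch below (the binary, the benchmark switches, the "Average FPS" log line) is hypothetical, but it is the entire workflow an automated test needs a game to support:

    import re
    import subprocess

    def run_game_benchmark():
        # Launch the game in a non-interactive benchmark mode and let it exit on its own.
        proc = subprocess.run(["./some-linux-game", "+benchmark", "demo1", "+exit_after_run"],
                              capture_output=True, text=True)
        # Pull the result straight from standard output (a log file works just as well).
        match = re.search(r"Average FPS:\s*([0-9.]+)", proc.stdout)
        if match is None:
            raise RuntimeError("No result on stdout -- this game is not benchmark-friendly.")
        return float(match.group(1))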

If anyone has any other questions or concerns about the Phoronix Test Suite, Phoromatic, or OpenBenchmarking.org, feel free to ask away via the comments on this article in the forums. If you agree with what's been said, please consider joining this week's Phoronix Premium deal that goes to support this site and my continued development of the GPLv3 benchmarking software. Companies in need of commercial benchmarking support or custom engineering services can contact us.