Ubuntu 15.10 + GCC 5.2: -O3, March=Native, FLTO Tests

  • Ubuntu 15.10 + GCC 5.2: -O3, March=Native, FLTO Tests

    Phoronix: Ubuntu 15.10 + GCC 5.2: -O3, March=Native, FLTO Tests

    A Phoronix Premium subscriber requested some fresh GCC compiler optimization tests, so here's some current results using GCC 5.2 on Ubuntu 15.10 64-bit...

  • #2
    A Phoronix Premium subscriber requested some fresh GCC compiler optimization tests, so here's some current results...
    Well, the results are certainly NOT HERE.

    • #3
      bug77, there is a link to the results in the article. Those results, coupled with the major performance regressions Michael has found earlier, suggest that people who need performance should stick with Ubuntu 15.04 w/ gcc 4.9 rather than upgrading to Ubuntu 15.10 w/ gcc 5.2.

      • #4
        Some observations:
        1) -march=native often makes things ... worse. I can confirm the very same thing happens on AMD CPUs as well.
        2) It's kind of strange that LTO makes performance worse in some cases. Can someone explain how that could happen at all? If I remember correctly, it is only supposed to discard unused code. How could it make benchmark results 20% worse?
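
        For reference, a quick way to check what -march=native actually resolves to on a given box (a sketch; the exact flags printed depend on the CPU and on how GCC was built):
        Code:
        # Show the cc1 invocation, including the flags -march=native expands to
        gcc -march=native -E -v - </dev/null 2>&1 | grep cc1

        # Or dump the resolved target options directly
        gcc -march=native -Q --help=target | grep -E 'march|mtune'
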
        Last edited by SystemCrasher; 20 October 2015, 12:04 PM.

        • #5
          Originally posted by SystemCrasher View Post
          Some observations:
          1) -march=native often makes things ... worse. I can confirm the very same thing happens on AMD CPUs as well.
          2) It's kind of strange that LTO makes performance worse in some cases. Can someone explain how that could happen at all? If I remember correctly, it is only supposed to discard unused code. How could it make benchmark results 20% worse?


          Um did you look at this? http://openbenchmarking.org/result/1...HA-GCCCOMPIL63

          To me the optimizations improve the results.

          • #6
            Originally posted by caligula View Post
            Um did you look at this? http://openbenchmarking.org/result/1...HA-GCCCOMPIL63
            To me the optimizations improve the results.
            I did. And if you failed to notice, -march=native made 8 tests run worse than plain -O3. And the LTO version is ... missing half of the data without any good explanation. What happened? And across a few results, 2 tests were the absolute worst of all. This leaves me puzzled, since I do not really understand how just adding LTO can make results worse.

            As an obvious example, I benchmarked the LZ4 compression algorithm on an AMD FX CPU. The fastest was the (stock) -O3 option from the library's author, who knows how to do it right. Attempts to use -march=native made things noticeably worse. So it's not a unique feature of this test set. Before claiming there was an improvement, it is a better idea to actually measure the workload you care about and see whether that is really the case. As you can see, it can easily turn into a regression.
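
            In case anyone wants to repeat that kind of comparison, here is a rough sketch (it assumes an lz4 checkout whose Makefile honors an overridden CFLAGS; the path to the freshly built binary and the benchmark input 'testfile.bin' are placeholders):
            Code:
            # Baseline: build with the library author's stock flags (-O3) and benchmark
            make clean && make
            ./programs/lz4 -b1 testfile.bin

            # Rebuild with -march=native added and benchmark again
            make clean && make CFLAGS="-O3 -march=native"
            ./programs/lz4 -b1 testfile.bin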

            • #7
              Originally posted by SystemCrasher View Post
              I did. And if you failed to notice, -march=native made 8 tests run worse than plain -O3. And the LTO version is ... missing half of the data without any good explanation. What happened? And across a few results, 2 tests were the absolute worst of all. This leaves me puzzled, since I do not really understand how just adding LTO can make results worse.

              As an obvious example, I benchmarked the LZ4 compression algorithm on an AMD FX CPU. The fastest was the (stock) -O3 option from the library's author, who knows how to do it right. Attempts to use -march=native made things noticeably worse. So it's not a unique feature of this test set. Before claiming there was an improvement, it is a better idea to actually measure the workload you care about and see whether that is really the case. As you can see, it can easily turn into a regression.
              I had to try them out. I seem to get the same results with GCC 5.2. For example, the C-ray test compiles a 500 LOC file. I preprocessed this into a single 2200-line C file that doesn't #include anything, just compiles on its own and links with the math lib and pthreads. Really odd...
              Code:
              # gcc test.c -o test -O3 -march=native -lm -lpthread -flto
              # ./test -t 8 -s 800x600 -r 8 -i sphfract -o output.ppm
              
              Three runs:
              Rendering took: 8 seconds (8877 milliseconds)
              Rendering took: 8 seconds (8936 milliseconds)
              Rendering took: 8 seconds (8587 milliseconds)
              
              # gcc test.c -o test -O3 -march=native -lm -lpthread
              # ./test -t 8 -s 800x600 -r 8 -i sphfract -o output.ppm
              
              Three runs:
              Rendering took: 7 seconds (7589 milliseconds)
              Rendering took: 7 seconds (7802 milliseconds)
              Rendering took: 7 seconds (7799 milliseconds)
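
              If someone wants to dig into where the -flto build loses that second, comparing hardware counters between the two binaries is probably the easiest start (a sketch; it assumes perf is installed and reuses the exact command from above):
              Code:
              # Run each binary three times under perf and compare the counter summaries
              perf stat -r 3 ./test -t 8 -s 800x600 -r 8 -i sphfract -o output.ppm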

              • #8
                What kind of idiot uses -O3? -march=native should be safe (and provide a sizeable performance increase) along with -O2. Don't unroll loops, don't omit frame pointers; that leads to unstable systems.
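
                For what it's worth, those individual passes can be switched off explicitly on top of whatever -O level you pick, so "no unrolling, no frame pointer omission" does not by itself dictate the level (a sketch; test.c just stands in for whatever you are building):
                Code:
                # -O2 plus -march=native, keeping frame pointers and explicitly disabling loop unrolling
                gcc test.c -o test -O2 -march=native -fno-omit-frame-pointer -fno-unroll-loops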

                • #9
                  Originally posted by jason.oliveira View Post
                  What kind of idiot uses -O3? -march=native should be safe (and provide a sizeable performance increase) along with -O2. Don't unroll loops, don't omit frame pointers; that leads to unstable systems.
                  Just out of curiosity, have you even tested your claims? For example C-Ray performs 34% faster with -O3 vs -O2 on my system. -march=native doesn't seem to have any effect on C-Ray performance in my tests. Not sure which target GCC 5.2 is using by default on Core i7 Ivy Bridge. I could also test with Haswell and Skylake.
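
                  One way to check what this GCC build falls back to when no -march is given (a sketch; the output format differs a bit between GCC versions, and on Ubuntu's distro compiler it is typically a generic x86-64 baseline):
                  Code:
                  # Show the default -march/-mtune this GCC build uses when nothing is specified
                  gcc -Q --help=target | grep -E 'march|mtune'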

                  • #10
                    Originally posted by caligula View Post

                    Just out of curiosity, have you even tested your claims? For example C-Ray performs 34% faster with -O3 vs -O2 on my system. -march=native doesn't seem to have any effect on C-Ray performance in my tests. Not sure which target GCC 5.2 is using by default on Core i7 Ivy Bridge. I could also test with Haswell and Skylake.
                    C-Ray will perform faster, but it won't be nearly as accurate. Back in 2004, I was running some pretty stupidly insane CFLAGS. Things were definitely faster, but at the cost of any semblance of system stability. I eventually disabled -O3 in favor of -O2 or -Os. One should look at the options that -O3 enables and ask whether they're worth the cost. If you build with -O3 and you start seeing weird glitches, make another build with -O2.
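
                    One way to see exactly which extra passes -O3 turns on compared to -O2, so the trade-off can be judged option by option (a sketch; the flag lists differ between GCC versions):
                    Code:
                    # Diff the optimizer settings enabled at -O2 vs -O3 for this GCC build
                    gcc -Q --help=optimizers -O2 > /tmp/o2.txt
                    gcc -Q --help=optimizers -O3 > /tmp/o3.txt
                    diff /tmp/o2.txt /tmp/o3.txt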
