Thread: Clang Compiling Against GCC On Ubuntu ARM Linux

  1. #1
    Join Date
    Jan 2007
    Posts
    14,770

    Clang Compiling Against GCC On Ubuntu ARM Linux

    Phoronix: Clang Compiling Against GCC On Ubuntu ARM Linux

    Here's an update on the LLVM/Clang vs. GCC compiler benchmarking on ARM hardware under Linux...

    http://www.phoronix.com/vr.php?view=MTExNzg

  2. #2
    Join Date
    Jan 2007
    Posts
    459

    ROTFL
    http://lists.cs.uiuc.edu/pipermail/l...ay/049690.html
    "Evan Cheng (apple.com), Request for Help: Teach ARM target to auto-detect cpu / subtarget features, Thu May 10 22:11:23 CDT 2012

    I believe one of the reason the benchmark numbers are totally bogus is that the compilation are done on ARM hosts.

    Given the benchmarks are apparently compiled without -mcpu=cortex-a9, I suspect LLVM ended up generating code for "generic" ARMv4 cpu.

    This article makes me sick in my stomach.
    Thanks, Evan"

    "Michael Larabel on June 11, 2012
    The bench marking was still being done from a PandaBoard ES with Texas Instruments OMAP4460 dual-core ARM Cortex-A9 development board. Via the CFLAGS/CXXFLAGS, -march=armv7-a was passed to each compiler."

    On the other hand, once you sort out your flags war and reach consensus, it might be interesting to see this test run on a Calxeda quad-core ARM Cortex-A9 processor, which is optimized for use in servers with a 10 Gigabit/s internal fabric on each card. A sample box with 2 or more cards installed would give 32 Cortex-A9 cores across 8 SoCs, or more. You really should go and get the latest Linaro GCC too.

    http://armdevices.net/2012/06/11/cal...owered-server/

    http://www.youtube.c...5UCYJjpZoQ

    Hmm, I can't seem to get the in-post video link working, odd.
    Last edited by popper; 06-11-2012 at 11:12 AM.

  3. #3

    Quote Originally Posted by popper
    On the other hand once you sort out your flags war
    The flags used in this article were just the normal ones; a compiler flag/tuning comparison on ARM is forthcoming in a future multi-page article.

  4. #4
    Join Date
    Jul 2008
    Location
    Berlin, Germany
    Posts
    822

    armv7 is what e.g. Ubuntu will target in their upcoming ARM releases, so it seems very relevant how that performs. Compiling all software with hardware specific CFLAGS is typically only done by Gentoo or other source based distros.

  5. #5
    Join Date
    Dec 2008
    Location
    San Bernardino, CA
    Posts
    232

    Quote Originally Posted by chithanh
    armv7 is what e.g. Ubuntu will target in their upcoming ARM releases, so it seems very relevant how that performs. Compiling all software with hardware specific CFLAGS is typically only done by Gentoo or other source based distros.
    Here are a few relevant flag combinations I'd like to see tested:

    1. Ubuntu Standard armv7 + hard-float
    2. Android Standard armv7 + softfp (note: this is not soft-float; it still uses hardware FP, only the calling convention stays compatible with soft-float).
    3. Android Standard armv5 + soft-float

    These three configurations will cover most software written for Linux and Android.

  6. #6
    Join Date
    Jul 2008
    Location
    Berlin, Germany
    Posts
    822

    Possibly Ubuntu and other distros will use -march=armv7-a -mtune=cortex-a9 (same idea as -march=i486 -mtune=i686 for x86) so that would be another interesting data point.

  7. #7
    Join Date
    Oct 2009
    Posts
    845

    Again, what is the point of running the 7-zip benchmark with no -On optimization setting? This means that at least GCC will default to -O0, which is no optimization. Just add -O2, or preferably -O3, so that this benchmark is in any way relevant; NO ONE will use 7-zip compiled with no optimizations. You are benchmarking compiler optimization here, so what possible point is there in NOT enabling optimizations?

  8. #8
    Join Date
    Jan 2012
    Posts
    113

    Yeah, while the phoronix test suite framework itself is fine, the choice of benchmarks is very questionable at best.

    Let's have a look at the "popular" C-Ray 1.1 benchmark. It can be downloaded from http://www.phoronix-test-suite.com/b...ray-1.1.tar.gz
    It is typically run as "./c-ray-mt -t 32 -s 1600x1200 -r 8 -i sphfract -o output.ppm", but changing 1600x1200 to 160x120 lets it run for seconds instead of hundreds of seconds on ARM. Profiling of gcc-4.7.0 compiled code shows the following:
    Code:
    ./c-ray-mt -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm
     
    samples  %        image name               symbol name
    28459    51.8672  c-ray-mt                 shade
    17869    32.5667  c-ray-mt                 ray_sphere
    4110      7.4906  c-ray-mt                 trace
    3185      5.8047  c-ray-mt                 render_scanline
    319       0.5814  libm-2.13.so             __ieee754_pow
    194       0.3536  libm-2.13.so             powl
    136       0.2479  libm-2.13.so             __exp1
    108       0.1968  libc-2.13.so             memcpy
    78        0.1422  c-ray-mt                 get_primary_ray
    73        0.1330  c-ray-mt                 get_sample_pos
    59        0.1075  libm-2.13.so             isnanl
    42        0.0765  vmlinux                  __do_softirq
    36        0.0656  vmlinux                  __schedule
    35        0.0638  libm-2.13.so             checkint
    31        0.0565  libc-2.13.so             fputc
    18        0.0328  libm-2.13.so             __mul
    4         0.0073  c-ray-mt                 main
    And this reveals a major performance problem: the function call overhead is insane. Just making sure that the ray_sphere function gets inlined improves performance significantly. As a workaround, the -finline-limit=100000 option can be added for more aggressive inlining. The results of "./c-ray-mt -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm" on an ARM Cortex-A9 at 1.2GHz, compiled with gcc 4.7.0:
    Rendering took: 6 seconds (6685 milliseconds) for CFLAGS="-O3 -ffast-math"
    Rendering took: 5 seconds (5436 milliseconds) for CFLAGS="-O3 -ffast-math -finline-limit=100000"

    But the real fix is to use "static inline" for the performance critical functions. Whoever developed this C-Ray application apparently has no clue about performance optimizations. Or maybe it was done on purpose to make the job harder for the compilers. Compilers that are configured to inline aggressively by default are going to win by a huge margin on this test (trading it for larger binary sizes, because there are no free cookies).

    Generally, I get the impression that this selection of Phoronix benchmarks was made on purpose. Surely, when compiler optimizations are disabled or poorly written code such as C-Ray is benchmarked, the difference between the results from different compilers may be quite significant (and mostly random). Benchmarking properly written code with properly selected optimization options is surely boring, because it is less likely to show surprising wins or sensations.
    Last edited by ssvb; 06-12-2012 at 04:42 AM.

  9. #9
    Join Date
    Jun 2012
    Posts
    1

    Depending on how GCC was configured (you can see by passing -v), this might be a non-issue, but passing only -march=armv7-a without other -mtune= or -mcpu= options might have resulted in GCC tuning for the Cortex-A8.
    You might want to re-check to be sure...

  10. #10
    Join Date
    Jan 2012
    Posts
    113

    Tuning for Cortex-A8 works well for Cortex-A9 too. They are reasonably similar, and scheduling instructions for an in-order dual-issue processor does not usually do any harm to its out-of-order dual-issue twin. Moreover, there are cases where -mcpu=cortex-a9 is bad for performance: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 (just filed this enhancement request)
