
Originally posted by LinuxID10T:
See, the thing is that Clang will pretty much always be years behind in performance.
That's impossible to say, though I'll agree that Clang is currently quite a bit behind GCC. However, those with agendas can always skew the statistics to their liking, and one way to do that is the choice of optimization flags. For instance, Michael recently published a test between GCC 4.6/4.7 and LLVM/Clang 3.0/3.1 where he ran a 7-zip benchmark with no optimization flags. That means the compilers (at least GCC) make no attempt to optimize the code, so from a benchmark standpoint it's worthless. However, choosing this likely gave Michael the statistic he wanted, which was a win for Clang/LLVM. Another thing is synthetic tests versus real-world tests. Synthetic tests can be very misleading. Case in point:
I made my own benchmark script for 7-zip and ran it on a Core i7, comparing GCC 4.7 against Clang 3.0. I tested both the synthetic benchmark built into 7-zip and a real-world benchmark of actually compressing a file (in this case an Arch Linux ISO; I'm an Arch guy, what can I say). Here are the results at different optimization levels:
Code:
GCC     -O0 -march=corei7    1m2.420s    (real-world benchmark, less is better)
GCC     -O0 -march=corei7    5354        (synthetic benchmark, more is better)
Clang   -O0 -march=corei7    1m1.143s    (real-world benchmark, less is better)
Clang   -O0 -march=corei7    5446        (synthetic benchmark, more is better)
GCC     -O1 -march=corei7    0m32.008s   (real-world benchmark, less is better)
GCC     -O1 -march=corei7    11375       (synthetic benchmark, more is better)
Clang   -O1 -march=corei7    0m35.316s   (real-world benchmark, less is better)
Clang   -O1 -march=corei7    11650       (synthetic benchmark, more is better)
GCC     -O2 -march=corei7    0m31.076s   (real-world benchmark, less is better)
GCC     -O2 -march=corei7    11704       (synthetic benchmark, more is better)
Clang   -O2 -march=corei7    0m34.550s   (real-world benchmark, less is better)
Clang   -O2 -march=corei7    11808       (synthetic benchmark, more is better)
GCC     -O3 -march=corei7    0m29.432s   (real-world benchmark, less is better)
GCC     -O3 -march=corei7    11890       (synthetic benchmark, more is better)
Clang   -O3 -march=corei7    0m34.335s   (real-world benchmark, less is better)
Clang   -O3 -march=corei7    11796       (synthetic benchmark, more is better)
As we can see, on the synthetic benchmark Clang/LLVM wins up to -O2, while GCC wins at -O3. So by choosing anything from -O0 to -O2 and using only the synthetic benchmark, Clang/LLVM would come out the winner. However, given that -O3 is the option where the compiler is supposed to favour speed over anything else (like code size), I can't see why someone who tests only ONE option would test anything else, unless of course they want the results to reflect some agenda.
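And looking past the synthetic benchmark to the real-world benchmark, which is what actually reflects the REAL usage of 7-zip, GCC wins at every level where optimization is actually enabled (-O1 through -O3), with Clang taking roughly 10 to 17 percent longer. Clang is ahead only at -O0, by about a second, and that is the unoptimized case that tells us nothing.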
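To put a number on the real-world gap, here is a minimal sketch that converts the `time` values from the table into seconds and prints how much longer Clang took than GCC at each level. The helper names `to_seconds` and `percent_slower` are made up for this example and are not part of my benchmark script.

```shell
#!/bin/bash
# Hypothetical helpers for comparing the real-world timings in the table above.

# Convert a bash `time` value such as "1m2.420s" into plain seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, parts, "m")        # parts[1] = minutes, parts[2] = e.g. "2.420s"
        sub(/s$/, "", parts[2])     # drop the trailing "s"
        printf "%.3f\n", parts[1] * 60 + parts[2]
    }'
}

# Print how much longer (in percent) the second time is than the first.
# A negative result means the second time is the shorter one.
percent_slower() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# Real-world times from the table: GCC first, Clang second.
echo "-O0: Clang vs GCC: $(percent_slower 1m2.420s 1m1.143s)%"
echo "-O1: Clang vs GCC: $(percent_slower 0m32.008s 0m35.316s)%"
echo "-O2: Clang vs GCC: $(percent_slower 0m31.076s 0m34.550s)%"
echo "-O3: Clang vs GCC: $(percent_slower 0m29.432s 0m34.335s)%"
```

The percentages are relative to the GCC time, so a positive value means Clang needed that much more wall-clock time for the same compression job.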
Here's the script I made (sorry for my poor bash script-fu). It automatically downloads p7zip and an Arch Linux ISO (just the first time, not on every run!), and the ISO is what gets compressed in the real-world test. There's also support for testing ICC (the Intel compiler), but I didn't use it for this test. The benchmark results are saved in p7zip-results.txt in the same directory from which you run the script:
Code:
#!/bin/bash
name=p7zip_
version=9.20.1
dir=$name$version
results=../p7zip-results.txt
makefile=makefile.linux_amd64
arg="./bin/7za a -mx9 -m0=LZMA2:d128m -mmt dummy.7z archlinux-2011.08.19-netinstall-i686.iso"
arg2="./bin/7za b"
# change flags according to the optimization settings you want to benchmark
gccflags="-O3 -march=corei7"
clangflags="-O3 -march=corei7"
iccflags="-O3 -march=corei7"
# uncomment compilers to be tested (obviously they need to have been installed)
gcc=1
clang=1
#icc=1
if [ ! -d "$dir" ]; then
    echo "Downloading and extracting p7zip..."
    wget -q "http://downloads.sourceforge.net/project/p7zip/p7zip/${version}/p7zip_${version}_src_all.tar.bz2"
    bzip2 -q -d "${dir}_src_all.tar.bz2"
    tar xf "${dir}_src_all.tar"
fi
# stop here if the source directory is missing, rather than building in the wrong place
cd "$dir" || exit 1
if [ ! -e archlinux-2011.08.19-netinstall-i686.iso ]; then
    echo "Downloading iso..."
    wget -q http://ftp.ds.hj.se/pub/os/linux/archlinux/iso/2011.08.19/archlinux-2011.08.19-netinstall-i686.iso
fi
# read the file once so it is in the disk cache; otherwise the first real-world test
# will likely suffer from disk read time
cat archlinux-2011.08.19-netinstall-i686.iso > /dev/null
if [ -n "$gcc" ]; then
    # GCC ----------------------------------------------------------------------------------------------------
    cp "$makefile" makefile.machine
    sed -i "s/OPTFLAGS=-O/OPTFLAGS=$gccflags/" makefile.machine
    make clean
    make -j5
    clear
    echo "Benchmarking..."
    # wall-clock ("real") time of the compression run
    bench=$( { time $arg; } 2>&1 | grep real | awk '{print $2}' )
    echo -e "GCC\t\t$gccflags\t\t$bench\t(real-world benchmark, less is better)" >> $results
    # fourth field of the "Tot" line printed by the built-in benchmark
    bench=$( { $arg2; } 2>&1 | grep Tot | awk '{print $4}' )
    echo -e "GCC\t\t$gccflags\t\t$bench\t\t(synthetic benchmark, more is better)" >> $results
    rm dummy.7z
fi
if [ -n "$clang" ]; then
    # CLANG/LLVM ---------------------------------------------------------------------------------------------
    cp "$makefile" makefile.machine
    sed -i "s/OPTFLAGS=-O/OPTFLAGS=$clangflags/" makefile.machine
    sed -i 's/CXX=g++/CXX=clang++/' makefile.machine
    sed -i 's/CC=gcc/CC=clang/' makefile.machine
    make clean
    make -j5
    clear
    echo "Benchmarking..."
    # wall-clock ("real") time of the compression run
    bench=$( { time $arg; } 2>&1 | grep real | awk '{print $2}' )
    echo -e "Clang\t\t$clangflags\t\t$bench\t(real-world benchmark, less is better)" >> $results
    # fourth field of the "Tot" line printed by the built-in benchmark
    bench=$( { $arg2; } 2>&1 | grep Tot | awk '{print $4}' )
    echo -e "Clang\t\t$clangflags\t\t$bench\t\t(synthetic benchmark, more is better)" >> $results
    rm dummy.7z
fi
if [ -n "$icc" ]; then
    # ICC ----------------------------------------------------------------------------------------------------
    cp "$makefile" makefile.machine
    sed -i "s/OPTFLAGS=-O/OPTFLAGS=$iccflags/" makefile.machine
    sed -i 's/CXX=g++/CXX=icpc/' makefile.machine
    sed -i 's/CC=gcc/CC=icc/' makefile.machine
    make clean
    make -j5
    clear
    echo "Benchmarking..."
    # wall-clock ("real") time of the compression run
    bench=$( { time $arg; } 2>&1 | grep real | awk '{print $2}' )
    echo -e "ICC \t\t$iccflags\t\t$bench\t(real-world benchmark, less is better)" >> $results
    # fourth field of the "Tot" line printed by the built-in benchmark
    bench=$( { $arg2; } 2>&1 | grep Tot | awk '{print $4}' )
    echo -e "ICC \t\t$iccflags\t\t$bench\t\t(synthetic benchmark, more is better)" >> $results
    rm dummy.7z
fi
echo "Done! Open '$results' to see the benchmark results."
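For the table above I edited the flags by hand and re-ran the script once per optimization level. A rough sketch of how that could be looped instead is below. The function `set_optflags` is a hypothetical helper, not part of the script above, and the demo runs against a throwaway stand-in file rather than the real p7zip makefile, so treat it as an illustration of the substitution step only.

```shell
#!/bin/bash
# Hypothetical helper: copy a makefile, replacing its OPTFLAGS line with new flags.
set_optflags() {
    local src=$1 dest=$2 flags=$3
    # '|' is used as the sed delimiter so flags containing '/' do not break the expression
    sed "s|^OPTFLAGS=.*|OPTFLAGS=$flags|" "$src" > "$dest"
}

# Demo on a stand-in makefile (assumed content, not the real p7zip file).
tmp=$(mktemp -d)
printf 'OPTFLAGS=-O\nCXX=g++\nCC=gcc\n' > "$tmp/makefile.linux_amd64"

for level in -O0 -O1 -O2 -O3; do
    set_optflags "$tmp/makefile.linux_amd64" "$tmp/makefile.machine" "$level -march=corei7"
    # show the line that the build would pick up for this level
    grep '^OPTFLAGS=' "$tmp/makefile.machine"
    # in the real script, "make clean", "make -j5" and both benchmarks would run here
done

rm -rf "$tmp"
```

Writing to a separate destination file instead of using `sed -i` keeps the original makefile untouched, so every level starts from the same baseline.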