Wow, nice! Much more in line with what I was expecting. And a PandaBoard with hardfp should get in the same ballpark as that, right? Suddenly the power question is in much sharper relief.
How much is from hardfp and how much from a tweaked ARM core (Exynos)? I recall Samsung's PR saying the Exynos outruns OMAP4.
Exynos4210 may have a considerably faster memory controller, especially compared to the older OMAP4430, which had some serious problems with memory performance. But OMAP4460 was supposed to resolve the issue.
Another major problem for ARM when running tests like this is the lack of sane support for runtime CPU feature detection (support for the -march=native and -mtune=native options in gcc, reliable NEON detection and use in all the NEON-optimized libraries). ARM Ltd. has been aware of the problem for years, but has done nothing to address it.
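In the meantime, about the only thing a script can do on Linux is read the kernel's own feature list. A minimal sketch, assuming a 32-bit ARM Linux kernel that reports `neon` on the `Features` line of /proc/cpuinfo (`has_neon` is a made-up helper name, not part of any library):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given cpuinfo file (default
# /proc/cpuinfo) lists "neon" as a separate word on a "Features" line.
# Assumption: 32-bit ARM Linux; 64-bit ARM kernels report "asimd" instead.
has_neon() {
    grep -Eq '^Features[[:space:]]*:.*[[:space:]]neon([[:space:]]|$)' \
        "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_neon; then
    echo "NEON reported by the kernel"
else
    echo "NEON not reported"
fi
```

This only tells you what the kernel advertises. It does nothing about getting gcc or the libraries to actually pick NEON code paths.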
As for the benchmarks, "compress-7zip" test compiles the code with -O optimization, which is equivalent to -O1:
Code:
$ cat install.log
mkdir -p bin
make -C CPP/7zip/Bundles/Alone all
make: Entering directory `/mnt/mmcblk0p2/.phoronix-test-suite/installed-tests/pts/compress-7zip-1.6.0/p7zip_9.20.1/CPP/7zip/Bundles/Alone'
g++ -O -pipe -s -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DNDEBUG -D_REENTRANT -DENV_UNIX -D_7ZIP_LARGE_PAGES -DBREAK_HANDLER -DUNICODE -D_UNICODE -c -I. -I../../../myWindows -I../../../ -I../../../include_windows ../../../myWindows/myGetTickCount.cpp
g++ -O -pipe -s -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DNDEBUG -D_REENTRANT -DENV_UNIX -D_7ZIP_LARGE_PAGES -DBREAK_HANDLER -DUNICODE -D_UNICODE -c -I. -I../../../myWindows -I../../../ -I../../../include_windows ../../../myWindows/wine_date_and_time.cpp
...

This can be solved by setting EXTRAOPTFLAGS environment variable to something more reasonable, for example at least "-O2".
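Setting the variable before reinstalling the test could look roughly like this. It is a sketch, not a verified recipe: it assumes the p7zip makefile picks EXTRAOPTFLAGS up from the environment as described, and that the test is reinstalled with the suite's `force-install` command.

```shell
#!/bin/sh
# Assumption: the compress-7zip build honours EXTRAOPTFLAGS from the environment.
export EXTRAOPTFLAGS="-O2"

# Rebuild the test so the new flag takes effect; skip if the suite is absent.
if command -v phoronix-test-suite >/dev/null 2>&1; then
    phoronix-test-suite force-install pts/compress-7zip
else
    echo "phoronix-test-suite not found in PATH" >&2
fi
```

Afterwards the g++ lines in install.log should show the new flag, which is worth checking before trusting the numbers.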
The build system for libvpx clearly does not use NEON, which explains poor results for "VP8 libvpx Encoding" test:
Code:
# cat install.log
Configuring selected codecs
enabling vp8_encoder
enabling vp8_decoder
Configuring for target 'generic-gnu'
enabling generic
Creating makefiles for generic-gnu libs
Creating makefiles for generic-gnu examples
Creating makefiles for generic-gnu docs
[DEP] vpx_config.c.d
[DEP] vp8/decoder/reconintra_mt.c.d
[DEP] vp8/decoder/idct_blk.c.d
[DEP] vp8/decoder/threading.c.d
[DEP] vp8/decoder/onyxd_if.c.d
...

Trying to configure libvpx as "./configure --target=armv7-linux-gcc" spits out a funny error message: "Unable to invoke compiler: arm-none-linux-gnueabi-gcc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64". Why would they expect the compiler to be named this way?
Some other tests may show suboptimal results too, but I haven't looked there yet.
Enabled the use of NEON in VP8 LIBVPX ENCODING test by hacking libvpx build scripts:
Code:
$ ./configure --target=armv7-linux-gcc
Configuring selected codecs
enabling vp8_encoder
enabling vp8_decoder
Configuring for target 'armv7-linux-gcc'
enabling armv7
enabling armv6
enabling armv5te
enabling fast_unaligned
Creating makefiles for armv7-linux-gcc libs
Creating makefiles for armv7-linux-gcc examples
Creating makefiles for armv7-linux-gcc docs

This improves Frames Per second rating from 1.01 to 1.35 for Exynos4210. Though this is still worse than 1.55 shown by Intel Atom.
Code:
$ cat install.sh
#!/bin/sh
tar -zxvf smallpt-1.tar.gz
g++ -fopenmp smallpt.cpp -o smallpt-renderer
echo $? > ~/install-exit-status
echo "#!/bin/sh
./smallpt-renderer 100 > \$LOG_FILE 2>&1
echo \$? > ~/test-exit-status" > smallpt
chmod +x smallpt

This test program gets built without any optimizations at all! If we append -O3 option to the existing -fopenmp, the result for Exynos 4210 improves from 2489 seconds to 557 seconds! This is very disturbing and shows that phoronix-test-suite needs some major fixes. And a lot of data collected at openbenchmarking.org up to this moment is just useless garbage.
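A quick way to spot other tests with the same problem is to grep the install scripts and build logs for compiler lines that carry no real optimization flag. A rough sketch (`weak_opt_lines` is a made-up helper; it only looks at lines that start with gcc or g++ and treats anything below -O2 as weak):

```shell
#!/bin/sh
# Hypothetical helper: print gcc/g++ command lines from a build log or
# install script that lack -O2, -O3, -Os or -Ofast.
weak_opt_lines() {
    # Keep only lines that begin a gcc/g++ command, then drop the ones
    # that already carry a strong -O flag as a separate word.
    grep -E '^[[:space:]]*(gcc|g\+\+)[[:space:]]' "$1" |
        grep -Ev '(^|[[:space:]])-O(2|3|s|fast)([[:space:]]|$)'
}

# Demo on two variants of the smallpt compile line.
demo=$(mktemp)
cat > "$demo" <<'EOF'
g++ -fopenmp smallpt.cpp -o smallpt-renderer
g++ -fopenmp -O3 smallpt.cpp -o smallpt-renderer
EOF
weak_opt_lines "$demo"   # prints only the line without -O3
rm -f "$demo"
```

It will miss builds that hide their flags behind make variables or wrapper scripts, so an empty result does not prove a test is fine.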
While that does improve the Exynos results, it doesn't invalidate the results relative to each other. They all got the same optimization.