
Thread: PathScale EKOPath 5.0 Beta Compiler Performance

  1. #1
    Join Date
    Jan 2007
    Posts
    13,431

    Default PathScale EKOPath 5.0 Beta Compiler Performance

    Phoronix: PathScale EKOPath 5.0 Beta Compiler Performance

    Going on two years ago PathScale open-sourced their EKOPath 4 Compiler Suite. This Fortran/C/C++ compiler suite hasn't seen widespread adoption since then outside of some scientific circles and other select high-performance areas, but PathScale hasn't stalled in advancing their compiler software that is also still available commercially. PathScale has been preparing to release EKOPath 5.0, which is the subject of today's benchmarks.

    http://www.phoronix.com/vr.php?view=18461

  2. #2

    Default

    Thanks for posting this!
    ----------------------------
    It seems clear we have some action items to work on, but I'm honestly not sure we have the time to fix everything before EKOPath 5 Final. I'd bet that most of this could be cleared up by 5.5 (or certainly by 6.0).
    ----------------------------
    I'm curious if anyone else reading the forums can post benchmarks using their own codes + processor/system details.

    Quick links to some downloads
    ----------------------------
    Linux
    http://c591116.r16.cf2.rackcdn.com/e...-installer.run

    Solaris
    http://c591116.r16.cf2.rackcdn.com/e...-installer.run

    FreeBSD
    http://c591116.r16.cf2.rackcdn.com/e...-installer.run

    # Quick install line
    chmod +x ekopath-2013-02-08-installer.run ; ./ekopath-2013-02-08-installer.run --mode unattended --prefix /opt/ekopath-02-08
    ----------------------------

    EKOPath 5 is really a *BIG* difference behind the scenes. For example, if you run pathcc -show hello.c, you'll notice that we're using a modified clang as part of the process. This is almost certainly what allowed us to build those additional benchmarks. (To clarify a bit: we're not using the LLVM backend or any LLVM IR. In the past we were using a modified GNU cc1, but that and all other GNU code has been removed. EKOPath 5.5 will have a new backend we've been working on, but pushing out both of those big changes at the same time just wasn't possible.)

    ----------------------------
    Small selfish side note about ENZO (sister compiler to EKOPath that has added GPGPU support and additional features for multicore programming)
    ----------------------------
    While it's not possible to speed up every code on the GPU, we have put a huge amount of work into the programming models available for ENZO and its backend performance. Personally, I don't get as excited (or worried) about 5-10% CPU performance when we can offer gains of 30% to 10x with the GPU. I can't make promises, but we may try to drop a few OpenACC pragmas around those benchmarks and post numbers on a Tesla 2050.

  3. #3
    Join Date
    Jul 2009
    Posts
    202

    Default

    Quote Originally Posted by codestr0m View Post
    ----------------------------
    Small selfish side note about ENZO (sister compiler to EKOPath that has added GPGPU support and additional features for multicore programming)
    ----------------------------
    While it's not possible to speed up every code on the GPU, we have put a huge amount of work into the programming models available for ENZO and its backend performance. Personally, I don't get as excited (or worried) about 5-10% CPU performance when we can offer gains of 30% to 10x with the GPU. I can't make promises, but we may try to drop a few OpenACC pragmas around those benchmarks and post numbers on a Tesla 2050.
    So the real reason to be excited about EKOPath is automatic GPGPU usage? I am involved in some "scientific computing" but so far haven't had a reason to use anything other than GCC and clang.

  4. #4

    Default

    Quote Originally Posted by Cyborg16 View Post
    So the real reason to be excited about EKOPath is automatic GPGPU usage? I am involved in some "scientific computing" but so far haven't had a reason to use anything other than GCC and clang.
    s/EKOPath/ENZO/g
    ---------
    I'm biased, but I'd certainly recommend you test EKOPath and Intel compilers if you don't have a GPU. If you can get access to a system with a GPU (Tesla 2050, 2070 or 2090) *and* you're willing to add some pragmas or directives to your code, ENZO may be interesting. (The performance gains can be well worth the effort.) We're working on support for -autogpu which, like autovectorization or other automatic optimizations, requires zero code changes. This isn't ready for production and is just "noteworthy" at this point. (Honestly, give us a couple more months.)

  5. #5
    Join Date
    Jun 2010
    Location
    ฿ 16LDJ6Hrd1oN3nCoFL7BypHSEYL84ca1JR
    Posts
    968

    Default

    Well, a noticeable improvement is that it does not return with an error with -march=native.

    But "-march=native" is not recognized and, like all unrecognized -march parameters, it activates the generic profile:
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-1934caf9.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off
    The correct way to autochoose the cpu is -march=auto:
    "-march=auto -Ofast"
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-19683af2.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off
    Slightly better: it builds for SSE3, but for Pentium 4?! This is an Ivy Bridge mobile CPU, an i7-3632QM! If you could just copy & paste the CPU recognition from another compiler, that would be great.
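    The two -show dumps above are hard to compare by eye. A minimal Python sketch, assuming the output keeps the -TARG:key=value shape shown in this post, pulls out the selected processor and the extensions that are switched on. The helper names are hypothetical and not part of any PathScale tooling.

```python
import shlex


def parse_targ_flags(show_output: str) -> dict:
    """Collect -TARG:key=value options from a `pathcc -show` command line."""
    targ = {}
    for token in shlex.split(show_output):
        if token.startswith("-TARG:") and "=" in token:
            key, value = token[len("-TARG:"):].split("=", 1)
            # Repeated options (the dump lists them twice) simply overwrite.
            targ[key] = value
    return targ


def enabled_extensions(targ: dict) -> list:
    """Return the names of all -TARG switches set to 'on', sorted by name."""
    return sorted(key for key, value in targ.items() if value == "on")


# Excerpt of the "-march=auto -Ofast" dump from this post.
sample = (
    "-TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on "
    "-TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on "
    "-TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off "
    "-TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off"
)

targ = parse_targ_flags(sample)
print("processor:", targ["processor"])
print("enabled:", ", ".join(enabled_extensions(targ)))
```

    Run against the second dump, this reports processor=pentium4 with the SSE family up to sse4_2 plus aes enabled, and avx and pclmul off.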


    The installer puts the manpages in /usr/docs/man/man1/, which is not in the man search path on Arch Linux. I don't know about other systems, but it seems nonstandard to me. Use "man -l /file" to open files directly with man.
    Code:
           -march=<cpu-type>
                   (For x86) Compiler will optimize code for the selected cpu type: opteron, opteron-sse3, xeon, em64t, nocona, prescott, core, core2, wolfdale, harpertown, nehalem, barcelona, shanghai, istanbul, sandy, bdver1, auto.  auto means to optimize for the host platform that the compiler is running  on.   Core  refers  to  the
                  Intel Core Microarchitecture, used by 64-bit CPUs such as Woodcrest.  The default is auto.
    It seems none of the CPU profiles, not even bdver1, enable AVX by default. In fact it says:
    Code:
    pathcc -o hello_pathcc hello.c -march=bdver1 -O3 -mavx -show
    pathcc ERROR: Target processor does not support AVX.
    I'm not sure exactly what is supported on which CPUs, but I thought Bulldozer supported AVX right from the beginning?

    So the closest for me would probably be using -march=sandy -Ofast and perhaps -mavx and -mpclmul.
    Unfortunately Sandy Bridge does not support FMA and XOP, so I can't activate them directly. Are there any real CPU-specific optimizations, or is -march just for choosing which instructions to use (i.e. would generic with all the supported extensions enabled one by one be equally good)?

    Intel's CPUs don't support 3DNow!, but I noticed that the parameter to activate it is not documented in the manpage (it's pretty clearly -m3dnow, though). By the way, it says 3DNow! is not supported for bdver1; not sure if this is right.

    The benchmark is not that good, I think, because it very probably uses the generic CPU build profile (Michael is using an Ivy Bridge CPU too). It may be fair in that GCC is set to the generic build profile too, but that's not really where EKOPath is supposed to shine, right?
    Last edited by ChrisXY; 02-09-2013 at 02:19 PM.

  6. #6
    Join Date
    Jan 2008
    Posts
    295

    Default

    There isn't much to see out of Parallel BZIP2 Compression.
    Are you building only pbzip2 with the various compilers? pbzip2 is sort of just a front end for libbzip2, which is where the work actually happens.

  7. #7

    Default

    Quote Originally Posted by ChrisXY View Post
    Well, a noticeable improvement is that it does not return with an error with -march=native.

    But "-march=native" is not recognized and as all nonrecognized march parameters it activates the generic profile:
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-1934caf9.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=generic -TARG:sse=on -TARG:sse2=on -TARG:sse3=off -TARG:ssse3=off -TARG:sse4a=off -TARG:sse4_1=off -TARG:sse4_2=off -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=off -TARG:pclmul=off -TARG:3dnow=off
    The correct way to autochoose the cpu is -march=auto:
    "-march=auto -Ofast"
    Code:
    /usr/lib/5.0.0/x8664/ipl -VHO:rotate -LIST:source=off:notes=off -PHASE:p:i -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -show -LANG:=ansi_c -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off -fB,/tmp/pathcc-B-19683af2.B -fp,hello.o hello.c -cmds pathcc -O3 -LANG:math_errno=off -OPT:ffast_math=ON -OPT:Ofast= -TARG:abi=n64 -TARG:processor=pentium4 -TARG:sse=on -TARG:sse2=on -TARG:sse3=on -TARG:ssse3=on -TARG:sse4a=off -TARG:sse4_1=on -TARG:sse4_2=on -TARG:avx=off -TARG:fma=off -TARG:xop=off -TARG:aes=on -TARG:pclmul=off -TARG:3dnow=off
    Slightly better, it builds for SSE3 but for Pentium 4?! This is a ivy bridge mobile cpu, i7 3632qm! If you could just copy & paste the cpu recognition from another compiler, that would be great.


    The installer installs the manpages to /usr/docs/man/man1/ which is not in the man search path on archlinux, but I don't know about other systems. But it seems nonstandard to me. Use "man -l /file" to open files directly with man.
    Code:
           -march=<cpu-type>
                   (For x86) Compiler will optimize code for the selected cpu type: opteron, opteron-sse3, xeon, em64t, nocona, prescott, core, core2, wolfdale, harpertown, nehalem, barcelona, shanghai, istanbul, sandy, bdver1, auto.  auto means to optimize for the host platform that the compiler is running  on.   Core  refers  to  the
                  Intel Core Microarchitecture, used by 64-bit CPUs such as Woodcrest.  The default is auto.
    It seems none of the cpu profiles, even bdver1 enable the use of avx by default. In fact it says
    Code:
    pathcc -o hello_pathcc hello.c -march=bdver1 -O3 -mavx -show
    pathcc ERROR: Target processor does not support AVX.
    I am not so proficient what exactly is supported in which cpus, but I thought bulldozer supported avx right from the beginning?

    So the closest for me would probably be using -march=sandy -Ofast and perhaps -mavx, -mxop, -maes, -mpclmul.
    Unfortunately sandybridge did not support fma and xop so I can't activate it directly.

    Intel's cpus don't support 3dnow but I saw that the parameter to activate 3dnow is not documented in the manpage (it's pretty clear that it's -m3dnow though. It says it's not supported for bdver1, by the way, not sure if this is right).
    The correct way to get -march=auto or -march=native behavior is to not set -march at all. EKOPath/ENZO, unlike other compilers, automatically pick the best CPU profile for the current host system. Only set it when you need to target another system. (I think we may have a bug here and I'll double check.)

    We switched over to CPUID instead of parsing /proc/cpuinfo. Can you post your /proc/cpuinfo output and the processor info? /* This is one of those areas where I'd like the most feedback. */
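    For readers unfamiliar with CPUID-based detection, here is a rough Python sketch of how the family/model/stepping fields are decoded from the EAX value returned by CPUID leaf 1, following the convention in Intel's documentation. It is not the EKOPath driver's code. The signature 0x000306A9 is an assumed example value for an Ivy Bridge part and does not come from this thread.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Decode the processor signature returned in EAX by CPUID leaf 1."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 15.
    if base_family == 0xF:
        family = base_family + ext_family
    else:
        family = base_family

    # The extended model supplies the high nibble for base families 6 and 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}


# Assumed example signature for an Ivy Bridge CPU (not taken from the thread).
print(decode_cpuid_signature(0x000306A9))
```

    A detector that ignores the extended model bits sees only the low model nibble, which is one way a family 6 part can end up matched against the wrong profile. Whether that is what happens here is a guess.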

    About AVX: with the exception of a corner case on AMD, please tell me where it would be a performance win compared to SSE4.1/4.2. With that in mind we've disabled it by default; in fact, AVX can cause performance *degradation* if not used properly. For more information, see Agner Fog's work on CPU instruction timing data.

    bdver1 doesn't support 3DNOW - that was dropped

    The closest CPU recognition we may be able to get "inspiration" from would be libav and their cpu check stuff.

    Lastly - sorry about the manpage location - Most of our users use --prefix when installing and never use a "default".

  8. #8
    Join Date
    Jun 2010
    Location
    ฿ 16LDJ6Hrd1oN3nCoFL7BypHSEYL84ca1JR
    Posts
    968

    Default

    Quote Originally Posted by codestr0m View Post
    The correct way to get -march=auto or -march=native behavior is to not set -march at all. EKOPath/ENZO, unlike other compilers, automatically pick the best CPU profile for the current host system. Only set it when you need to target another system. (I think we may have a bug here and I'll double check.)

    We switched over to CPUID instead of parsing /proc/cpuinfo. Can you post your /proc/cpuinfo output and the processor info? /* This is one of those areas where I'd like the most feedback. */
    Well, if it worked I wouldn't want to set it manually.

    I have 8 of these:
    Code:
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 58
    model name      : Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz
    stepping        : 9
    microcode       : 0x13
    cpu MHz         : 1200.000
    cache size      : 6144 KB
    physical id     : 0
    siblings        : 8
    core id         : 0
    cpu cores       : 4
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
    bogomips        : 4391.75
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    edit: by cpu information you mean that? http://pastebin.com/4TbSia7Q
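    To check a dump like the one above against what the compiler picked, a small Python sketch can parse one /proc/cpuinfo stanza and list the extension switches the flags line would justify. The mapping only covers switches named in this thread (-mavx, -maes, -mpclmul). The helper names are hypothetical, and whether enabling a switch actually helps is a separate question.

```python
def parse_cpuinfo_block(text: str) -> dict:
    """Parse one processor stanza of /proc/cpuinfo into a dict of strings."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key.strip()] = value.strip()
    return info


# Hypothetical mapping from /proc/cpuinfo flag names to the -m switches
# mentioned in this thread; it is not an exhaustive or official list.
FLAG_TO_SWITCH = {"avx": "-mavx", "aes": "-maes", "pclmulqdq": "-mpclmul"}


def suggested_switches(info: dict) -> list:
    """Return the switches whose corresponding CPU flag is present."""
    flags = set(info.get("flags", "").split())
    return [switch for flag, switch in FLAG_TO_SWITCH.items() if flag in flags]


# Shortened copy of the stanza posted above.
sample = """\
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz
stepping        : 9
flags           : fpu sse sse2 pni pclmulqdq ssse3 sse4_1 sse4_2 aes avx
"""

info = parse_cpuinfo_block(sample)
print(info["cpu family"], info["model"], info["stepping"])
print(" ".join(suggested_switches(info)))
```

    On a live system the same function can be fed the first stanza of open("/proc/cpuinfo").read().split("\n\n").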

    Quote Originally Posted by codestr0m View Post
    About AVX: with the exception of a corner case on AMD, please tell me where it would be a performance win compared to SSE4.1/4.2. With that in mind we've disabled it by default; in fact, AVX can cause performance *degradation* if not used properly. For more information, see Agner Fog's work on CPU instruction timing data.
    Thanks for the explanation. I wasn't aware of that.

    Quote Originally Posted by codestr0m View Post
    bdver1 doesn't support 3DNOW - that was dropped
    Then all is good.

    Quote Originally Posted by codestr0m View Post
    Lastly - sorry about the manpage location - Most of our users use --prefix when installing and never use a "default".
    Actually I used --prefix=/usr.
    Manpages usually live somewhere under /usr/share/man/.
    Last edited by ChrisXY; 02-09-2013 at 02:34 PM.

  9. #9

    Default

    Quote Originally Posted by ChrisXY View Post
    Well, if it worked I wouldn't want to set it manually.
    I'll use the CPU info you provided and see if we can get both sets of bugs fixed in the driver. Give us a couple of days and hopefully I'll remember to reply to this thread once it's fixed. Alternatively, pull another nightly in a few days or a week and yell if it's not. (Squeaky wheel)

  10. #10
    Join Date
    Oct 2009
    Posts
    845

    Default

    Happy to see some info on EKOPath; it's been quiet since the open source announcement. As for the results, I seem to recall that EKOPath was optimized for AMD CPUs, or am I mistaken?

    As for the tests, again, why are there so many where no optimization level is declared, like SciMark for example? It's impossible to draw any worthwhile conclusions from those tests; for all we know they could have been run at -O0.

    Looking at the tests where we do have an optimization setting (hence the tests that are of any interest), EKOPath seems to do quite well, with the exception of the BLAKEv2 test, where it does horribly, and to a lesser extent the Himeno benchmark.

    Again, can Michael please fix the benchmarks so that they declare the optimization level for all tests? Otherwise they are of little interest, as we don't know what level is being compared. Using -O3 across the board would be the obvious choice if only one optimization level is used per benchmark (as is the case here).
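    One way to make the optimization level explicit is to generate every build command from a single compiler-by-level matrix, so no test can silently fall back to a default. The Python sketch below only builds the command lines and does not run anything. The source file name bench.c and the output naming scheme are made up for illustration, and this is not how the Phoronix Test Suite actually configures its builds.

```python
from itertools import product


def build_commands(compilers, opt_levels, source="bench.c"):
    """Return one explicit compile command per (compiler, level) pair."""
    commands = []
    for compiler, level in product(compilers, opt_levels):
        # Encode compiler and level in the binary name, e.g. bench_gcc_O3.
        output = f"bench_{compiler}_{level.lstrip('-')}"
        commands.append([compiler, level, "-o", output, source])
    return commands


for command in build_commands(["gcc", "clang", "pathcc"], ["-O3"]):
    print(" ".join(command))
```

    Because the level is part of every command and every binary name, the reported results can state exactly which setting was compared.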
