Nope, this means that clang-compiled single-thread performance equals that of 2-4 gcc threads.
Nope: you cannot distinguish a weakly parallelizable algorithm from a difference in compiler performance. To get good data, you would also have to provide a GCC run without OpenMP; that would show the speedup due to OpenMP alone.
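A minimal sketch of that comparison, assuming three wall-clock timings for the same benchmark: gcc built without -fopenmp, gcc with OpenMP on 4 threads, and clang single-threaded. The timings and helper names below are hypothetical placeholders, not numbers from this thread. Dividing the serial gcc time by the OpenMP time isolates what OpenMP buys; dividing it by the clang time isolates the compiler difference. Inverting Amdahl's law then shows that a modest OpenMP speedup can simply mean only part of the work runs in parallel.

```python
def speedup(t_baseline: float, t_candidate: float) -> float:
    """How many times faster the candidate run is than the baseline run."""
    return t_baseline / t_candidate


def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup under Amdahl's law when parallel_fraction of the work scales perfectly."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def parallel_fraction_from_speedup(observed: float, threads: int) -> float:
    """Invert Amdahl's law: fraction of work that must be parallel to reach `observed`."""
    return (1.0 - 1.0 / observed) / (1.0 - 1.0 / threads)


# Hypothetical timings in seconds (placeholders, NOT real measurements).
t_gcc_serial = 40.0    # gcc, built without -fopenmp
t_gcc_omp4 = 20.0      # gcc with OpenMP, 4 threads
t_clang_serial = 20.0  # clang, single thread

openmp_gain = speedup(t_gcc_serial, t_gcc_omp4)        # effect of OpenMP only
compiler_gain = speedup(t_gcc_serial, t_clang_serial)  # effect of the compiler only
implied_p = parallel_fraction_from_speedup(openmp_gain, 4)

print(f"OpenMP speedup (gcc): {openmp_gain:.2f}x")
print(f"clang vs gcc, single thread: {compiler_gain:.2f}x")
print(f"parallel fraction implied by Amdahl: {implied_p:.2f}")
```

With these placeholder numbers, a 2x gain on 4 threads is consistent with only about two thirds of the work being parallel, which is exactly the case the serial gcc run would expose.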
(Sorry, I didn't read closely enough to see whether it already does this, and didn't look in the source to check.) Anyway, one thought is: is this using gcc -march=native? Another might be to use gcc's profile-guided optimization for it, and/or clang's equivalent if it exists.