I just tried it on my PC (i5 with 4 cores, no HT) on Linux 3.9-rc2:
Code:
$ for i in `seq 0 3`; do echo 2 -1 0 -2 | time -f "%e %U %S" taskset -a -c `expr $i % 4` ./intpoly -n0 & done
and
Code:
$ for i in `seq 0 3`; do echo 2 -1 0 -2 | time -f "%e %U %S" ./intpoly -n0 & done
The results are basically the same in both cases. But I always get (nearly) the same execution time per process, regardless of whether I start 1, 2, 3 or 4 processes. With more processes than cores, the elapsed time increases almost linearly with the number of additional processes: 1 process -> ~3 sec, 4 processes -> ~3 sec, 40 processes -> ~30 sec. User time and sys time stay the same.
So I can't reproduce your original results.
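For what it's worth, these numbers are roughly what I'd expect if the processes are purely CPU-bound and the scheduler shares the 4 cores evenly. Here is a rough sketch of that arithmetic. The `estimate` function is only my illustration, not part of intpoly, and it assumes ~3 sec for a single process, 4 cores, and integer seconds:

```shell
#!/bin/sh
# Rough model (an assumption, not a measurement): with n CPU-bound
# processes on `cores` cores, each process takes about t1 seconds of
# wall time while n <= cores, and about t1 * n / cores seconds beyond that.
estimate() {
    n=$1; cores=$2; t1=$3
    if [ "$n" -le "$cores" ]; then
        echo "$t1"
    else
        echo $(( t1 * n / cores ))
    fi
}

# Process counts from my runs above: 1, 4 and 40 on a 4-core machine.
for n in 1 4 40; do
    echo "$n processes: ~$(estimate "$n" 4 3) sec"
done
```

This gives ~3 sec for 1 and 4 processes and ~30 sec for 40, which matches my measurements. User time per process stays constant in this model because the total work per process doesn't change, only the time it spends waiting for a core.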