Same test (pts 0.5.0) on a Phenom 9600 (2.3GHz, 4 cores), built with gcc 4.3.0:
Parallel BZIP2 v1.0.2 - by: Jeff Gilchrist [http://compression.ca]
[July 25, 2007] (uses libbzip2 by Julian Seward)
# CPUs: 4
BWT Block Size: 500k
File Block Size: 900k
-------------------------------------------
File #: 1 of 1
Input Name: bigfile
Output Name: bigfile.bz2
Input Size: 691505952 bytes
Compressing data...
Output Size: 425971060 bytes
-------------------------------------------
Wall Clock: 38.688440 seconds
That's 38.7s, versus 57s for kte's Phenom 9850 @2.7GHz and 69.8s for khurios' Phenom 9500 @2.2GHz. Maybe their hard drives were the bottleneck, their RAM was set to ganged mode, or the TLB erratum patch was enabled; I don't know. GCC 4.3.x also brings performance gains on recent processors such as the Phenom and the Core 2 Duo/Quad.
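To put the pbzip2 output on a common footing, here is a small sketch that turns the reported byte counts and wall-clock time into throughput and compression ratio, and expresses the other posters' times as a relative speedup. The numbers are copied from the output and the comparison above; the function names are my own, and "MB" here means 10^6 bytes.

```python
# Figures copied from the pbzip2 output above.
INPUT_BYTES = 691_505_952
OUTPUT_BYTES = 425_971_060
WALL_CLOCK_S = 38.688440

# Wall-clock times (seconds) reported by others for the same test.
OTHER_TIMES_S = {
    "Phenom 9850 @2.7GHz (kte)": 57.0,
    "Phenom 9500 @2.2GHz (khurios)": 69.8,
}


def throughput_mb_s(nbytes, seconds):
    """Input processed per second, in decimal megabytes (10**6 bytes)."""
    return nbytes / seconds / 1e6


def compression_ratio(out_bytes, in_bytes):
    """Compressed size as a fraction of the original size."""
    return out_bytes / in_bytes


def relative_speed(other_s, mine_s):
    """How many times faster this run was than another run."""
    return other_s / mine_s


if __name__ == "__main__":
    print(f"throughput: {throughput_mb_s(INPUT_BYTES, WALL_CLOCK_S):.2f} MB/s")
    print(f"output/input: {compression_ratio(OUTPUT_BYTES, INPUT_BYTES):.1%}")
    for name, t in OTHER_TIMES_S.items():
        print(f"vs {name}: {relative_speed(t, WALL_CLOCK_S):.2f}x")
```

Raw ratios like these ignore clock speed, disk and memory configuration, so they only hint at where the differences come from.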
But while 20.6s for your C2Q @3.2GHz is nothing to sneeze at, I've seen very different results for more comparable CPUs such as the Q6600. It's hard to find comparable data because the benchmark file has changed so often; it'll be easier once pts 1.0 comes out with a definitive file.
Anyway, I don't have any experience with SMP systems of more than 4 cores. I just read that while AMD has had very little success on the consumer PC front, it currently has an edge in the server and HPC markets: http://www.anandtech.com/weblog/showpost.aspx?i=443
I guess what's important is to clearly identify the target usage and determine which offering would perform better in the relevant scenarios. The OP hasn't stated what those would be.