I've been working on a highly threaded, high-throughput processing chain for large-format digital imagery. The performance numbers that everyone likes to look at are the ones where input imagery is geometrically warped into an image mosaic.
For development we've been running dual systems, one running Linux, one running Windows (both 64-bit). The initial systems were dual Clovertowns (2 sockets, 8 cores at 1.6GHz). I swapped them out for dual Core i7s (2 sockets, 16 logical cores at 2.4GHz).
With the Clovertown systems, typical throughput was within ~15-20% between Linux and Server 2008, so I didn't worry about it much. After updating to the Core i7s with hyperthreading on, a huge disparity appeared: Linux throughput is 3x higher than Server 2008.
This particular process uses two sets of threads in a 4-stage pipeline (one input stage, two balanced stages in the middle, and one output stage).
I've run a series of benchmarks varying the total number of processing threads. I also took the Linux hard drive, put it inside the Windows Core i7 box, and replicated some of the same results, so hardware is *not* the issue here.