I also use -march=my-native-arch -O2 (and -pipe); on VIA CPU based systems I use -Os instead. Earlier I told the compiler explicitly to use -mmmx -msse -m3dnow etc., since older compilers didn't do that automatically. I think comparing binaries built for generic x86 with -march builds is not easy. It depends a lot on the compiler that is used, e.g. Intel's own compilers tend to generate the fastest code on their own CPUs (oh, what news) but the code then also tends to be huge.
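If you want to see what a given -march actually switches on, you can ask gcc to dump its predefined macros. This is only a small sketch: the function name show_simd_macros is made up for illustration, and it assumes a gcc-compatible compiler in $CC (default gcc) that understands -dM -E.

```shell
# Hypothetical helper: list the SIMD-related macros gcc predefines
# for the given flags, e.g. show_simd_macros -march=native
show_simd_macros() {
    # -dM -E dumps all predefined macros; the input is an empty C file on stdin
    "${CC:-gcc}" "$@" -x c -dM -E - </dev/null \
        | grep -E '__(MMX|SSE|SSE2|3dNOW)__'
}
```

Calling it once with -march=i386 and once with your own CPU type should show whether you still need to pass -mmmx, -msse and friends by hand.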
If we all go by gcc, which should be most people's default, it probably depends on whether the binary distribution is compiled for i586, i686 or just i386 and compatible. I guess a plain works-everywhere i386 build will be noticeably slower than a -march=your-cpu-type one, but compared with an i686 build... well.
There was an article in a recent issue of the German Linux User magazine comparing Intel's i7 (is it called i7 or Core 7 something?) and the AMD Phenom II. There were interesting results in the comparison of 32-bit and 64-bit machine code: the AMD gained a lot more in certain applications. Of course, across all of your system's packages the gain won't be that much (AMD also said this at the German Chemnitzer Linux Days 2 or 3 years ago), but certain things will get a nice speedup, while a very few might even drop below 32-bit x86 level in performance (I guess it was the rar packer).
(FYI the AMD system won that test. Bwahaha, look at my signature. Actually the Intel was, as expected, the winner on brute performance, but it drew more power and... tada! it cost about $1000 while the Phenom II cost about $180, so make up your own mind about the two. Besides, a good part of the difference comes down to chipset and RAM performance, which will influence the measurements.)
Since I run Gentoo I always use -march=something and thus have no comparison of my own, but on Gentoo you also have the nice USE flags and so on.
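For those who haven't seen it, the global flags on Gentoo live in make.conf. Roughly like this; it is a sketch and not my actual file, and -march=native assumes gcc 4.2 or newer (on older compilers you name the CPU type yourself):

```shell
# /etc/portage/make.conf (illustrative sketch, values are assumptions)
CFLAGS="-march=native -O2 -pipe"
CXXFLAGS="${CFLAGS}"

# Variant for a VIA C3-2 box, optimizing for size instead:
# CFLAGS="-march=c3-2 -Os -pipe"
```

The file uses shell variable syntax, so CXXFLAGS simply picks up whatever CFLAGS is set to.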
But a visitor at the Chemnitzer Linux weekend told me he was really surprised how fast my KDE (3.5.8 back then) started up, and that was on a lousy VIA C3-2 at 1200MHz with 512M RAM (DDR1 iirc). So it probably helped.
Besides all that, I want to warn you people about overly aggressive optimizations; there are some packages in Gentoo which have custom flags disabled by default. Furthermore, e.g. wine didn't like being compiled with -Os, so I switch to -O2 for this package on my VIA systems.
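One way to do such a per-package switch on a reasonably current Portage is the package.env mechanism. Again only a sketch: the file name o2.conf is made up, the package atom may be named differently in your tree, and older Portage versions needed a different approach.

```shell
# /etc/portage/env/o2.conf (hypothetical file name)
# Same CPU type as the global flags, but -O2 instead of -Os
CFLAGS="-march=c3-2 -O2 -pipe"
CXXFLAGS="${CFLAGS}"

# /etc/portage/package.env then maps the package to that file
# with one line (this line is not shell syntax):
#   app-emulation/wine o2.conf
```

Everything else keeps building with the global -Os flags; only the listed package gets the override.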
Stop TCPA, stupid software patents and corrupt politicians!