In some cases it also leads to slower performance.
True, I noticed this when running some MAME benchmarks here on the Phoronix forums. Someone had told me that -O2 often beats -O3, and while it didn't happen often, the one case where it did was also the one with the largest difference between the two.

Obviously this comes down to how hard it is to get the heuristics right for those more advanced optimizations. This can be rectified with PGO (profile-guided optimization), where the compiler gathers runtime data during an initial run and then uses it to decide when and when not to apply certain optimizations in order to maximize performance.

When using PGO I've never encountered a situation where -O3 hasn't beaten -O2. The downside, of course, is that compilation becomes more complex, since you need to perform an information-gathering run of the code in between the two compiles.
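
For anyone who hasn't tried it, the compile / run / recompile cycle looks roughly like the sketch below. It assumes GCC and its -fprofile-generate / -fprofile-use options. The file name bench.c, the binary name bench, the --typical-input argument, and the helper functions pgo_commands and run_pgo are all made up for illustration. A real project with a build system will need the flags wired in differently.

```python
import subprocess


def pgo_commands(source, binary, opt_level="-O3", training_args=()):
    """Return the three PGO steps as argument lists (GCC assumed).

    'source', 'binary' and 'training_args' are placeholders chosen by the
    caller; nothing here is specific to any particular project.
    """
    # Step 1: build an instrumented binary that records profile data when run.
    instrumented = ["gcc", opt_level, "-fprofile-generate", source, "-o", binary]
    # Step 2: the information-gathering run, ideally on a representative workload.
    training_run = [f"./{binary}", *training_args]
    # Step 3: rebuild, letting the compiler read the profile written in step 2.
    optimized = ["gcc", opt_level, "-fprofile-use", source, "-o", binary]
    return [instrumented, training_run, optimized]


def run_pgo(source, binary, **kwargs):
    """Execute the three steps in order, stopping at the first failure."""
    for cmd in pgo_commands(source, binary, **kwargs):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_pgo() to actually execute them.
    for cmd in pgo_commands("bench.c", "bench", training_args=["--typical-input"]):
        print(" ".join(cmd))
```

The profile is only as good as the training run, so the workload in step 2 should resemble what the program will actually be doing.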