GCC To Receive Automatic Parallelization Support


phoronix
03-10-2009, 07:20 PM
Phoronix: GCC To Receive Automatic Parallelization Support

IBM's Razya Ladelsky today outlined plans for providing automatic parallelization support within the GNU Compiler Collection. The Graphite Framework, which provides high-level loop optimizations based upon the polyhedral model, was merged for the forthcoming release of GCC 4.4 and it will be used eventually to provide some level of automatic parallelization support...

http://www.phoronix.com/vr.php?view=NzEzNA

rotarychainsaw
03-10-2009, 07:37 PM
What kind of time frame are we talking before my favorite distro can be compiled from the ground up with this enabled? I want this yesterday!

MùPùF
03-10-2009, 07:37 PM
Hey ! I'm not sure about it, but it seems like it will be the first compiler to support this killer-feature !

Am I wrong ?

RealNC
03-10-2009, 08:12 PM
Stuff like this makes me happy being a Gentoo user :D

FunkyRider
03-10-2009, 08:47 PM
Stuff like this makes me happy being a Gentoo user :D

What happens when a Gentoo user wants to change his CPU+Motherboard? :eek:

Saist
03-10-2009, 09:06 PM
What happens when a Gentoo user wants to change his CPU+Motherboard? :eek:

They switch to Debian

RealNC
03-10-2009, 09:20 PM
Haha, you wish! No, they simply rebuild.

PS:
No wait, they don't. Switching CPU is usually done to UPGRADE it. Future CPUs are always compatible with older ones. ;) Only downgrading can require a rebuild.

Redeeman
03-10-2009, 09:20 PM
What happens when a Gentoo user wants to change his CPU+Motherboard? :eek:
I would imagine they first open their tower, take out any PCI cards etc., then disconnect cables, then loosen the screws, take the board out, replace the backpanel, and then put the new stuff in and connect etc...

RealNC
03-10-2009, 09:22 PM
Or that :D

blagishnessosity
03-10-2009, 10:00 PM
+1 gentoo user over here too :)

Yuma
03-10-2009, 10:17 PM
Hey ! I'm not sure about it, but it seems like it will be the first compiler to support this killer-feature !

Am I wrong ?

Yes. Intel, IBM, Sun, and probably others have supported it for a while. And it's not a killer feature, I don't think it's that great personally.

smitty3268
03-10-2009, 11:14 PM
Yes. Intel, IBM, Sun, and probably others have supported it for a while. And it's not a killer feature, I don't think it's that great personally.

The way I read that mailing list message, GCC4.3 already has some very basic loop auto-parallelization code as well. This is about integrating a new model which will allow it to become much more complex and useful.

hdas
03-11-2009, 12:58 AM
What happens when a Gentoo user wants to change his CPU+Motherboard? :eek:

Me another happy Gentoo camper. And such things are non-issue unless you have over-optimized for a specific machine. For example, I have compiled my 64-bit Gentoo with cflags "march=core2" (including mcore2 in kernel), created an image and transferred it to my older athlon64x2 3800+ and it runs flawlessly. Sure it may be a bit suboptimal for my athlon, but since most of my machines are core2, I decided to go with it.

(Btw, I was expecting a few issues with advanced instruction sets like sssse3 and sse4aaaaa111 being absent from my athlon, but so far all is good. Including mplayer and stuff. It's a different experience from my first one, when I once compiled with march=pentium3 and it gave problems with a p2 class celeron. The second time, I was happily able to run march=pentium4 gentoo on all my machines ranging from my athlon, sempron to core duo and core2 duo. This time, I first used march=nocona, and when it went well, I was brave enough to try march=core2 on this older athlon.)

Also, as a side note, it has been mentioned many times that Gentoo is not just for ricers. I use it mainly because of its clean and elegant management. (On that note, other distros are yet to come up with something like 'eselect opengl' which can for example make switching graphics easy.) Portage is a gem like APT. It's sad, as mentioned in other forum posts too, that new portage features like updating by git and package sets get no coverage while codenames for ubuntu and fedora are debated.

And the other thing about Gentoo is its stability and system consistency. There are no missing libraries / conflicts on a good gentoo system. Sure all distros can do that, but what requires an effort in them is natural in gentoo. The amount of dev packages one needs to install when occasionally compiling even a small 3rd party package is sometimes nagging in other distros.

And one last thing. In my experience, for most packages, there is no difference in speed between debian i386 packages and super optimized gentoo packages. In fact, mostly debian feels much lighter.

Hope I didn't hijack this thread :D.

smitty3268
03-11-2009, 02:50 AM
(Btw, I was expecting a few issues with advanced instruction sets like sssse3 and sse4aaaaa111 being absent from my athlon, but so far all is good. Including mplayer and stuff.

march=core2 won't include anything past SSE2 by default. You have to add -mssse3, -msse4.1, and -msse4.2 to get the special instructions that won't work on your A64. I think the standard instructions on those chips are identical, unless you get into stuff like special hardware VM support.

Really, SSE2+ support is only going to affect a few applications anyway, and they've probably got special flags and optimizations set up within the ebuild or program. I'm pretty sure MPlayer, for example, contains lots of manual assembly code and detects what CPU you have at runtime and picks its fastest paths available.

alec
03-11-2009, 03:54 AM
Premature optimization is the root of all evil.
You gentoo users don't test whether it actually gives you any gain...

grigi
03-11-2009, 04:31 AM
Overly generalised. Just about all gentoo-ers know that you should not over-optimize.

I use both Gentoo and Ubuntu, and for my dev-stations I always go for gentoo, since I don't get that library hell that is any other distribution...

mirza
03-11-2009, 04:41 AM
Does "parallel" mean code generation that utilizes multiple threads (TFA talks about "better performance on multi-core systems", so I assume multiple threads must be involved here), or better detection of loops that can be computed in parallel using SIMD instructions?

If the first case is true, does the compiler care about the cost of creating/killing threads on a specific OS?

yoshi314
03-11-2009, 06:18 AM
Also, as a side note, it has been mentioned many times that Gentoo is not just for ricers.

+1

A distribution that allows you to dynamically alter package dependencies (while resolving dependencies) according to your needs is not something you come across every day. Its package manager also transparently handles binary, from-source or development snapshot (from git, svn, hg, darcs, cvs, etc.) packages in the same manner.

hdas
03-11-2009, 07:34 AM
march=core2 won't include anything past SSE2 by default. You have to add -mssse3, -msse4.1, and -msse4.2 to get the special instructions that won't work on your A64. I think the standard instructions on those chips are identical, unless you get into stuff like special hardware VM support.

Really, SSE2+ support is only going to affect a few applications anyway, and they've probably got special flags and optimizations set up within the ebuild or program. I'm pretty sure MPlayer, for example, contains lots of manual assembly code and detects what CPU you have at runtime and picks its fastest paths available.

That was precisely the thing I was counting on, that by and large march=core2 implies generic x86_64 along with some sse stuff (just like march=pentium4 is apparently equivalent to march=i686 plus mmmx, msse, msse2). What I was unsure of was to what extent. My athlon64 has up to sse3 (pni) and afaik, all core2 chips have ssse3 (at least my merom t5270 and penryn p8700 have). Anyway, looks like I am good :D. What bothers me, though, is whether these optimizations play around with cache workings and whether that affects performance by a lot. (For example, when choosing processor-family in kernel config, between generic x86-64 and core2, some of the parameters changed are CONFIG_X86_L1_CACHE_BYTES, CONFIG_X86_INTERNODE_CACHE_BYTES, CONFIG_X86_L1_CACHE_SHIFT. Interestingly, changing between core2 and k8, the only difference in the config is CONFIG_X86_P6_NOP.)

Fixxer_Linux
03-11-2009, 08:43 AM
What happens when a Gentoo user wants to change his CPU+Motherboard? :eek:

I have switched from a P4 to a Core2Quad and I set up both boxes with Gentoo.
The only tricky thing is to find the correct options for the kernel. Compiling a kernel with only the options you need is always a nightmare.
The rest of installation is really easy, and, with a Core2Quad, really fast...

I hope to see GCC 4.4 on Gentoo soon. It is one of the rare distros able to handle that optimization, as other generic distros are compiled for generic i586...

Chewi
03-11-2009, 04:07 PM
Another Gentoo user here. I have very conservative CFLAGS so yeah, it's not just for ricers. This sounds yummy but I'm wondering whether it will really be possible to just enable it globally. It might open up a whole can of bugs on various programs?

psycho_driver
03-11-2009, 06:25 PM
gentoo is where it's really at

psycho_driver
03-11-2009, 06:31 PM
Premature optimization is the root of all evil.
You gentoo users don't test whether it actually gives you any gain...

Not true. I have a dual boot currently set up between Ubuntu 9.04 (from mid January) and an up-to-date -O3 -march=core2 compiled gentoo system built with a lot of attention to detail to USE flags.

The difference in performance I experienced in the same version of wine running Guild Wars was pretty substantial. I can't quote exact numbers since it's been a while since I've even booted into the Ubuntu system, but if anyone is dying of curiosity I'll do a check.

wswartzendruber
03-11-2009, 11:44 PM
Yet another Gentoo user here. I'm wondering if the x264 encoder won't be able to take advantage of this as it doesn't fully utilize both cores even with two threads specified.

CFLAGS="-O2 -march=native -pipe"

miles
03-12-2009, 03:19 AM
Yet another Gentoo user here. I'm wondering if the x264 encoder won't be able to take advantage of this as it doesn't fully utilize both cores even with two threads specified.

Just renice it. By default, it doesn't use much (85% maybe) but if you renice it aggressively you'll get something closer to 97-99% (on 4 cores). Also consider running two or more encodes in parallel.

Fixxer_Linux
03-12-2009, 06:37 AM
Yet another Gentoo user here. I'm wondering if the x264 encoder won't be able to take advantage of this as it doesn't fully utilize both cores even with two threads specified.

I recently converted a divx into mpg using ffmpeg (or mencoder, I don't exactly remember sorry). I tried using the -threads option of ffmpeg, just thinking about using all the cores of my C2Q.
Using "top", I saw that the 4 cores were fully used when "threads" option was set to "16".
However, with such a value, the resulting file was crappy : full of colorized squares on the border of the screen.

I think the multithreading option in ffmpeg needs more tuning. Meanwhile, you'll just have to use your Quad core as if it were a single core.

Chewi
03-12-2009, 06:43 AM
From the FAQ...

3.4 Why do I see a slight quality degradation with multithreaded MPEG* encoding?

For multithreaded MPEG* encoding, the encoded slices must be independent, otherwise thread n would practically have to wait for n-1 to finish, so it's quite logical that there is a small reduction of quality. This is not a bug.

It does sound like what you saw may be a bug though.

DeepDayze
03-17-2009, 04:46 PM
They switch to Debian

This :D

Definitely a nice feature to have in the compiler. I'd also hope there's a cpu detection routine that can detect your CPU and adjust the optimizations on the fly when a program compiled with this new GCC starts up. This will then mean a program will run optimally on whatever CPU it finds itself running on, without any recompiles.

curaga
03-18-2009, 04:22 PM
I'd also hope there's a cpu detection routine that can detect your CPU and adjust the optimizations on the fly when a program compiled with this new GCC starts up. This will then mean a program will run optimally on whatever CPU it finds itself running on, without any recompiles.

It'd also mean that instead of ~5kb Hello World we'd have 20, 30, 40, or more..

highlandsun
03-18-2009, 11:53 PM
The right answer is to have the specific CPU model encoded in the ELF program header and have the OS launch an object code recompiler the first time you try to execute a non-native binary. (Think about it, it's no harder than LLVM or java jit; easier because you can do whole program analysis instead of trying to discover dependencies J-I-T.)