AMD Shanghai Opteron: Linux vs. OpenSolaris Benchmarks


  • #71
    Originally posted by kebabbert View Post
    llama

    I think that it is non-trivial to scale big, up to big iron. I don't think that some random kernel hackers can easily do that in a few years.
    Ok, having followed Linux development as a spectator for years now, I don't think of guys like Ingo Molnar or Nick Piggin as "random hackers", just to pick a couple of people who clearly know what they're doing when it comes to scalable code. There are people paid to work full time on Linux, and some of them have time to write good code of their own rather than spending all their time integrating patches from random hackers and hardware-support code.

    Therefore I don't agree with you when you say that Linux scales well, while you (and the kernel hackers) only have experience of machines with up to 4 CPUs. How can you or the Linux people claim that without knowing? How many Linux kernel hackers have experience of big iron?
    I linked an article about XFS tuning on a 24-CPU ia64 from 2006, so there are a few people with access to big iron. But point taken: there might be obvious problems that would show up with some workloads on big iron that nobody has identified yet. I've seen several threads on the Linux kernel mailing list (lkml) where people have done microbenchmarks on 8-core machines that might find some of the same scalability problems as running normal workloads on bigger machines. Or maybe they already found the problem on big iron and created a microbenchmark as a way to test improvements on the kind of machine that many devs have access to.

    Like I said, I can't flat-out claim that Linux scales well. I would bet that Linux would not be completely embarrassed on a medium-to-large machine, but I could be wrong about that. Certainly some workloads will expose weaknesses; Linux's scalability is definitely not mature or polished!

    If I had to set up a big-iron server (> 16 cores) for something, I would try Linux with the workload I wanted to run, but I'd also try OpenSolaris (especially in light of finding out that I might not hate its package manager). On an 8-CPU machine, I'd just use Linux and be pretty confident that it wasn't going to do badly compared to anything else.

    Originally posted by kebabbert View Post
    As I said, I don't agree with people saying "Linux scales well" if there are different Linux kernels for large clusters and for normal desktop PCs. Then you could also switch between FreeBSD kernels for one specific task and Linux kernels for doing another task, just as you do now when switching between different Linux kernels for different tasks. That is clearly not "scalability" but rather "flexibility". Otherwise, what would Linux people call Solaris' ability to run the very same binaries on everything from laptops to big iron? True Solaris scalability vs false Linux scalability?

    Regarding Linux server vs non-server: there is only one version of Solaris.
    As I said, the differences between the -server and -generic Ubuntu AMD64 kernels are pretty minimal. Either binary would do well in either role; I probably made it sound like a bigger deal by going overboard with every last detail. The ia32 kernels differ more, because the -generic one supports older CPUs: not just smaller systems, but CPUs without important parts of the instruction set. (e.g. cmov was only introduced with the P6 core. One big advantage of amd64 is that all the instruction-set extensions that predate it aren't optional anymore, so binaries don't need fallback code for old CPUs.)

    Solaris still has a 32-bit kernel for x86, right? (BTW, on OpenSolaris, is /bin/ls a 64-bit binary? It's 32-bit on Solaris 10, and it seemed like there were 64-bit versions only of the things that needed to be 64-bit to talk to the kernel, plus a very few libraries so you could compile 64-bit binaries.)
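
    (A quick way to check, if you have both systems handy; file(1) works on either, and isainfo is Solaris-only:)

        file /bin/ls        # reports the ELF class, e.g. "ELF 64-bit LSB executable"
        isainfo -b          # Solaris: prints 64 on a 64-bit system, 32 otherwise
        isainfo -kv         # Solaris: names the instruction set the kernel itself is using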

    And it can function in both roles. True scalability again. But if you want, you can change the Solaris scheduler on the fly, at run time. Is that possible with Linux, or do you have to use a special esoteric Linux version to allow that? Or do you have to recompile the kernel?
    Linux had this at one point (http://kerneltrap.org/node/3366), but I think the vanilla kernel only includes one recommended scheduler, so there's nothing to choose between, and /proc/sys/kernel/cpusched/ doesn't exist on my 2.6.28 system. There are some scheduler compile-time options that let you leave out SMT or multicore awareness from the scheduler if you're building a kernel specifically for a machine that doesn't have one or the other of those. (Obviously distro kernels include both of those options.)
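
    (If you're curious whether a given distro kernel was built with those options, the config usually ships next to the kernel; the /boot path below is how Ubuntu does it, other distros may expose /proc/config.gz instead:)

        grep -E 'CONFIG_SCHED_(SMT|MC)' /boot/config-$(uname -r)
        # CONFIG_SCHED_SMT=y  -> scheduler knows about hyperthreading siblings
        # CONFIG_SCHED_MC=y   -> scheduler knows about cores that share a package/cache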

    A more recent scheduler-selection thing is from Jan 2008: http://www.ibm.com/developerworks/li...cfs/index.html With that code, you can build in multiple schedulers, but you can only select one at boot time, so you need to reboot. That would make tuning take longer, but it's fine once you have your server set up.

    You can change I/O schedulers at runtime, per disk, though (cfq, deadline, anticipatory, or noop).
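
    For example (sda is just a placeholder for whichever disk you're tuning; writing needs root):

        cat /sys/block/sda/queue/scheduler               # prints e.g. "noop anticipatory deadline [cfq]"
        echo deadline > /sys/block/sda/queue/scheduler   # switch that one disk to deadline, takes effect immediately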

    I still don't agree that having a single binary is so important. On smaller systems, you can afford to turn on more debugging stuff that adds tiny amounts of overhead in critical sections, and that's what Ubuntu's kernels do (even the -server ones). You could probably improve scalability a little by disabling some of the statistics-gathering and the stuff that produces better debugging output when there is a problem: on big iron there will be more contention, so leaving debug checks out of critical sections helps more.

    Is this single-binary stuff about reliability and warranties? i.e. you get support only if you're using the distro kernel? I wonder how e.g. Canonical's or RedHat's support contracts work, and whether you could ask them to build you a kernel compiled for your big-iron server if you wanted to change some of the things that are only configurable at compile time...

    But anyway, Linux as a source base can scale _way_ down to embedded systems. Linux can leave out e.g. the ability to swap, so you can really strip down the kernel. The Linux philosophy has never revolved around a single universal binary (although that hasn't stopped enterprise distros like RHEL from acting that way about the Linux binary they ship).

    That said, almost all of Linux's tunables these days are run-time, not compile-time. The compile-time options aren't integer variables, but rather chunks of code to leave in or out.
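
    (Most of those runtime tunables live under /proc/sys and can be flipped with sysctl while the box is running; vm.swappiness is just one well-known example:)

        sysctl vm.swappiness           # read the current value
        sysctl -w vm.swappiness=10     # change it on the fly, as root; no reboot needed
        # put the same line in /etc/sysctl.conf to make it stick across reboots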

    Some people think that the GPL is quite an egocentric license.
    That's a good way of putting it. I like the GPL, and the philosophy behind it, and I'd agree with calling it egocentric.

    Comment


    • #72
      Just one question:

      Why don't you use the pre-packaged programs delivered with OpenSolaris and Ubuntu (e.g. lame or oggenc) for the benchmark instead of compiling them? This would be a much fairer comparison.
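
      (A rough way to see how much the packaging matters would be to time the distro's own encoder against the one PTS builds; sample.wav and the PTS_BUILT_OGGENC path are placeholders here:)

          time oggenc -q 6 -o /dev/null sample.wav               # distro-supplied oggenc from the default PATH
          time "$PTS_BUILT_OGGENC" -q 6 -o /dev/null sample.wav  # the binary the test suite compiled itself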

      I've made my own test profiles based on the original PTS profiles to compare several Linux distributions on my notebook:

      TAR-File with custom test-suites

      Comment


      • #73
        Don't get me wrong on this, but I like Linux. Actually, I started out with Linux at home way back, when Linux was a desktop OS, not a server OS. It was fun and I learned a lot. But I don't really like it when the Linux camp gets overly aggressive. Maybe that is because of Linus Torvalds' nature; he is that kind of person. He called the OpenBSD developers, who focus on security, a "bunch of masturbating monkeys". Stallman said "I am not the one calling an OS Stallmanix", implying that Linus has a big ego. Linus, in his defence, said that someone else hosted his project and named it "Linux". You know, if I had created a project and someone else named it, I would be mad: it is my project, and I am the one who decides. Therefore I don't really believe Linus' explanation. He decides what happens with his project.

        And I also have a problem with Stallman fighting for decades to release his GNU, and then suddenly a Finnish teenager who could hardly program comes in and writes the last missing bit, the kernel. Stallman had his vision for decades, and people believe Linus did everything. Worse, Linus doesn't do anything to give Stallman the credit he deserves. Linus objects to the use of "GNU/Linux"; he thinks it should only be called "Linux". To me the GNU/ naming sounds reasonable: GNU/Solaris, GNU/Hurd, GNU/Linux make it easy to understand. It seems to me that Linus has some kind of ego problem.

        What would I think if I had struggled with a huge project for decades, had it 90% finished, and someone else jumped in, did the missing 10%, and let everyone think that he had done it all? Stallman's vision with GNU was a Unix-like OS, but free and open source under the GPL. Linux is a Unix-like OS, free and open under the GPL. It sounds suspiciously like Stallman's vision, doesn't it? And why has Linus got lots of awards and honorary doctorates, and Stallman nothing?

        It is Linus that gets all the credit. The same Linus who introduced himself to the audience of a Linux conference by saying "I am your God".

        If I were Linus, I would say "I only filled in the missing part; this is Stallman's work. And if you really insist, I would call it Stallman/Linux or something similar. Calling it only Linux is not something that makes me happy; I would be ashamed." In academia, whether or not you are listed as a coauthor is everything. To delete a name is death in academia. GNU/Linux vs Linux really, really matters in academic terms.




        So because I have problems with Linus (but not Linux) I turned to Solaris. And it turned out that OpenSolaris is very similar to Linux, maybe because Ian Murdock is the main architect behind OpenSolaris. OpenSolaris is becoming more and more Linux-like with each release. If Linus changes his attitude, I can just switch back to Linux from OpenSolaris. No knowledge is lost; they are very similar, so I don't have to relearn anything.

        Another thing with Linux is that, to me, Linux is not really cutting edge. Linux is only trying to catch up to where an ordinary Unix has been for decades; there is no really new, hot technology going on in Linux. No cool innovation. Solaris has the new and unique ZFS and DTrace, Zones, SMF, etc. Linux is struggling with scalability beyond 8 CPUs and struggling with stability, etc., things that Unix had problems with decades ago. Linux is just copying things and always lags behind in technology. First there was ZFS, and then the Linux camp made similar filesystems. Then there was DTrace, and the Linux camp made similar technology, and so on. Linux is a follower, not a leader going out in front and breaking new barriers. Solaris is a leader, creating new, revolutionary technology that has never been available before. If there were no ZFS, Linux would still be focusing on ext4, ext5, etc. But now ZFS has shown the way, and Linux follows. At the same time, the Linux camp yells that Solaris should die (but first Sun should release ZFS and DTrace under the true license, the GPL; afterwards Solaris can die). I bet we will see new cool tech from Solaris in the future too. Things that Linux will copy.

        Solaris breaks new ground, creates new cool unique tech. Linux follows.

        Comment


        • #74
          Originally posted by glasen View Post
          Why don't you use the pre-packaged programs delivered with OpenSolaris and Ubuntu (e.g. lame or oggenc) for the benchmark instead of compiling them? This would be a much fairer comparison.
          Because you don't know what compile-time optimizations, patches, or other changes the package maintainer might have made when building the packages used by the tests. Not to mention that the versions available in OpenSolaris 2008.11 and Ubuntu 8.10 are slightly different (or in some cases significantly different).
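
          (For anyone who wants to see how far apart the packaged versions actually are, a quick check on each box; the grep patterns are just a guess at the package names:)

              dpkg -l | grep -i -E 'lame|vorbis'      # Ubuntu 8.10
              pkg list | grep -i -E 'lame|vorbis'     # OpenSolaris 2008.11 (IPS)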
          Michael Larabel
          https://www.michaellarabel.com/

          Comment


          • #75
            True, but by your exact argument you should be compiling your own gcc version and compiling everything else with it; otherwise you're comparing the packaged gcc with default compiler options on OS A vs the packaged gcc with default compiler options on OS B, which, as in the case of OpenSolaris vs Ubuntu/whatever, are completely different (and gcc isn't even widely used by the OpenSolaris community). Moreover, the level of gcc support for OpenSolaris is completely different from that for Linux (there are numerous bugs that kill all the SSE and SSE2 optimizations with recent versions of gcc on OpenSolaris).
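
            (One rough way to see what a particular gcc really enables on a given box is to dump its predefined macros; this assumes a gcc new enough to accept -march=native, which the gcc shipped with OpenSolaris may not be:)

                gcc -O2 -march=native -dM -E -x c /dev/null | grep -iE 'sse|mmx'
                # __SSE2__ etc. only show up if the compiler will actually use those instructions by default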

            Originally posted by Michael View Post
            Because you don't know what compile-time optimizations, patches, or other changes the package maintainer might have made when building the packages used by the tests. Not to mention that the versions available in OpenSolaris 2008.11 and Ubuntu 8.10 are slightly different (or in some cases significantly different).

            Comment


            • #76
              @kebabbert

              I can only assume you don't understand the concept of supercomputers: they have lots and lots and lots of CPUs.
              Now, of the top 500 fastest "advertised" computers in the world, 439 use Linux. http://www.top500.org/stats/list/32/osfam

              Now if Linux can't scale properly, I suppose those multi-billion-dollar computers are wasted? Is that what you're trying to say?

              Even the top one uses Linux (http://www.top500.org/system/details/9707), with Opteron processors BTW, and with 129,600 cores. I suppose Linux has bad scaling, huh?


              And as you can see there, only ONE of the top 500 uses OpenSolaris.


              So stop spreading your bullshit FUD that Linux scales poorly and that Solaris is the uber OS, because all indications contradict your misguided beliefs.

              Comment


              • #77
                poofyyoda
                Maybe you didn't read my earlier posts, but I can recapitulate them here, just for you!

                1. Linux for large clusters uses a modified, stripped-down kernel. It is a non-standard kernel. As I wrote earlier, "Do you think Google's Linux kernel has drivers for web cams?"

                2. The standard Linux kernel has problems scaling beyond 8 CPUs, as you can see if you read my earlier posts. People benchmark 4-8 CPUs; they don't have access to big iron costing millions of USD with hundreds of CPUs and many more threads. Linux is foremost a desktop OS, not a server OS. Linus started developing Linux as a desktop OS; only (quite) recently has it ventured into true server-OS territory. It takes decades to get a really good server OS: see Windows' attempt to become a server OS, and how long that is taking. For ordinary OS usage, the standard Linux kernel doesn't scale beyond ~8 CPUs.

                3. Of course you could tailor Solaris to do one specific task, such as number crunching, if you wanted to. No problem. Solaris is arguably the best-scaling OS. Linux is easier to modify than Solaris. If you are going to do number crunching and nothing else, would you rather modify a complex, mature kernel full of intricacies such as Solaris, or a simple kernel such as Linux?

                4. If you use modified and altered kernels for different tasks, it is not "scalability" but rather "flexibility". Solaris has one install DVD for everything from small desktops up to big iron with hundreds of CPUs. THAT is scalability. The code is not altered between the different configurations. Linux can't handle all these different configurations; you have to rebuild and modify the kernel. Ergo, it is not scaling. Otherwise you wouldn't need to modify anything.

                Comment


                • #78
                  Originally posted by Michael View Post
                  Because you don't know what compile-time optimizations, patches, or other changes the package maintainer might have made when building the packages used by the tests. Not to mention that the versions available in OpenSolaris 2008.11 and Ubuntu 8.10 are slightly different (or in some cases significantly different).
                  If you're trying to compare how fast things will be when a typical user encodes an ogg, you should use the version of oggenc that the OS ships. Most people don't compile their own oggenc, and if they do, they'd take the time to use the best compiler, too. IMO, the only useful comparisons are:

                  1. distro-supplied binaries (if the vendor provides them), even when they're different versions, unless they're so out of date that most people install their own. I'm looking at you, non-Open Solaris...
                  or
                  2. versions compiled from identical source with the best compiler and options you can find for that platform (after at least a bit of experimentation with compilers and options). This represents the performance a user could achieve if this one app were what they needed the whole machine to do quickly, e.g. how many simultaneous ogg streams the machine can encode in real time. (Which isn't the same thing as how fast a single wav can be compressed, but use your imagination to think of cases where that might be what you needed to tune for.)

                  Part of the advantage of one distro over another is that one can include patched/tweaked versions of things. Having a nice version of apache, for example, is a valid selling point. Err, if you still want to call it selling when we're talking about Free software!

                  e.g. back in the days when Pentiums (586) were current, Mandrake made a name for themselves by being RedHat compiled with -march=586. Before they diverged from RH (and this was before Fedora existed), mdk's main selling point was being compiled with good options! So don't discount that.

                  One reason I hate ia32 so much is that distros (e.g. Debian) compile everything but the kernel with backwards compatibility for the 486 (-mtune=686 -march=486 or something). So unless programs have runtime CPU detection (with the overhead that entails), they won't be using cmov, SSE floating point, or anything the 486 didn't have. It was only a couple of years ago that Debian made the momentous decision to drop support for the i386 (which lacks bswap, for one thing) from non-kernel packages. So just use Gentoo and compile it all yourself? Yeah, but if you're on a machine old enough that you can't just run in amd64 mode, compiling will be slow. Catch-22.

                  edit: I had some second thoughts here. If you think about the stuff you build from source as representative of other things you might build from source, then it is useful to benchmark your own compile (using the recommended compiler and options for that platform, e.g. Sun's cc -fast -xarch=native64 -xvector=simd) of things the distro already provides binaries for. Then you can look at the numbers and get an idea of how well the distro might do running other code you were planning to build from source, e.g. maybe you can see that OpenSolaris ships with a compiler that's good at optimizing floating-point math.
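
                  (As a sketch of what that comparison might look like; bench.c, the input file, and the gcc flags are placeholders, and the Sun cc flags are just the ones quoted above:)

                      # GNU/Linux box, gcc
                      gcc -O3 -march=native -o bench-gcc bench.c -lm
                      time ./bench-gcc input.dat
                      # OpenSolaris box, Sun Studio cc
                      cc -fast -xarch=native64 -xvector=simd -o bench-suncc bench.c -lm
                      time ./bench-suncc input.dat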
                  Last edited by Peter_Cordes; 13 February 2009, 04:02 PM.

                  Comment


                  • #79
                    llama,

                    That's a good point, and in a way I agree with you. Most people just use the standard binary that comes prebundled and don't compile it themselves. But if we accept that, then we cannot use these benchmarks as evidence of Linux being faster or slower than Solaris; what these benchmarks tell us is who has done a good compile and who has not. It boils down to the old question: should we use the default settings, or should we optimize for each OS? At the same time, isn't it reasonable to demand 64-bit binaries and a fairly recent compiler version for every tested OS? Maybe not heavy optimization that requires expert knowledge, just the same basic settings.

                    Comment


                    • #80
                      Originally posted by poofyyoda View Post
                      @kebabbert

                      I can only assume you don't understand the concept of supercomputers: they have lots and lots and lots of CPUs.

                      So stop spreading your bullshit FUD that Linux scales poorly and that Solaris is the uber OS, because all indications contradict your misguided beliefs.
                      For the love of cheese, what did I just say? Oh yeah:

                      Originally posted by llama
                      People are people. The not-so-clever ones who don't know one or more of statistics, computer architecture, operating system design, or basic Unix, sometimes still use Linux and spout off about it.
                      We're not talking about clusters. We're talking about single-system-image big iron, where _one_ kernel runs on a single machine with > 16 CPUs in a cache-coherent shared-memory system. The most cost-effective machines for cluster-building, in CPU power per dollar, are dual-socket quad-core Intel Core2-based machines, i.e. 8 cores per node. That's great if you have a workload with some coarse-grained parallelism, or one that is embarrassingly parallel, e.g. processing 100 separate data sets with single-threaded processes that don't depend on each other. It's not so great if you have a lot of processes that need fine-grained access to the same shared resource. The canonical example here is a database server handling a database with a significant amount of write traffic; otherwise you could just replicate it to a big cluster and spread the read load around. Locking for write access in a big cluster, even with low-latency interconnects like InfiniBand, is still _way_ higher overhead than you'd get in e.g. a 4- or 8-socket quad-core machine. Even NUMA big iron is better suited for this than a cluster.

                      CLUSTERS DON'T COUNT AS BIG IRON. They're just a pile of normal machines. They do have their uses, though.



                      kebabbert and I disagree about the value of having a single binary, instead of a single source base, that's scalable. If you built a Linux kernel that was configured as well as possible (with config options, not patches) for big iron, it would still work just fine on a small machine. It might waste some RAM on the small machine, but if the small machine is SMP, there's not a lot you could have left out that would make it much more efficient on the small machine.
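
                      (The options I mean are things like how many CPUs the kernel supports and whether it's NUMA-aware; on a distro kernel you can read them out of the shipped config:)

                          grep -E 'CONFIG_NR_CPUS|CONFIG_NUMA' /boot/config-$(uname -r)
                          # e.g. CONFIG_NR_CPUS=64 sizes per-CPU data structures for up to 64 CPUs,
                          # which mostly just costs some memory on a small 2-core box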

                      And he's more doubtful than I am that Linux is any good at all on e.g. a 16- or 32-core machine. I wouldn't be at all surprised if Linux gets its ass handed to it by Solaris on a 256-core machine, though. So that's probably just an issue of us not being clear about what size of machine we mean every time we say "big iron" in a different sentence.

                      I also wouldn't be surprised if Linux is faster on small machines, even without customizing a kernel by leaving out support for things your machine doesn't have. Obviously this is going to depend on exactly what workload you're measuring, since the OS isn't a factor at all in a lot of CPU-bound programs. This is probably ancient FUD, but I seem to recall that Linux was faster at fork() than Solaris was. Solaris was good at lightweight thread creation, though, and that seemed to be the preferred way to do things there, instead of select() or poll() or async I/O. Yes, this is Solaris again, not OpenSolaris.
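
                      (If anyone wants to poke at that old claim themselves, here's a crude process-creation loop that runs on both OSes; it times fork+exec of /bin/true rather than bare fork(), so treat it as a rough proxy only:)

                          # run under bash or ksh; Solaris' ancient /bin/sh doesn't understand $(( ))
                          time bash -c 'i=0; while [ $i -lt 1000 ]; do /bin/true; i=$((i+1)); done'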

                      BTW, I'm even more interested in OpenSolaris after hearing that Ian Murdock was one of the people behind it. As a big fan of Debian GNU/Linux, I take that as a definite good sign. I'd forgotten about him going to Sun.

                      And yes, I almost always say GNU/Linux when I mean GNU/Linux. Every time I've said "Linux" in this thread, I was specifically talking about the Linux kernel. Usually I end up saying GNU/Linux at least a few times, but this thread has been all about kernel scalability. Or I've just said Ubuntu when I meant the kernel+userspace, instead of Ubuntu GNU/Linux. I totally agree with everything you said about credit to Stallman and GNU, and that there will at some point be a GNU/Solaris, and so on. There's already Debian GNU/FreeBSD, maybe called GNU/kFreeBSD, IIRC.


                      Originally posted by kebabbert
                      4. If you use modified and altered kernels for different tasks, it is not "scalability" but rather "flexibility". Solaris has one install DVD for everything from small desktops up to big iron with hundreds of CPUs. THAT is scalability. The code is not altered between the different configurations. Linux can't handle all these different configurations; you have to rebuild and modify the kernel. Ergo, it is not scaling. Otherwise you wouldn't need to modify anything.
                      If you want to talk about scalability as a range of supported machine sizes, not just how high you can go at the big-iron end, Linux will give Solaris a run for its money. Unless I missed some news about OpenSolaris being good for embedded systems.

                      Perhaps one of the reasons GNU/Linux distros don't build a kernel for maximum scalability, but instead build kernels that will run well on extremely common hardware, is that it's very easy to build a new kernel. And/or maybe there's not a lot of scalability to be gained by rebuilding. If you want to run on a machine with < 128MB of RAM, it might make sense to have a custom kernel in an effort to use less memory, but in terms of scaling to a slower single CPU machine without extreme memory constraints, AFAIK there's not much to be gained with a custom kernel.
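
                      ("Very easy" here means something like the following on a Debian/Ubuntu box: start from the distro's config and switch off what you don't need:)

                          cd linux-2.6.28
                          cp /boot/config-$(uname -r) .config     # start from the distro's configuration
                          make oldconfig                          # answer prompts for any options new to this source tree
                          make menuconfig                         # drop drivers/features this machine doesn't need
                          make -j4 && sudo make modules_install install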

                      I feel like I have to defend Linux from you seizing on what I said about it being possible to compile a custom kernel for a machine, as if that were necessary for even adequate performance. It's something you could do to get the last few percent out of a machine. I don't have any clever ideas for a way to test this that would come up with a single number, since there are too many different workloads to pick from, and desktop responsiveness (which is what I care about on older small machines like my P4 laptop) is too hard to benchmark.

                      I think it's fair to say that Linux scales OK with a single binary, but you could maybe push the edges of its scalability out a bit at either end by customizing kernels for small uniprocessor or big-iron systems. Actually, leaving out debug, trace, and statistics code would improve scalability at either end, but that's not what the distros do, because they need useful crash reports. You can get slightly better uniprocessor performance by compiling a kernel specially for it, but the gain is marginal because modern kernels patch the x86 lock prefix to a nop in all their spinlocks on uniprocessor machines. (A custom-compiled kernel would leave out the counter increment/decrement entirely, which might help more on a machine without a good out-of-order engine, like a single-core Atom. Or do those have hyperthreading? Oh yeah, forget that then.)

                      Comment
