Except for data that is already compressed (PNG/JPG/GIF and the like), XZ can easily beat gzip tenfold: gzip's window is a fixed 32 KB, while XZ's dictionary is limited only by available RAM (Fedora uses a 64 MB dictionary, if I'm not mistaken).
That dictionary has to be in RAM, so a 64 MB dictionary means at least 64 MB just to compress or decompress. Pray tell, how do you then install on a machine with 64 MB of RAM, or even 128?
For fuck's sake, it's not hard to see how that kind of bloat becomes a showstopper.
Maybe you'd like to install N packages at the same time? That's 64 MB × N.
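The ceiling is real, and xz decoders can actually enforce it: Python's lzma module exposes liblzma's memlimit, which makes the decoder refuse an archive whose dictionary won't fit. A small sketch of that behavior (an 8 MiB dictionary stands in for Fedora's 64 MB to keep memory modest, and the payload is made up):

```python
import lzma

# Made-up payload; any compressible data works.
data = b"package contents " * 100_000  # ~1.7 MB

# Compress with an 8 MiB LZMA2 dictionary. Note that *compressing*
# needs roughly 10x the dictionary size in RAM, which is exactly the
# asymmetry being argued about in this thread.
filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 8 * 1024 * 1024}]
blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

# A decoder capped at 2 MiB cannot allocate the 8 MiB dictionary,
# so liblzma refuses -- just as a genuinely RAM-starved machine would.
limited = lzma.LZMADecompressor(memlimit=2 * 1024 * 1024)
try:
    limited.decompress(blob)
except lzma.LZMAError:
    pass  # expected: memory limit exceeded

# With enough headroom it round-trips fine.
roomy = lzma.LZMADecompressor(memlimit=32 * 1024 * 1024)
assert roomy.decompress(blob) == data
```

So the decompressor's footprint tracks the dictionary chosen at compress time, not the size of the archive or of the machine doing the decompressing.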
Well, if N were 10, that'd still be only 640 MB... My phone could handle it...
Keeping in mind that storage media don't like simultaneous reads and writes, I think it's fair to say N would rarely be more than 1 or 2...
Besides, Fedora being a bleeding-edge distro, it's probably not such a bright idea to install new releases on very low-end devices anyway, IMO...
Also, smaller packages do speed up installation (whether from local storage or a netinstall), so the disadvantage is negligible compared to the advantage.
BTW, the compression/decompression memory for LZMA/LZMA2 (of which XZ is just one implementation) is asymmetrical: it may require several gigabytes to compress but only a few hundred MB to decompress. And that cost is per archive, with no regard for how big the archive is. If you have ten packages, each in its own archive, and one is 512 KB, one is 10 MB, and one is 60 GB, all of them occupy about the same amount of resident memory during decompression.

And because dpkg can only install one package at a time, they won't be decompressing simultaneously anyway. Have you ever watched a large update? It goes through the list, "Unpacking" each package in serial (i.e. one at a time). So the memory used by XZ decompression is only ever about 64 MB total, allocated, freed, and re-allocated as dpkg unpacks each archive in turn.
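That serial "Unpacking" loop can be sketched like this: each archive gets a fresh decoder whose footprint is set by the dictionary size, allocated and then freed before the next package starts (package names and contents are made up, and the 60 GB example is omitted for obvious reasons):

```python
import lzma

# Made-up "packages" of very different sizes; decoder memory per
# archive depends on the dictionary, not on how big the payload is.
packages = {
    "tiny.deb":   b"a" * 512 * 1024,         # 512 KB
    "medium.deb": b"b" * 10 * 1024 * 1024,   # 10 MB
}

filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 20}]  # 1 MiB dict
archives = {name: lzma.compress(p, filters=filters)
            for name, p in packages.items()}

for name, blob in archives.items():
    print(f"Unpacking {name} ...")
    decoder = lzma.LZMADecompressor()  # dictionary allocated here...
    assert decoder.decompress(blob) == packages[name]
    del decoder                        # ...and released before the next one
```

The point being: the two decompressions never overlap, so peak memory is one dictionary's worth, not the sum over all packages.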
Now that you mention it, wouldn't it be smart to analyze the dependencies before the update and decompress/install packages that don't affect each other in parallel?
Only if the decompression algorithm isn't using all your CPUs. From what I know of LZMA, it's technically capable of being parallelized, but it's not embarrassingly parallel, so you can't just scale it up effortlessly.
You could actually decompress all of the packages in parallel regardless of their dependencies, and start installing (in serial) at the lowest leaf of the dependency tree that has been decompressed.
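A sketch of that split, decompress in parallel but install in serial (package names and payloads are invented; in CPython, lzma releases the GIL while decoding, so even a thread pool gets real parallelism here):

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

# Invented package payloads; in reality these would be .deb/.rpm contents.
payloads = {f"pkg{i}": bytes([i]) * (1 << 20) for i in range(4)}
archives = {name: lzma.compress(p) for name, p in payloads.items()}

# Decompress everything in parallel, ignoring dependency order...
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {name: pool.submit(lzma.decompress, blob)
               for name, blob in archives.items()}
    unpacked = {name: f.result() for name, f in futures.items()}

# ...then "install" serially in dependency order (installation itself
# stays serial; only the CPU-bound decode step was parallelized).
install_order = ["pkg0", "pkg1", "pkg2", "pkg3"]  # assume pkg0 is the leaf
for name in install_order:
    assert unpacked[name] == payloads[name]  # stand-in for the install step
```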
Problem is, most filesystems fall down under heavy parallel load. You get a lot of context switching as the kernel tries to give each decompression process its fair share of CPU time, the filesystem tries to give each process its fair share of IO time, and so on. Look at some of Michael's benchmarks for 64 MB random reads/writes with 8 or 16 threads, even on processors with that many hardware threads: throughput slows to a crawl.
Why? Because, assuming the individual archives have relatively low internal fragmentation, you introduce many more seeks when many processes read and write in parallel. There aren't really any good solutions that maintain >90% throughput under this kind of load without being inordinately unfair to one process and just letting it run the show for an extended period. But our modern stacks are tuned for exactly the opposite scenario, because of the demand for "responsive" desktops.
Of course, if you have an SSD, seeks are basically a no-op. Telling an SSD to seek just elicits a reply like "LOL OK" because there are no moving parts involved that need to move before the data can be retrieved. So you can get great parallel performance on an SSD.
Alternatively, if you have an insane amount of RAM (greater than 16 GB), you could store all the archives in a RAM disk and decompress them from there -- in parallel. That would be "holy hell" fast, and then you could just copy the uncompressed data directly from the ramdisk to the HDD/SSD for long-term storage during package installation.
Shoot; the packages would probably already be in RAM anyway, because you just downloaded them. If only there were a way to "force" those pages to stay in memory between the download phase and the unpacking phase. Then you wouldn't have to download the packages, write them to disk, read the compressed data back, and write the uncompressed data out again; you'd download straight to RAM and write only the uncompressed data to disk, eliminating one write and one read of the compressed data. The catch is that you need as much RAM as the total size of the packages you're downloading.
OK, this is cool. This is cool. We're getting somewhere.
Download each package to a buffer in RAM. As they are downloading, wire XZ (which is a streaming decompressor) directly to the buffer, so that you are decompressing in parallel with the download. Since a lossless decompressor reading from RAM is going to be many orders of magnitude faster than any internet connection, you're practically guaranteed that the XZ decompressor will be sitting there twiddling its thumbs for minutes on end, occasionally waking up to read a block of data and decode it.
As it's decoding, it's writing its results to disk. You could even mmap the network buffer itself to make it a zero-copy architecture, so that the data travels: NIC -> buffer in RAM -> XZ reads data from buffer (zero copy) -> XZ writes data to disk.
Then, once XZ writes the decoded data to disk, you just "mv" (re-link) the files into their correct locations.
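That pipeline can be sketched with Python's streaming `lzma.LZMADecompressor`, with `BytesIO` standing in for both the socket and the target disk (everything here is simulated; a real implementation would read from the NIC and write real files before re-linking them):

```python
import io
import lzma

original = b"files inside a package " * 50_000
network = io.BytesIO(lzma.compress(original))  # stand-in for the socket

decoder = lzma.LZMADecompressor()  # xz is streamable: feed it chunks
disk = io.BytesIO()                # stand-in for the target filesystem

# "Download" in 16 KB chunks and decode each one as it arrives,
# so decompression overlaps the transfer instead of following it.
while chunk := network.read(16 * 1024):
    disk.write(decoder.decompress(chunk))

assert decoder.eof                    # the whole stream was consumed
assert disk.getvalue() == original    # and it round-tripped correctly
# A real installer would now "mv" the written files into place.
```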
Installation would go from taking many minutes to taking essentially as long as the download itself. And the effect gets better the slower your internet connection is.
I think I'm on to something. The only caveat is that you'd need enough dependency information up front to know the correct download order (presumably you can't install a package before all of its dependencies are installed, since its install scripts may depend on another package already being there). But if you have multiple independent dependency trees (say, installing Eclipse and all of its dependencies plus GIMP and all of its dependencies in one go), you could parallelize across those trees and use all of your CPU cores while the network downloads multiple files and you decode them on the fly...
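Working out that download/install order is a topological sort over the dependency graph; Python's `graphlib` (3.9+) does it directly, and its `prepare()`/`get_ready()` API even tells you which packages are ready to process in parallel at any moment. The package names and graph here are invented:

```python
from graphlib import TopologicalSorter

# Invented dependency graph: each key depends on the packages in its set.
deps = {
    "eclipse": {"jre"},
    "jre":     {"libc"},
    "gimp":    {"gtk"},
    "gtk":     {"libc"},
    "libc":    set(),
}

# static_order() yields every package with its dependencies first,
# e.g. libc before jre and gtk, which come before eclipse and gimp.
order = list(TopologicalSorter(deps).static_order())

pos = {pkg: i for i, pkg in enumerate(order)}
for pkg, ds in deps.items():
    assert all(pos[d] < pos[pkg] for d in ds)
```

Note that `eclipse`/`jre` and `gimp`/`gtk` only share `libc`, so once `libc` is in place those two subtrees are exactly the independent trees that could be driven in parallel.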
Last edited by allquixotic; 07-10-2012 at 04:16 PM.