lzham offers much faster decompression than lzma (9 vs. 36 seconds in enwik9), at a very modest increase in archive size on average. It is even faster than gzip in decompression.
Decompression speed is indeed important, but so is compression ratio. Any choice here will likely weigh all the factors, and I'd guess that for package distribution compression speed is the least important one, since it doesn't affect end users. Compression ratio, on the other hand, affects the time it takes to receive the packages, and decompression speed affects the time it takes to install them once they are downloaded.
So a solution which provides great compression and great decompression speed is likely a prime candidate. On my machines the packages I get from the Arch repos (xz compressed) unpack and install very quickly, but then again I have a Core i5 and a Core i7, so it's hard for me to judge how effective it is overall.
Still, lzma should have proved itself as striking a good balance between compression/decompression speed and compression ratio given that it is used in so many compression tools.
Well, lzma (as in the utils, not the overall compression algorithm) is no longer being developed in favour of xz, so any worthwhile comparison should be made against xz utils rather than lzma utils, as I assume there have been improvements made since the development switch to xz.
Yes, I was referring to the lzma algorithm, not the software package with the same name.
The lzham algorithm itself is correct. The implementation could contain bugs: as lzham is not in widespread use, one might be less confident in the code. But even if you insert an extra checksumming and verification step, you would still be ahead of xz(lzma).
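To make the "extra checksumming and verification step" concrete, here is a minimal sketch of the idea. It is only an illustration: Python's standard library has no lzham binding, so the stdlib `lzma` module stands in for the compressor, and the function names `pack` and `unpack_verified` are made up for this example.

```python
import hashlib
import lzma
from typing import Tuple


def pack(payload: bytes) -> Tuple[bytes, str]:
    """Compress payload and record a SHA-256 digest of the ORIGINAL bytes."""
    digest = hashlib.sha256(payload).hexdigest()
    # Stand-in codec (assumption): a real test would call lzham here instead.
    return lzma.compress(payload), digest


def unpack_verified(archive: bytes, expected_digest: str) -> bytes:
    """Decompress, then reject the result if its digest does not match."""
    payload = lzma.decompress(archive)
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_digest:
        raise ValueError("checksum mismatch after decompression")
    return payload
```

The digest is taken over the uncompressed data, so a bug in either the compressor or the decompressor shows up as a mismatch. The cost is one extra hash pass over the output on top of decompression.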
There's your problem. A solution chosen for something this widely used has to be quasi-ubiquitous and rock solid in the first place. Who knows, a few years down the road maybe that algorithm will prove itself, see widespread adoption and replace xz. But it has to happen in that order.
Hmm. That's not the way I remember the discussion at UDS-O. I think I found the right Brainstorm link. But I don't see it as being rejected. What I found was marked "Being Implemented"... although obviously it stalled after that.
What I do remember is that it was a blueprint for Oneiric but ended up getting blocked/postponed during that development cycle. I'm not sure how intrusive it really would be, but I suspect it was then a little too radical for an LTS. After that the interest seemed to die out. But I'm not positive what the whole story was. In any case, there wasn't a discussion at UDS-Q.
I remember the UDS debdelta session a few cycles ago. It sounded a little ambitious (not nearly as easy as many seem to think) but doable. It is a bummer that it didn't end up landing. So far it's not on the schedule for UDS-R, but perhaps it will be brought up in the dpkg-xz session in a couple of weeks.
That benchmark is not very useful on its own, as it only tests compression of text files (and it's a text file with very specific characteristics too), and they only test at the highest available compression setting. Most .deb packages don't consist of XML dumps of Wikipedia, and it's unlikely that they use the currently used compressors at their highest compression setting (because often that affects the speed or memory use too much).
Edit: note that I'm not saying that lzham does badly on binaries, just that that page is not useful as a test for .deb compression (unless it's a .deb that contains mostly text files maybe).
Last edited by JanC; 10-18-2012 at 01:40 PM.
Be aware that I only referred to decompression speed, not compression ratio. For meaningful results on compression ratio, a more diverse benchmark would indeed be necessary.
One can however say with some confidence that the relative decompression speed will not change on different types of data.
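For anyone who wants to check this on their own data rather than on enwik9, a rough sketch along these lines could be used. It only covers the two codecs that ship with Python (`zlib` for gzip-style deflate and `lzma` for xz), since lzham has no stdlib binding; the preset values and the helper names `measure` and `compare` are arbitrary choices for this example, not anything from the benchmark page.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data, repeats=3):
    """Return (name, compressed/original size ratio, best decompression time in s)."""
    packed = compress(data)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        restored = decompress(packed)
        best = min(best, time.perf_counter() - start)
    # Sanity check: the round trip must reproduce the input exactly.
    assert restored == data
    return name, len(packed) / len(data), best


def compare(data):
    """Run every codec on the same bytes so the numbers are comparable."""
    codecs = [
        ("zlib level 6", lambda d: zlib.compress(d, 6), zlib.decompress),
        ("xz preset 6", lambda d: lzma.compress(d, preset=6), lzma.decompress),
    ]
    return [measure(n, c, d, data) for n, c, d in codecs]
```

Feeding `compare()` the raw bytes of a few different inputs (a binary-heavy package payload, a mostly-text one) would show whether the ratio and the relative decompression times hold up across data types. Timings from a Python wrapper include some overhead, so they are only indicative.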