Re: Compressing release packages better
From: Lasse Collin
Subject: Re: Compressing release packages better
Date: Thu, 9 Jan 2025 11:41:17 +0200
On 2025-01-08 Bruno Haible wrote:
> Thank you for the suggestions. I never had paid much attention to it
> (because today's networks and disks cope well with large files: when
> people download a 1.5 hours movie that's already 1 GB or 2 GB).
20 years ago connection speeds were slow and it didn't matter if
decompression wasn't very fast: the time saved by downloading a smaller
file outweighed the extra time spent decompressing it. With fast
connections the balance is different (it's one reason why zstd can be a
great choice). I suppose small file sizes still save a bit of bandwidth
on GNU servers.
> Note that [this] is package-specific. For instance, I think the *.po files
> sort more efficiently in the by-dirname order, but in the gettext
> tarball the copied files are more dominant.
Yes indeed. Basename sort might help with some other packages but I
expect it to be worse quite often.
Before sending the basename sorting suggestion, I tried a few other
methods like sorting by the last two path elements, which groups all po
directories together. The basename sort likely isn't the best method for
gettext but the differences are too tiny to matter. The simple method
is good enough.
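A minimal sketch of the two orderings mentioned above, written as POSIX shell functions with hypothetical names. It assumes file names contain no tabs or newlines. Each function sorts the regular files under a directory by its key (the basename, or the last two path elements), then by the full path, in byte order (LC_ALL=C) so the result is reproducible:

```shell
# Hypothetical helpers: print the regular files under directory "$1"
# in a compression-friendly order. Assumes no tabs/newlines in names.
TAB=$(printf '\t')

# Order by basename (last path element), full path as tie-breaker.
list_by_basename() {
  find "$1" -type f \
    | awk -F/ '{ print $NF "\t" $0 }' \
    | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2 \
    | cut -f2
}

# Order by the last two path elements (e.g. "po/de.po"), which
# groups the files of all po directories together.
list_by_last_two() {
  find "$1" -type f \
    | awk -F/ '{ print $(NF-1) "/" $NF "\t" $0 }' \
    | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2 \
    | cut -f2
}
```

The list can then be given to GNU tar, for example `list_by_basename pkg-1.0 > files.txt` followed by `tar --no-recursion -cf pkg-1.0.tar -T files.txt`, where pkg-1.0 is a placeholder. With --no-recursion only the listed files are stored, so directory entries (and their permissions and timestamps) are left out of the archive.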
In po directories, I suppose it's good to keep xx.po and xx.gmo
adjacent since they share the strings. For more typical packages, tar's
--sort=name can be good enough, and it makes the file order reproducible.
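The name-sorted variant needs no helper list with GNU tar. A minimal sketch, assuming a GNU tar that supports --sort (added in 1.28); the function name is hypothetical:

```shell
# Hypothetical helper: archive directory "$1" as "$1.tar" with members
# in name order (GNU tar with --sort support). Within a po directory
# this keeps each xx.gmo next to its xx.po.
make_sorted_tar() {
  tar --sort=name -cf "$1.tar" "$1"
}
```

In a po directory, name order puts da.gmo, da.po, de.gmo and de.po in sequence, so each .gmo sits next to the .po it was built from.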
> Decompression memory requirements still matter, though. For example:
> - Embedded Linux systems often have only 256 MB of RAM.
> - I occasionally use a laptop with 1 GiB of RAM, or a smartphone
> with 2.75 GiB of RAM.
> - In cloud environments, the price is proportional to the RAM size.
> Therefore it is not unusual to work with VMs with 0.5 GiB of RAM.
These are useful examples, thanks!
> > Bonus: If one uses the long --lzma2 option, appending ",pb=0" helps
> > a *tiny* amount (like 0.2 % to 0.6 %) with ASCII/UTF-8 text
> > (including source code tarballs) without downsides (apart from
> > making the command line uglier). Example:
> >
> > xz -T1 --lzma2=preset=9e,pb=0
>
> I try to avoid options here that few people use, so as to minimize the
> risk of running into trouble. Reducing the .xz size from 11 MB to 8 MB
> is good enough; I don't need further tuning if it comes with some
> risks.
I wrote it as a "bonus" because the improvement is so tiny that it might
not be worth the longer command line. :-) The options have been supported
since the first stable release (5.0.0), so there's no risk in that sense.
Sometimes even the -e is not worth the extra compression time but at
least with gettext it does make a noticeable difference.
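To check whether pb=0 is worth it for a particular tarball, both variants can simply be compressed and compared. A minimal sketch; the function name and the output file names are hypothetical, and the preset defaults to 9e but can be given as a second argument. pb is the number of LZMA2 position bits (the default is 2):

```shell
# Hypothetical helper: compress file "$1" twice, once with the plain
# preset and once with pb=0 added, keeping both outputs next to it.
# The preset defaults to 9e; a different one can be given as "$2".
# -T1 keeps both runs single-threaded, so only pb differs.
compare_pb() {
  preset=${2:-9e}
  xz -T1 "-$preset" -c -- "$1" > "$1.default.xz"
  xz -T1 "--lzma2=preset=$preset,pb=0" -c -- "$1" > "$1.pb0.xz"
  # tr strips the padding some wc implementations add.
  printf 'default: %s bytes\n' "$(wc -c < "$1.default.xz" | tr -d ' ')"
  printf 'pb=0:    %s bytes\n' "$(wc -c < "$1.pb0.xz" | tr -d ' ')"
}
```

For example, `compare_pb pkg-1.0.tar` (a placeholder name) prints both sizes; on source tarballs the difference is expected to be small.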
I suggest using the -T1 option when compressing gettext since xz's
threading doesn't speed it up much with -8e: for a 151 MiB input, fewer
than two threads are busy on average (if you don't tweak the settings
further). With -9e, a second thread wouldn't start until the input size
exceeds 192 MiB, and if the input some day grows that large, splitting it
will hurt compression. The -T1 (--threads=1) option is already supported
in xz 5.0.0 for forward compatibility.
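The thread numbers above follow from the block size. In threaded mode xz splits the input into blocks that are, by default, three times the LZMA2 dictionary size, and each block is compressed by one thread. The dictionary is 32 MiB for -8 and 64 MiB for -9 (with or without -e). A small sketch of the arithmetic, with a hypothetical function name and sizes in whole MiB:

```shell
# Estimate how many blocks (and thus at most how many busy threads)
# xz's threaded mode uses for an input, assuming the default block
# size of 3 x the LZMA2 dictionary size. Sizes are in MiB.
xz_mt_blocks() {
  input_mib=$1
  dict_mib=$2
  block_mib=$((3 * dict_mib))
  # Round up: a partial last block is still a block.
  echo $(( (input_mib + block_mib - 1) / block_mib ))
}

xz_mt_blocks 151 32   # -8e (32 MiB dictionary, 96 MiB blocks): prints 2
xz_mt_blocks 151 64   # -9e (64 MiB dictionary, 192 MiB blocks): prints 1
```

With -8e the 151 MiB input becomes one 96 MiB block and one 55 MiB block, so the second thread runs out of work early and the average stays below two.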
Enabling the threaded mode by default in xz 5.6.x was a tricky decision
due to things like this. People wished for it though because then big
files can be decompressed in threaded mode too. (Files compressed in
single-threaded mode only decompress with one thread.) Apparently many
weren't specifying the --threads option in situations when it clearly
would have been useful. But now there is the opposite problem where one
sometimes needs to add -T1.
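Whether a file can be decompressed in parallel depends on how many blocks it has. A minimal sketch for checking that, using the tab-separated output of `xz --robot --list`, where the third column of the `file` line is the total number of blocks; the function name is hypothetical:

```shell
# Hypothetical helper: print the number of blocks in an .xz file,
# taken from column 3 of the "file" line in xz's machine-readable
# --robot --list output. A single block can't be split between
# decompression threads.
xz_block_count() {
  xz --robot --list -- "$1" | awk -F'\t' '$1 == "file" { print $3 }'
}
```

A file compressed in single-threaded mode will typically report 1 here.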
Having said all that, using plain "xz -8e" or "xz -9e" for gettext is
perfectly fine if you prefer to keep it very simple. :-) I know I'm
biased to over-thinking this.
--
Lasse Collin