guix-devel

Re: Profiling of man-db database generation with zlib vs zstd


From: Ludovic Courtès
Subject: Re: Profiling of man-db database generation with zlib vs zstd
Date: Tue, 29 Mar 2022 12:30:14 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> You'll need to generate the tar.zst and tar.gz yourself, but the script
> that was used is:
>
> ;; decompress-zstd.scm
> (use-modules (ice-9 binary-ports)
>              (ice-9 match)
>              (statprof)
>              (zstd))
>
> (define MiB (expt 2 20))
> (define input-file "/tmp/chromium-98.0.4758.102.tar.zst")
> (define output-file "/dev/null")
>
> (define (decompression-test)
>   (call-with-input-file input-file
>     (lambda (port)
>       (call-with-zstd-input-port port
>         (lambda (input)
>           (call-with-output-file output-file
>             (lambda (output)
>               (let loop ((bv (get-bytevector-n input (* 4 MiB))))
>                 (match bv
>                   ((? eof-object?)
>                    #t)
>                   (bv
>                    (put-bytevector output bv)
>                    (loop (get-bytevector-n input (* 4 MiB)))))))))))))

To isolate the problem, you could allocate the 4 MiB buffer outside of
the loop and use ‘get-bytevector-n!’, and also remove code that writes
to ‘output’.
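
A minimal sketch of that change, not run against the real tarball: the
‘drain-port’ helper is a name made up for illustration, and it only counts
the bytes it reads into one preallocated buffer instead of writing them out.

```scheme
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

(define MiB (expt 2 20))

;; Hypothetical helper: read everything from INPUT into a single buffer of
;; BUFFER-SIZE bytes, allocated once, and return the number of bytes read.
;; Nothing is written anywhere, so only the read path is exercised.
(define* (drain-port input #:optional (buffer-size (* 4 MiB)))
  (let ((buffer (make-bytevector buffer-size)))
    (let loop ((total 0))
      (let ((n (get-bytevector-n! input buffer 0 buffer-size)))
        (if (eof-object? n)
            total
            (loop (+ total n)))))))

;; Assumed use with the quoted script (needs (zstd) and its 'input-file'):
;;
;;   (call-with-input-file input-file
;;     (lambda (port)
;;       (call-with-zstd-input-port port drain-port)))
```

If the run time barely changes with this variant, allocation and the
write to /dev/null are probably not what dominates.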

> This confirms that guile-zstd is not noticeably faster than guile-zlib,
> which is unexpected.

Uh, surprising.

Note that ‘statprof’ incurs overhead, so in general if you want timings,
get them without ‘statprof’.
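
One way to do that, as a rough sketch: wrap the test in a small timing
procedure built on Guile's ‘get-internal-real-time’.  The name
‘call-with-wall-time’ is hypothetical, chosen here only for the example.

```scheme
;; Hypothetical helper: call THUNK and return two values, its result and
;; the elapsed wall-clock time in seconds (as an inexact number).
(define (call-with-wall-time thunk)
  (let* ((start  (get-internal-real-time))
         (result (thunk))
         (end    (get-internal-real-time)))
    (values result
            (exact->inexact
             (/ (- end start) internal-time-units-per-second)))))

;; Assumed use with the quoted script's 'decompression-test':
;;
;;   (call-with-values
;;       (lambda () (call-with-wall-time decompression-test))
;;     (lambda (result seconds)
;;       (format #t "wall: ~,2f s~%" seconds)))
```

Running it a few times and comparing against the ‘time+’ figures for the
CLI tools would give numbers free of profiler overhead.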

> Compare to the command line tools:
>
> $ time+ zstd -cdk /tmp/chromium-98.0.4758.102.tar.zst > /dev/null
> cpu: 99%, mem: 10548 KiB, wall: 0:09.37, sys: 0.30, usr: 9.05
>
> $ time+ gunzip -ck /tmp/chromium-98.0.4758.102.tar.gz > /dev/null
> cpu: 99%, mem: 2908 KiB, wall: 0:22.29, sys: 0.31, usr: 21.98
>
> where zstd is about 2.3x faster.
>
> It's unfortunate that the bulk of the time is spent in "anon" (anonymous
> proc?), which doesn't say much.

It’s likely one of the lambdas.

> Perhaps I should open an issue with the guile-zstd project.

Yes, or we can continue here.  :-)

From there I think we should first fully isolate the thing we’re
measuring, as discussed above, to gain confidence.

If the code using guile-zstd is slower than the CLI, then it could be
that guile-zstd doesn’t initialize the library properly, or that it gets
buffering wrong or something.

I’ll see if I can give it a try too.

Thanks for investigating!

Ludo’.


