gwl-devel

Re: Processing large amounts of files


From: Ricardo Wurmus
Subject: Re: Processing large amounts of files
Date: Thu, 21 Mar 2024 16:03:37 +0100
User-agent: mu4e 1.10.8; emacs 29.1

Liliana Marie Prikler <liliana.prikler@ist.tugraz.at> writes:

> For comparison:
>   time cat /tmp/meow/{0..7769}
>   […]
>   
>   real        0m0,144s
>   user        0m0,049s
>   sys         0m0,094s
>
> It takes GWL 6 times longer to compute the workflow than to create the
> inputs in Guile, and 600 times longer than to actually execute the
> shell command.  I think there is room for improvement :)

GWL checks that all input files exist before running the command.  Part
of the difference you see here (about 2 seconds on my laptop) comes from
GWL running FILE-EXISTS? on 7769 files.  This happens in prepare-inputs,
whose docstring states its purpose:

  "Ensure that all files in the INPUTS-MAP alist exist and are linked to
  the expected locations.  Pick unspecified inputs from the environment.
  Return either the INPUTS-MAP alist with any additionally used input
  file names added, or raise a condition containing the list of missing
  files."
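
To make the cost concrete, here is a rough Python sketch of the
existence check that docstring describes: one file system lookup per
input, with all missing files reported together.  This is not GWL's
code (GWL is written in Guile); the names find_missing, check_inputs
and MissingInputsError are made up for illustration, and the linking
and environment lookup parts of prepare-inputs are left out.

```python
import os


class MissingInputsError(Exception):
    """Hypothetical error type that carries the list of missing files."""

    def __init__(self, missing):
        super().__init__("missing input files: " + ", ".join(missing))
        self.missing = missing


def find_missing(paths):
    """Return the paths that do not exist, in input order.

    This performs one existence check per path, so the cost grows
    linearly with the number of inputs.
    """
    return [path for path in paths if not os.path.exists(path)]


def check_inputs(paths):
    """Return the inputs unchanged, or raise with every missing file."""
    missing = find_missing(paths)
    if missing:
        raise MissingInputsError(missing)
    return list(paths)
```

Collecting every missing file before raising means the user sees the
whole list at once, at the price of always touching every input.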

Another significant delay is introduced by the cache mechanism, which
computes a unique prefix based on the contents of all input files.  It's
not unexpected that this will take a little while, but it's not great
either.
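
As an illustration of why this is slow, here is a minimal Python sketch
of a content-based prefix.  It is an assumption about the general
technique, not GWL's actual hashing scheme: the function name
content_prefix, the choice of SHA-256 and the prefix length are all
hypothetical.  The point is that every byte of every input has to be
read before the prefix is known.

```python
import hashlib


def content_prefix(paths, length=8):
    """Return a short hex prefix derived from the contents of all files.

    Paths are sorted so that the result does not depend on input order.
    """
    digest = hashlib.sha256()
    for path in sorted(paths):
        # Mix in the file name, then the full file contents.
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        with open(path, "rb") as handle:
            # Read in chunks so large inputs are not loaded at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        digest.update(b"\0")
    return digest.hexdigest()[:length]
```

The same inputs always give the same prefix, and changing any single
file changes it, which is what makes the prefix usable as a cache key.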

The rest of the time is lost in inferior package lookups and in using
Guix to build a script that likely already exists.  The latter is
something that we could cache (given identical output of "guix describe"
we could skip the computation of the process scripts).
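
A rough Python sketch of that caching idea, assuming the cache is keyed
on the text printed by "guix describe" plus the process name.  Nothing
like this exists in GWL today; cache_key, script_for and build_script
are hypothetical names, and a real cache would live on disk rather than
in a dictionary.

```python
import hashlib


def cache_key(describe_output, process_name):
    """Derive a key from the "guix describe" text and the process name."""
    digest = hashlib.sha256()
    digest.update(describe_output.encode("utf-8"))
    digest.update(b"\0")
    digest.update(process_name.encode("utf-8"))
    return digest.hexdigest()


def script_for(process_name, describe_output, build_script, cache):
    """Return the script for a process, building it only on a cache miss.

    build_script stands in for the expensive step of asking Guix to
    build the process script; cache is any dict-like store.
    """
    key = cache_key(describe_output, process_name)
    if key not in cache:
        cache[key] = build_script(process_name)
    return cache[key]
```

With identical "guix describe" output the second lookup returns the
stored script without calling build_script again; any change in that
output produces a different key and forces a rebuild.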

-- 
Ricardo


