bug-parallel

From: Ole Tange
Subject: Re: GNU Parallel Bug Reports option --compress and --line-buffer causing lots of disk usage and hangs and stderr output
Date: Thu, 6 Feb 2014 13:03:03 +0100

On Wed, Feb 5, 2014 at 7:55 PM, Derek Wilson <address@hidden> wrote:

> I'm using 20140122 because I noticed the addition of the --line-buffer
> option.

--line-buffer is somewhat of a hack. It does not work with --compress.

The reason is that the running program sends its output directly to
lzop, which saves it to a file. Only when the program finishes do I
rewind the file and pass it as STDIN to lzop -d. This will of
course fail if the file is not complete.
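
To illustrate why a half-written compressed file cannot simply be
decoded, here is a minimal Python sketch. It uses zlib from the
standard library as a stand-in for lzop (an assumption made for
illustration only; GNU Parallel itself calls lzop), and the helper
name try_decompress is hypothetical.

```python
import zlib


def try_decompress(data: bytes):
    """Return the decoded bytes, or None if the stream is incomplete.

    zlib stands in for lzop here; this is an assumption for illustration.
    """
    try:
        return zlib.decompress(data)
    except zlib.error:
        # A truncated stream has no valid end marker, so decoding fails.
        return None


if __name__ == "__main__":
    payload = b"whatisthisnow\n" * 1000
    complete = zlib.compress(payload)
    # Simulate reading the file while the job is still writing to it.
    partial = complete[: len(complete) // 2]

    print(try_decompress(complete) == payload)
    print(try_decompress(partial) is None)
```

The partial stream plays the role of the temporary file as it looks
while the job is still running.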

What needs to be done for --line-buffer --compress to work is something like:

* Create tmpfile: true > tmpfile
* Start 'tail -f tmpfile | lzop -d' and get a file handle for stdout
* Start program | lzop >> tmpfile
* Remove tmpfile (to avoid manual cleanup if GNU Parallel crashes)
* Every now and then do a non-blocking read on all 'lzop -d' file
  handles and print if there is a full line.
* When program stops, somehow tell tail -f to send all remaining data
  (even incomplete lines) to lzop -d and exit without sending a SIGPIPE
  (not sure how to do that).
* Read until EOF from the 'lzop -d' filehandle.
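
The steps above can be sketched in a single process. This is only a
rough Python illustration under stated assumptions, not how GNU
Parallel is implemented: zlib with Z_SYNC_FLUSH replaces the
'lzop | tail -f | lzop -d' pipeline, the job's output is modelled as
an iterable of byte chunks, and the name run_line_buffered is
hypothetical. Because one process both writes and reads the file, the
sketch sidesteps the open question of how to stop tail -f cleanly.

```python
import os
import tempfile
import zlib


def run_line_buffered(chunks):
    """Yield complete lines as soon as they can be decoded.

    `chunks` stands in for a job's output arriving piece by piece.
    zlib replaces lzop here (an assumption for illustration).
    """
    # On POSIX the temporary file is unlinked right away, so nothing
    # is left behind if the process crashes.
    with tempfile.TemporaryFile() as tmp:
        comp = zlib.compressobj()
        decomp = zlib.decompressobj()
        read_pos = 0      # how far the reader has consumed the file
        pending = b""     # decoded bytes that do not yet end in a newline

        def drain():
            """Decode whatever has been appended since the last read."""
            nonlocal read_pos, pending
            tmp.seek(read_pos)
            data = tmp.read()
            read_pos = tmp.tell()
            pending += decomp.decompress(data)

        for chunk in chunks:
            # Append compressed data; the sync flush makes everything
            # written so far decodable, at some cost in compression ratio.
            tmp.seek(0, os.SEEK_END)
            tmp.write(comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH))
            drain()
            # Emit only full lines; keep the unfinished tail for later.
            *lines, pending = pending.split(b"\n")
            for line in lines:
                yield line + b"\n"

        # The job has finished: close the stream and read until EOF.
        tmp.seek(0, os.SEEK_END)
        tmp.write(comp.flush())
        drain()
        pending += decomp.flush()
        if pending:
            yield pending  # trailing data without a final newline


if __name__ == "__main__":
    for line in run_line_buffered([b"a\nb", b"c\nd"]):
        print(line)
```

With the two chunks shown, the generator yields b"a\n" after the first
chunk, b"bc\n" after the second, and the incomplete line b"d" only
once the input ends.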

It is probably doable, but definitely a lot of work. And unless
someone convinces me that this is a killer feature, I will only
implement it on a consultancy basis (150 USD/hr).

> Thanks for that by the way.

For the manual I need good examples for what people use this option
for. I have yet to find the killer usage, so it would be great if you
could describe what you use it for.

> Then i saw --compress and I know this
> will be very useful, but I'm running into issues using it.
>
> I tend to use parallel to process streaming data and I'm doing something
> that looks like this:
>
> yes whatisthisnow | head -n100000000 | parallel --pipe -N100000 -L1 -j512
> cat 2> cattest.log > cattest.out
>
> if i add --compress:
>
> * my log file has a bunch of lines like: "lzop: <stdin>: not a lzop file"
> * i do get streaming output (but is it per job like the default?)
> * not all of the lzop processes exit and parallel hangs indefinitely

It seems you have found a bug. I can even reproduce it with:

  echo k| parallel --pipe  --compress  cat

So --pipe and --compress currently do not work together. I will assume
that is a minor fix (but it could very well be a major debugging
task).

> if i add --line-buffer to that I don't get any output at all ever.

This is to be expected, and GNU Parallel should probably fail with a
decent error message if you do --line-buffer --compress.

> i do hope there is an easy way to resolve these issues and enable me to use
> lzop for temp files while still getting lines as soon as they are available.

That, however, will not work (but see above).

/Ole
