Re: GNU Parallel Bug Reports Truncated large records
From: Ole Tange
Subject: Re: GNU Parallel Bug Reports Truncated large records
Date: Tue, 24 Feb 2015 02:30:32 +0100
On Mon, Feb 23, 2015 at 2:28 PM, Johannes Dröge
<address@hidden> wrote:
> Hi Ole and GNU parallel devs,
>
> I'm processing large files (~50 GiB) with variable record sizes and have the
> following issues:
>
> 1) The processing run-time of individual blocks grows more than linearly with
> the input size. Therefore, it would be best if GNU parallel allowed passing
> single records or a fixed number of records to each job, or at least did
> not automatically increase the block size. Instead, the block size
> auto-detection increases the block size on large individual blocks until only
> very few processes are being run in parallel, which then dominate the overall
> run-time. This behavior strongly impacts the granularity of the parallel
> execution.
I would recommend using -N, but it segfaults on your example data.
> 2) I'm seeing that large records (>2 GiB) are being truncated at 2 GiB and
> thus passed incompletely via stdin.
I can reproduce this. The output is truncated to 2 GB minus 4 KB.
That is a serious problem. It is fixed for your task in git version c445232:
git clone git://git.savannah.gnu.org/parallel.git
Note that this is not a general fix; it only addresses your specific case.
/Ole