Re: GNU Parallel Bug Reports Truncated large records
From: Ole Tange
Subject: Re: GNU Parallel Bug Reports Truncated large records
Date: Tue, 24 Feb 2015 02:30:32 +0100
On Mon, Feb 23, 2015 at 2:28 PM, Johannes Dröge
<address@hidden> wrote:
> Hi Ole and GNU parallel devs,
>
> I'm processing large files (~50 GiB) with variable record sizes and have the
> following issues:
>
> 1) The processing run-time of individual blocks grows more than linearly with
> the input size. Therefore, it would be best if GNU parallel allowed passing
> single records or a fixed number of records to each job, or at least did
> not automatically increase the block size. Instead, the block size
> auto-detection increases the block size on large individual blocks until only
> very few processes are being run in parallel, which then dominate the overall
> run-time. This behavior strongly impacts the granularity of the parallel
> execution.
I would recommend using -N, but it segfaults on your example data.
> 2) I'm seeing that large records (>2 GiB) are being truncated at 2 GiB and
> thus passed incompletely via stdin.
I can reproduce this. The output is truncated to 2 GB minus 4 KB.
That is a serious problem. It is fixed for your task in git version c445232:
git clone git://git.savannah.gnu.org/parallel.git
Note that this is not a general fix; it only addresses your specific case.
/Ole