bug-coreutils

From: Giuseppe Scrivano
Subject: Re: coreutils patch to multithread md5sum for parallel hashing (ala the HP-UX days)
Date: Thu, 25 Mar 2010 13:49:59 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

A similar patch was rejected some months ago:

  http://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00143.html

As a solution, Pádraig suggested using find(1).

You can take advantage of the new utility nproc(1), distributed with
recent coreutils versions, to get the number of processing units
available on your system.
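
As a rough sketch of how the two could be combined (this is not the
exact command from the linked message): the function name parallel_md5
is only a placeholder for illustration, and the -print0, -0, -r, -n and
-P options assume GNU findutils. It starts up to one md5sum process per
processing unit reported by nproc(1).

```sh
# Hypothetical helper: hash every regular file under a directory,
# running up to $(nproc) md5sum processes at once.
# Assumes GNU find/xargs and a coreutils release that ships nproc.
parallel_md5() {
    dir=${1:-.}
    # -print0 / -0 : NUL-separated names, safe for spaces and newlines
    # -r           : do not run md5sum at all if no files were found
    # -n 1         : one file per md5sum invocation
    # -P N         : keep up to N invocations running in parallel
    find "$dir" -type f -print0 |
        xargs -0 -r -n 1 -P "$(nproc)" md5sum
}
```

Because the jobs finish in no particular order, the output lines may not
follow the order of the input files. Piping the result through sort, for
example `parallel_md5 /path/to/images | sort -k 2`, gives a stable
listing ordered by file name.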

Cheers,
Giuseppe



"Brett L. Trotter" <address@hidden> writes:

> Hello, this is my first post to the list, so I'll say in advance here
> I'm pleased to meet you all.
>
> I've been out of C/C++ land for a while due to the economy, but found
> myself hashing a bunch of 46GB blu ray images and discs for verification
> lately and wanted a simple way to cut down the time involved without
> starting separate terminals, running screen, etc. HP-UX's md5sum
> had/has(?) a -n option for parallelizing the hashing. I did a quick
> implementation today, and it's probably nothing like the sort of code
> you folks write and likely can be optimized quite a bit, but I was
> sincerely hoping that the feature could make it into coreutils, either
> based on my code or someone else's.
>
> It's a patch against the version in coreutils-5.97-23.el5_4.2.src.rpm on
> RHEL 5.4. It's been tested lightly, shows a performance -decrease- for
> small numbers of small files, but an increase for larger files or larger
> numbers of files. I haven't yet gotten around to making the patch apply
> to the Makefile.am, so I was manually adding -lpthread to the link lines
> for the *sum programs in the generated makefile.
>
> Again, this is nowhere near a production-ready patch, and I'm aware
> that output may appear out of order when N > 1 is used, but I'd love
> any thoughts, improvements, or reasons why md5sum shouldn't be able
> to process files in parallel like in the old days.
>
> -Brett



