Re: [PATCH] sort: parallel external sort implementation
From: Chen Guo
Subject: Re: [PATCH] sort: parallel external sort implementation
Date: Thu, 4 Mar 2010 04:12:57 -0800 (PST)
Hi Padraig,
The documentation's coming in my patch, since we're both using the same
--threads option. I haven't put any examples in yet, but I suppose I should.
Thanks for the heads up.
Why would there be a need to switch to processes? There's more memory
and communication overhead. The only advantage I'd see is address space,
which we'd likely never need.
----- Original Message ----
> From: Pádraig Brady <address@hidden>
> To: Joey Degges <address@hidden>
> Cc: Report bugs to <address@hidden>; Matt Ball <address@hidden>
> Sent: Thu, March 4, 2010 4:06:19 AM
> Subject: Re: [PATCH] sort: parallel external sort implementation
>
> On 04/03/10 06:01, Joey Degges wrote:
> > Hello,
> >
> > Matt (cc'd, not on list) and I have developed a patch that parallelizes sort
> > using pthreads. The threading approach taken here is to break up input files
> > into groups based on physical device and sort each of those groups in
> > parallel.
> > This allows for storage and processing resources to be more fully utilized
> > during the sort process.
> >
>
> > multidisk sort with 4x 50M inputs
> > $ time -v sort_multidisk /q/0/50M /q/1/50M /q/2/50M /q/3/50M > /dev/null
> > User time (seconds): 10.89
> > System time (seconds): 0.83
> > Percent of CPU this job got: 180%
> > Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.50
> > Major (requiring I/O) page faults: 20
> > Minor (reclaiming a frame) page faults: 72843
> > Voluntary context switches: 1936
> > Involuntary context switches: 149
> > File system inputs: 393080
> >
> > current sort with 4x 50M inputs
> > $ time -v sort_current /q/0/50M /q/1/50M /q/2/50M /q/3/50M > /dev/null
> > User time (seconds): 11.05
> > System time (seconds): 0.32
> > Percent of CPU this job got: 86%
> > Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.19
> > Major (requiring I/O) page faults: 18
> > Minor (reclaiming a frame) page faults: 72532
> > Voluntary context switches: 915
> > Involuntary context switches: 48
> > File system inputs: 198936
>
> > diff --git a/src/sort.c b/src/sort.c
> > +      --threads=N           use no more than N threads to improve parallelism\n\
>
> Threads are a bit of an implementation detail, though.
> Perhaps it would be better to use --parallel, so one
> could move to multiple processes in future if desired?
>
> Since this is a new option, it requires documentation
> in docs/coreutils.texi which would include examples
> of how and why you would use this option.
>
> > +
> > +/* Sort NFILES FILES onto OUTPUT_FILE.
> > +
> > +   Threading approach: Break FILES up into several groups where each contains
> > +   only files that can be found on the same physical device (according to
> > +   device_cmp()).  Spawn threads to execute do_sort() on each group of files
> > +   in parallel.
> > +
> > +   This allows for all concerned resources (storage devices and processors)
> > +   to be more fully utilized.
> > +*/
> > +
> > +static void
> > +sort_multidisk (char * const *files, size_t nfiles, char const *output_file,
> > +                unsigned long int nthreads)
>
> Have you considered the seek characteristics of SSDs
> and how they might affect things (with consideration
> that mechanical disks will be replaced by SSDs quite quickly).
> There still would be some benefit splitting per SSD,
> but it would be worth measuring to see.
>
> > @@ -3691,7 +3996,21 @@ main (int argc, char **argv)
> >            IF_LINT (free (sortfiles));
> >          }
> >        else
> > -        sort (files, nfiles, outfile);
> > +        {
> > +          if (!nthreads)
> > +            {
> > +              /* The default NTHREADS is 2 ** floor (log2 (number of processors)).
> > +                 If comparisons do not vary in cost and threads are
> > +                 scheduled evenly, with the current algorithm there is no
> > +                 performance advantage to using a number of threads that
> > +                 is not a power of 2.  */
> > +              unsigned long int np2 = num_processors (NPROC_CURRENT) / 2;
> > +              for (nthreads = 1; nthreads <= np2; nthreads *= 2)
> > +                continue;
> > +            }
>
> You probably want NPROC_CURRENT_OVERRIDABLE ?
>
> cheers,
> Pádraig.