bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH] sort: parallel external sort implementation


From: Joey Degges
Subject: [PATCH] sort: parallel external sort implementation
Date: Wed, 3 Mar 2010 22:01:27 -0800

Hello,

Matt (cc'd, not on list) and I have developed a patch that parallelizes sort
using pthreads. The threading approach taken here is to break up input files
into groups based on physical device and sort each of those groups in
parallel.
This allows for storage and processing resources to be more fully utilized
during the sort process.

Below is a set of experimental results and the statistics of the machine
that
they were run on. The test below show speed improvements from 45% - 64%.

In this patch the method used to determine which device a particular file is
on
relies on comparisons between struct stat -> st_dev fields. This leads to
some
problems in cases where files exist on different partitions of the same
disk.
Preliminary tests have shown that under these situations some performance
improvements are still seen, but they are not as dramatic as in the ideal
case.

Please send any comments/suggestions.

Thanks,
Joey

system details
CPU: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
cache size: 4096 KB
RAM: 2G

disk details:
/q/0:
 Timing cached reads:   7556 MB in  2.00 seconds = 3781.13 MB/sec
 Timing buffered disk reads:  320 MB in  3.00 seconds = 106.50 MB/sec
/q/1:
 Timing cached reads:   7772 MB in  2.00 seconds = 3888.09 MB/sec
 Timing buffered disk reads:  224 MB in  3.03 seconds =  74.00 MB/sec
/q/2:
 Timing cached reads:   7444 MB in  2.00 seconds = 3724.34 MB/sec
 Timing buffered disk reads:  222 MB in  3.01 seconds =  73.77 MB/sec
/q/3:
 Timing cached reads:   7512 MB in  2.00 seconds = 3758.49 MB/sec
 Timing buffered disk reads:  240 MB in  3.02 seconds =  79.50 MB/sec

before each run the cache was dropped with:
  echo 3 > /proc/sys/vm/drop_caches

store temp files in RAM:
  mount -t tmpfs -o size=1000M /ram /ram
  export TMPDIR=/ram

multidisk sort with 4x 50M inputs
$ time -v sort_multidisk /q/0/50M /q/1/50M /q/2/50M /q/3/50M > /dev/null
        User time (seconds): 10.89
        System time (seconds): 0.83
        Percent of CPU this job got: 180%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.50
        Major (requiring I/O) page faults: 20
        Minor (reclaiming a frame) page faults: 72843
        Voluntary context switches: 1936
        Involuntary context switches: 149
        File system inputs: 393080

current sort with 4x 50M inputs
$ time -v sort_current /q/0/50M /q/1/50M /q/2/50M /q/3/50M > /dev/null
        User time (seconds): 11.05
        System time (seconds): 0.32
        Percent of CPU this job got: 86%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.19
        Major (requiring I/O) page faults: 18
        Minor (reclaiming a frame) page faults: 72532
        Voluntary context switches: 915
        Involuntary context switches: 48
        File system inputs: 198936


multidisk sort with 4x 100M input files
$ time -v sort_multidisk /q/0/100M /q/1/100M /q/2/100M /q/3/100M > /dev/null
        User time (seconds): 24.84
        System time (seconds): 1.35
        Percent of CPU this job got: 242%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.82
        Major (requiring I/O) page faults: 22
        Minor (reclaiming a frame) page faults: 145101
        Voluntary context switches: 3167
        Involuntary context switches: 523
        File system inputs: 784440

current sort with 4x 100M input files
$ time -v sort_current /q/0/100M /q/1/100M /q/2/100M /q/3/100M > /dev/null
        User time (seconds): 24.23
        System time (seconds): 0.65
        Percent of CPU this job got: 83%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.77
        Major (requiring I/O) page faults: 19
        Minor (reclaiming a frame) page faults: 144797
        Voluntary context switches: 3133
        Involuntary context switches: 41
        File system inputs: 784000


multidisk sort with 4x 200M input files
$ time -v sort_multidisk -S200M /q/0/200M /q/1/200M /q/2/200M /q/3/200M >
/dev/null
        User time (seconds): 52.15
        System time (seconds): 2.96
        Percent of CPU this job got: 162%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:33.99
        Major (requiring I/O) page faults: 35
        Minor (reclaiming a frame) page faults: 256351
        Voluntary context switches: 7215
        Involuntary context switches: 1048
        File system inputs: 1570224

current sort with 4x 200M input files:
$ time -v sort_current -S200M /q/0/200M /q/1/200M /q/2/200M /q/3/200M >
/dev/null
        User time (seconds): 49.67
        System time (seconds): 2.05
        Percent of CPU this job got: 82%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02.66
        Major (requiring I/O) page faults: 20
        Minor (reclaiming a frame) page faults: 102675
        Voluntary context switches: 6637
        Involuntary context switches: 256
        File system inputs: 1566224

Attachment: 0001-sort-parallel-external-sort-implementation.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]