Re: Sort: optimal memory usage with multithreaded sort
From: Pádraig Brady
Subject: Re: Sort: optimal memory usage with multithreaded sort
Date: Tue, 15 Jan 2013 20:26:55 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
On 01/15/2013 07:07 PM, Assaf Gordon wrote:
> Hello,
>
> Sort's memory usage (specifically, sort_buffer_size()) has been discussed
> a few times before, but I couldn't find any mention of the following issue:
>
> If given a regular input file, sort tries to guesstimate the optimal
> buffer size based on the file size. But this value is calculated for a
> single thread (it dates from before sort became multi-threaded). The
> default "--parallel" value is 8 (or fewer, if fewer cores are available),
> which requires more memory.
>
> The result is that on a somewhat powerful machine (e.g. 128 GB RAM,
> 32 cores - not uncommon in a computer cluster), sorting a big file
> (e.g. 10 GB) always allocates too little memory and always resorts to
> saving temporary files in "/tmp". The disk activity results in slower
> sort times than an all-memory sort could achieve.
>
> Based on this:
> http://lists.gnu.org/archive/html/coreutils/2010-12/msg00084.html
> perhaps it would be beneficial to take the number of threads into
> account in the memory allocation?
It's a fair point, but note that since then the default memory allocation
for sort has been doubled and subsequently capped at 75% of physical
memory due to external factors, as discussed at:
http://lists.gnu.org/archive/html/coreutils/2012-06/msg00019.html
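For reference, on a Linux/glibc system a cap of that kind can be derived
roughly as follows. This is a sketch only, not the code sort itself uses,
and _SC_PHYS_PAGES is a glibc extension rather than plain POSIX:

/* Sketch: derive a 75%-of-physical-memory cap via sysconf().
   Not the coreutils implementation.  */
#include <stdio.h>
#include <unistd.h>

int
main (void)
{
  long pages = sysconf (_SC_PHYS_PAGES);
  long page_size = sysconf (_SC_PAGESIZE);
  if (pages < 0 || page_size < 0)
    return 1;

  unsigned long long phys = (unsigned long long) pages * page_size;
  unsigned long long cap = phys / 4 * 3;  /* 75% of physical RAM */

  printf ("physical: %llu bytes, cap: %llu bytes\n", phys, cap);
  return 0;
}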
It's easy to look at sort's performance in isolation, but one must be
careful to consider other system loads and architectures too.
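
If the automatic estimate were ever to take the thread count into account
as suggested, the adjustment might look something like the sketch below.
This is not the actual sort_buffer_size() from coreutils; scaled_buffer_size,
per_thread_estimate, nthreads and mem_cap are names invented for this
example, and a 64-bit size_t is assumed.

/* Illustrative sketch only -- not the real sort_buffer_size().
   Scale the single-thread buffer estimate by the thread count,
   then clamp it to a memory cap such as 75% of physical RAM.  */
#include <stdio.h>
#include <stddef.h>

static size_t
scaled_buffer_size (size_t per_thread_estimate, size_t nthreads,
                    size_t mem_cap)
{
  size_t want = per_thread_estimate;

  /* Multiply by the thread count, guarding against overflow.  */
  if (nthreads != 0 && want > (size_t) -1 / nthreads)
    want = (size_t) -1;
  else
    want *= nthreads;

  /* Never request more than the cap, so other system load
     is not starved.  */
  return want < mem_cap ? want : mem_cap;
}

int
main (void)
{
  /* Example numbers only: 2 GB per-thread estimate, 8 threads,
     96 GB cap (75% of 128 GB).  */
  size_t estimate = (size_t) 2 << 30;
  size_t cap = (size_t) 96 << 30;
  printf ("buffer: %zu bytes\n", scaled_buffer_size (estimate, 8, cap));
  return 0;
}

In the meantime the automatic estimate can of course be overridden by
hand, e.g. "sort -S 80% --parallel=8 FILE".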
thanks,
Pádraig.