Sort: optimal memory usage with multithreaded sort
From: Assaf Gordon
Subject: Sort: optimal memory usage with multithreaded sort
Date: Tue, 15 Jan 2013 14:07:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4
Hello,
Sort's memory usage (specifically, sort_buffer_size()) has been discussed a few
times before, but I couldn't find any mention of the following issue:
If given a regular input file, sort tries to estimate the optimal buffer
size based on the file size.
But this value is calculated for a single thread (the calculation predates
multi-threaded sort).
The default "--parallel" value is 8 (or less, if fewer cores are available),
and sorting with multiple threads requires more memory.
The result is that on a reasonably powerful machine (e.g. 128GB RAM, 32 cores,
not uncommon in a computer cluster),
sorting a big file (e.g. 10GB) will always allocate too little memory and will
always fall back to writing temporary files to "/tmp".
The disk activity makes the sort slower than an all-in-memory sort would be.
Based on this:
http://lists.gnu.org/archive/html/coreutils/2010-12/msg00084.html ,
perhaps it would be beneficial to take the number of threads into account in
the memory allocation?
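As a stopgap, the buffer size can be set by hand instead of relying on sort's
own estimate. Below is a minimal sketch, assuming GNU sort with its "-S"
(--buffer-size) and "--parallel" options. The function name sort_in_memory and
its default values are made up for illustration and are not part of coreutils:

```shell
# Hypothetical helper: sort a file with an explicit buffer size, so that
# GNU sort does not fall back on its file-size-based estimate.
# Usage: sort_in_memory INPUT OUTPUT [BUFFER_SIZE] [THREADS]
sort_in_memory() {
    input=$1
    output=$2
    buffer=${3:-1G}     # assumed default; choose a value that fits in RAM
    threads=${4:-8}     # 8 is the default upper bound mentioned above
    # -S sets the main-memory buffer size; --parallel sets the thread count.
    sort --parallel="$threads" -S "$buffer" -o "$output" -- "$input"
}
```

For the 10GB example, the buffer argument would need to be comfortably larger
than the input; the right value depends on the data and the thread count, so
it is best found by checking whether sort still creates temporary files.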
Regards,
-gordon