bug-coreutils

Re: merge sort temporary files


From: Jonathan Baker
Subject: Re: merge sort temporary files
Date: Fri, 14 May 2004 00:39:10 -0700
User-agent: Mutt/1.4.1i

Great!  Look forward to seeing this in the distribution.  Thanks,

  --Jonathan


On Fri, May 14, 2004 at 12:01:15AM -0700, Paul Eggert wrote:
> Instead of adding a new option, I think I'd rather change 'sort' to
> cater to your (relatively common) case, rather than to the (relatively
> contrived) cases like `cat F | sort -m -o F - G' where people should
> know that they're getting into trouble anyway.
> 
> Here's a proposed patch to solve your problem that way instead.
> 
> 2004-05-13  Paul Eggert  <address@hidden>
> 
>       Improve performance of `sort -m' on large files, at the cost of
>       making some contrived examples unsafe.  POSIX allows this
>       optimization.  Performance problem reported by Jonathan Baker in
>       <http://mail.gnu.org/archive/html/bug-coreutils/2004-05/msg00071.html>.
> 
>       * src/sort.c (first_same_file): Do not treat input pipes
>       differently from other files.
>       * doc/coreutils.texi (sort invocation): Document that "sort -m -o F"
>       might write F before reading all the input.
>       * NEWS: Likewise.
> 
> Index: NEWS
> ===================================================================
> RCS file: /home/meyering/coreutils/cu/NEWS,v
> retrieving revision 1.206
> diff -p -u -r1.206 NEWS
> --- NEWS      11 May 2004 16:48:42 -0000      1.206
> +++ NEWS      14 May 2004 06:35:30 -0000
> @@ -20,6 +20,12 @@ GNU coreutils NEWS                      
>  
>  ** New features
>  
> +  For efficiency, `sort -m' no longer copies input to a temporary file
> +  merely because the input happens to come from a pipe.  As a result,
> +  some relatively-contrived examples like `cat F | sort -m -o F - G'
> +  are no longer safe, as `sort' might start writing F before `cat' is
> +  done reading it.  This problem cannot occur unless `-m' is used.
> +
>    pwd now works even when run from a working directory whose name
>    is longer than PATH_MAX.
>  
> Index: doc/coreutils.texi
> ===================================================================
> RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
> retrieving revision 1.180
> diff -p -u -r1.180 coreutils.texi
> --- doc/coreutils.texi        9 May 2004 19:42:19 -0000       1.180
> +++ doc/coreutils.texi        14 May 2004 06:32:53 -0000
> @@ -3265,9 +3265,13 @@ starting with 1.  So to sort on the seco
>  @opindex --output
>  @cindex overwriting of input, allowed
>  Write output to @var{output-file} instead of standard output.
> -If necessary, @command{sort} reads input before opening
> +Normally, @command{sort} reads all input before opening
>  @var{output-file}, so you can safely sort a file in place by using
>  commands like @code{sort -o F F} and @code{cat F | sort -o F}.
> +However, @command{sort} with @option{--merge} (@option{-m}) can open
> +the output file before reading all input, so a command like @code{cat
> +F | sort -m -o F - G} is not safe as @command{sort} might start
> +writing @file{F} before @command{cat} is done reading it.
>  
>  @vindex POSIXLY_CORRECT
>  On newer systems, @option{-o} cannot appear after an input file if
> Index: src/sort.c
> ===================================================================
> RCS file: /home/meyering/coreutils/cu/src/sort.c,v
> retrieving revision 1.284
> diff -p -u -r1.284 sort.c
> --- src/sort.c        26 Apr 2004 15:37:33 -0000      1.284
> +++ src/sort.c        14 May 2004 05:45:52 -0000
> @@ -1878,9 +1878,7 @@ sortlines_temp (struct line *lines, size
>  }
>  
>  /* Return the index of the first of NFILES FILES that is the same file
> -   as OUTFILE.  If none can be the same, return NFILES.  Consider an
> -   input pipe to be the same as OUTFILE, since the pipe might be the
> -   output of a command like "cat OUTFILE".  */
> +   as OUTFILE.  If none can be the same, return NFILES.  */
>  
>  static int
>  first_same_file (char * const *files, int nfiles, char const *outfile)
> @@ -1910,7 +1908,7 @@ first_same_file (char * const *files, in
>           ? fstat (STDIN_FILENO, &instat)
>           : stat (files[i], &instat))
>          == 0)
> -       && (S_ISFIFO (instat.st_mode) || SAME_INODE (instat, outstat)))
> +       && SAME_INODE (instat, outstat))
>       return i;
>      }
>  
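
For readers following the sort.c hunk, here is a minimal standalone sketch of the check as it stands after the patch. It is not the coreutils source. The names first_same_file_sketch and same_inode are made up for illustration, the handling of "-" and of a missing OUTFILE is an assumption about the parts of first_same_file that the diff does not show, and same_inode stands in for the SAME_INODE macro by comparing device and inode numbers.

```c
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for SAME_INODE: two stat results name the
   same file when both the device and the inode number match.  */
static int
same_inode (struct stat const *a, struct stat const *b)
{
  return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* Return the index of the first of NFILES FILES that is the same file
   as OUTFILE, or NFILES if none is.  "-" is taken to mean standard
   input (an assumption; that part is not visible in the diff).  */
static int
first_same_file_sketch (char * const *files, int nfiles, char const *outfile)
{
  struct stat outstat;

  /* Assumption: if OUTFILE cannot be examined, no input can alias it.  */
  if (stat (outfile, &outstat) != 0)
    return nfiles;

  for (int i = 0; i < nfiles; i++)
    {
      struct stat instat;
      int r = (strcmp (files[i], "-") == 0
               ? fstat (STDIN_FILENO, &instat)
               : stat (files[i], &instat));

      /* After the patch only a true inode match counts; a FIFO on
         input is no longer treated as a possible alias of OUTFILE.  */
      if (r == 0 && same_inode (&instat, &outstat))
        return i;
    }

  return nfiles;
}
```

With this check, a command such as `sort -m -o F F G' still reports F as an input that matches the output, while `cat F | sort -m -o F - G' does not, because standard input is then a pipe whose inode differs from that of F.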
