Re: Sort with header/skip-lines support
From: Pádraig Brady
Subject: Re: Sort with header/skip-lines support
Date: Fri, 11 Jan 2013 18:13:00 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
On 01/11/2013 04:10 PM, Assaf Gordon wrote:
Pádraig Brady wrote, On 01/10/2013 07:11 PM:
On 01/10/2013 09:57 PM, Assaf Gordon wrote:
I'd like to re-visit an old issue: adding header-line/skip-lines support to
'sort'.
[...]
[2] - no pipe support:
http://lists.gnu.org/archive/html/bug-coreutils/2007-07/msg00215.html
But recent sed can be used for this like: `sed -u 1q`
http://git.sv.gnu.org/gitweb/?p=sed.git;a=commit;h=737ca5e
Note that commit is 4 years old, but only recently released sed 4.2.2 contains
it.
Thanks for the tip.
Note one can also add -n to the sed command,
to get it to strip the header entirely.
The following indeed works with sed 4.2.2 ( on linux 3.2 ):
$ ( echo 99 ; seq 10 ) | ( sed -u 1q ; sort -n )
But I'm wondering (as per the link above [2]) if this is posix compliant and
stable (i.e. can this be trusted to work every time, even on non-linux
machines?).
No, `sed -u` with this functionality is not portable.
Though it's more portable than `sort --header`
given that it already exists :)
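For comparison, a minimal sketch of a variant that avoids `sed -u` by using the shell's `read` builtin to consume the header. The name `sort_with_header` is hypothetical (nothing in coreutils provides it), and the sketch assumes that the shell's `read` does not consume input past the first newline, which is how the common shells behave as far as I know:

```shell
# sort_with_header: hypothetical helper, not part of coreutils.
# Prints the first line of stdin unchanged, then sorts the rest.
# Any arguments are passed through to sort (e.g. -n).
sort_with_header() {
    # read takes only the first line, leaving the remainder of
    # stdin for sort. Fails on empty input (or a header with no
    # trailing newline), in which case nothing is printed.
    IFS= read -r header || return 1
    printf '%s\n' "$header"
    sort "$@"
}
```

It could be used on a pipe or a file alike, e.g. `( echo 99 ; seq 10 ) | sort_with_header -n` or `sort_with_header < file`.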
[3] - Jim's patch:
http://lists.gnu.org/archive/html/coreutils/2010-11/msg00091.html
Thanks for collating the previous threads on this subject.
I'm on the fence on how warranted this is TBH.
We'd need stronger arguments for it I think.
I'll collate the arguments as well :)
If the "sed" method works reliably, it leaves error checking: how to reliably
check for error in such a pipe (inside a posix shell script)?
The closest code I found is this: https://github.com/cheusov/pipestatus which
seems very long.
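A shorter sketch of one way to do this in plain POSIX sh is to pass the producer's exit status out of the pipeline on a spare file descriptor. The helper name `pipe_checked` and the variables `producer_status` and `consumer_status` are hypothetical, and the helper assumes file descriptors 3 and 4 are free. Both arguments are shell command strings run through `eval`:

```shell
# pipe_checked PRODUCER CONSUMER: hypothetical helper.
# Runs "PRODUCER | CONSUMER" and succeeds only if both sides do.
# Assumes fds 3 and 4 are not otherwise in use.
pipe_checked() {
    exec 4>&1    # fd 4: copy of the real stdout
    producer_status=$(
        # Inside $(...), fd 3 is the captured stream, so only the
        # producer's exit status ends up in producer_status.
        # The consumer's output is sent to the real stdout (fd 4).
        { { eval "$1"; echo "$?" >&3; } | { eval "$2"; } >&4; } 3>&1
    )
    # The assignment's status is that of the command substitution,
    # i.e. of the last pipeline element (the consumer).
    consumer_status=$?
    exec 4>&-
    [ "$producer_status" -eq 0 ] && [ "$consumer_status" -eq 0 ]
}
```

With the commands from this thread that would be something like `pipe_checked 'seq 10' 'sed -u 1q && sort -n' || echo failed >&2`.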
For completeness, showing the current options for such cases...
So additional arguments are:
1. robust error checking
2. simplicity of use: if 'sort' had this option built-in, the following use cases would
"just work". With sed+sort, each will require a different invocation (and probably
has different pitfalls):
a. one input file
(sed -u 1q && sort) < file
b. one input pipe
seq 10 | ( sed -u 1q && sort -n )
c. multiple input files (without resorting to a pipe, as this will cause
'sort' to use a different amount of memory)
So for multiple files, we'd only take the header from the first, I suppose:
(head -q -n1 file.* | head -n1; tail -q -n+2 file.* | sort)
There is also the --merge case.
This is especially awkward with the per file constructs:
(head -q -n1 file.*; sort -m <(tail -n+2 file.1) <(tail -n+2 file.2))
d. specifying output file (with "-o")
How does -o impact things?
Thanks,
-gordon
As a side note, I have a hackish Perl script that wraps sort and consumes the
first line, and it's basically a works-for-me kind of script - but I just wish it
wasn't necessary:
https://github.com/agordon/bin_scripts/blob/master/scripts/sort-header.in
Thanks for collating the arguments for --header.
Pádraig