Re: Sort with header/skip-lines support
From: Pádraig Brady
Subject: Re: Sort with header/skip-lines support
Date: Fri, 11 Jan 2013 00:11:14 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
On 01/10/2013 09:57 PM, Assaf Gordon wrote:
Hello,
I'd like to re-visit an old issue: adding header-line/skip-lines support to
'sort'.
It has been discussed a few times in the past, but IMHO the suggested workarounds
fall short:
1. Sometimes using 'bash' specific constructs [1]
2. No error checking (e.g. running head/tail/sed without checking for errors)
3. Using multiple input files is convoluted.
4. Suggestions work for regular files, but not for pipes [2].
The attached draft patch is based on Jim Hester's patch [3], rebased to the
latest sort, with some fixes and tests.
It seems to work fine, except for one glaring omission: it only works when output
is STDOUT, because creating the output file is a brute-force ugly hack.
The syntax is
sort --skip-lines=N [other options]
That's a bit ambiguous: it might suggest that the header line
is not output after the sort. Maybe stay consistent with
`join` and `numfmt` and use --header.
The two tests are:
make check TESTS=tests/misc/sort-skip-lines SUBDIRS=.
make check TESTS=tests/misc/sort-skip-lines-bigfiles SUBDIRS=. RUN_EXPENSIVE_TESTS=yes
If this is something you are willing to consider, I'm happy to hear comments
and suggestions and improve it.
Alternatively, perhaps this is a good candidate for a "contrib" script, but I'm
not sure how to go about developing a shell script that is POSIX compliant, has robust
error checking, and is still a full 'drop-in' replacement for sort (many option
combinations).
Thanks,
-gordon
[1] - bash work-around:
http://lists.gnu.org/archive/html/coreutils/2010-11/msg00084.html
[2] - no pipe support:
http://lists.gnu.org/archive/html/bug-coreutils/2007-07/msg00215.html
Note the pipe issue might be handled with `stdbuf -i0 head ...`
but head doesn't use stdio so that won't work.
But recent sed can be used for this like: `sed -u 1q`
http://git.sv.gnu.org/gitweb/?p=sed.git;a=commit;h=737ca5e
Note that commit is 4 years old, but only the recently released sed 4.2.2 contains
it.
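A minimal sketch of that sed-based workaround on a pipe, assuming GNU sed
4.2.2 or later so that -u reads unbuffered and consumes only the first line.
The function name sort_keep_header is hypothetical, not something in coreutils:

```shell
# Hypothetical helper: pass the first line through untouched, sort the rest.
# Assumes GNU sed >= 4.2.2, where -u does not read past the first line.
sort_keep_header() {
    # Print the header line and quit; give up if sed itself fails.
    sed -u 1q || return
    # The rest of stdin is still unread, so sort sees only the body.
    sort "$@"
}

# Works on a pipe, not just on a regular file.
printf 'name\ncherry\napple\nbanana\n' | sort_keep_header
```

Any sort options given to the function are passed through, so something like
`sort_keep_header -r` should sort the body in reverse while leaving the header
first. This covers only a single header line read from stdin; multiple input
files and -o are not handled.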
[3] - Jim's patch:
http://lists.gnu.org/archive/html/coreutils/2010-11/msg00091.html
Thanks for collating the previous threads on this subject.
I'm on the fence on how warranted this is TBH.
We'd need stronger arguments for it I think.
thanks,
Pádraig.