bug-coreutils

Re: Support in sort for human-readable numbers


From: Vitali Lovich
Subject: Re: Support in sort for human-readable numbers
Date: Sun, 4 Jan 2009 16:30:48 -0500

On Sun, Jan 4, 2009 at 5:35 AM, Jim Meyering <address@hidden> wrote:
>
> "Vitali Lovich" <address@hidden> wrote:
>
> Thanks for the patch and for writing up your assumptions.
>
> The above requirement is key... and perhaps too restrictive.
> I.e., it makes it sound like your sort could mishandle
> sizes printed by a mix of output from du -h and du --si runs,
> not to mention numbers generated manually or by other tools.
Well, as I stated in the assumptions, as long as the numbering is
internally consistent, my tool will handle this case.  However,
no tool can really handle a mix of output from du -h & du --si
in the same stream, since there would be no way to identify when
a suffix represents 1000 & when it represents 1024.  Any solution
for this would need rule-based ways of figuring it out (e.g. every
other line, or all lines after the 5th), but this is obviously far
too complicated for a tool like sort.  That use case would require
more elaborate shell scripts to get the correct behavior (although
I'm unconvinced that this represents even an uncommon use case).
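
To make the 1000-versus-1024 ambiguity concrete, here is a minimal Python sketch. The helper name to_bytes and the sample values are assumptions made for illustration; nothing here is taken from the patch.

```python
SUFFIXES = "KMGTPEZY"

def to_bytes(text, base):
    """Convert a string such as '1000K' to a number.

    base is 1000 (du --si style) or 1024 (du -h style); the caller has
    to supply it because the suffix alone does not say which one applies.
    """
    number, suffix = text[:-1], text[-1]
    return float(number) * base ** (SUFFIXES.index(suffix) + 1)

# Consistent interpretation: '1000K' never sorts after '1M'.
same_si = to_bytes("1000K", 1000) <= to_bytes("1M", 1000)
same_bin = to_bytes("1000K", 1024) <= to_bytes("1M", 1024)

# Mixed interpretation: if '1000K' is binary and '1M' is SI,
# the order flips, and the text gives no hint that this happened.
mixed_flips = to_bytes("1000K", 1024) > to_bytes("1M", 1000)
```

With a single base the two values keep their order, but once the lines come from different conventions the same pair compares the other way round, which is why the input has to be internally consistent.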

The only behavior I'm unsure of right now is when there are spaces
separating the suffix & the number.  Currently, the implementation
takes the first letter after the number, ignoring spaces, which I just
realized is incorrect.  I'm thinking the suffix has to follow the
number immediately (no spaces) and have a space after it.  There are
all sorts of ways that additional configuration options could be added
(e.g. support for spacing between the number & suffix), but I'm
thinking those use cases are better handled by having the user write
that logic in awk or whatever (e.g.
my_utility | awk '{print $3$4}' | sort -h
if there are spaces between the suffix & number).
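
The kind of user-side preprocessing described above could look roughly like the following Python sketch, which glues a detached suffix back onto its number before the data reaches sort. The function name join_suffix and the exact pattern are assumptions for illustration, not part of the patch or of any existing tool.

```python
import re

# A digit, some whitespace, then a single suffix letter that is itself
# followed by whitespace or the end of the line.
_DETACHED = re.compile(r"(\d)\s+([KMGTPEZY])(?=\s|$)")

def join_suffix(line):
    """Rewrite '12 K foo' as '12K foo'; leave every other line untouched."""
    return _DETACHED.sub(r"\1\2", line)

cleaned = [join_suffix(l) for l in ["12 K foo", "12K foo", "12 Kilo bar"]]
```

The lookahead keeps a word such as "Kilo" from being mistaken for a suffix, which is the same concern that motivates requiring a space after the suffix.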

>
> However, this assumption might be acceptable (other opinions welcome),
> on the condition that the code behind this option diagnoses any violation.
>
> One of the first tasks for getting such an option into upstream is
> to describe and reach agreement on what the input grammar should be.
> I.e., is the "Gi" suffix allowed?  What about "GB" and "GiB"?
> If "Gi" is allowed, is it treated differently from "G"?
The current grammar is one letter from [K, M, G, T, P, E, Z, Y]
following the number with no space (I'm going to patch the fix for the
spacing issue as soon as I get some free time).  My reasoning is that
once sort supports this option, all the other use cases become
trivial through the use of other utilities.  Want GiB?  Simply pass
the output through sed and replace GiB with G.  I could handle this in
sort itself, but, especially with C-strings, it's much trickier to do
the string comparison, and I'm unconvinced (but easily persuaded ;D)
about the utility of supporting more complicated suffix representations.
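
As a rough illustration of the grammar and of the "normalize first, then sort" workflow, here is a Python sketch. Everything in it is an assumption for illustration: the names normalize and human_key, the regular expressions, and the choice to compare by suffix rank before numeric value (which is only valid when the input is internally consistent). It is not the comparison routine from the patch.

```python
import re

# Rank of each suffix; the empty string stands for "no suffix".
SUFFIX_RANK = {s: i for i, s in enumerate(["", "K", "M", "G", "T", "P", "E", "Z", "Y"])}

# Number, optional suffix immediately after it, then whitespace or end of line.
_KEY = re.compile(r"\s*(\d+(?:\.\d+)?)([KMGTPEZY]?)(?=\s|$)")

def normalize(text):
    """Reduce 'GiB'-style suffixes to one letter, like the sed step above."""
    return re.sub(r"(?<=\d)([KMGTPEZY])iB\b", r"\1", text)

def human_key(line):
    """Sort key: suffix rank first, then the numeric value."""
    m = _KEY.match(line)
    if m is None:
        # Diagnose input that violates the grammar instead of guessing.
        raise ValueError(f"not a human-readable number: {line!r}")
    return (SUFFIX_RANK[m.group(2)], float(m.group(1)))

lines = ["2GiB b", "512K a", "1.5M c", "900 d"]
ordered = sorted((normalize(l) for l in lines), key=human_key)
```

Comparing the suffix before the number avoids having to know whether a suffix means 1000 or 1024, at the price of relying on every value being printed with the largest suffix that fits.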

>
> As to what else would be required, see the guidelines in HACKING.
> E.g., you'd need to add many tests of this new feature.
Ok - I'll read over that document - I missed it the first time around.

Thanks for the feedback.
Vitali



