Re: Command-line program to convert 'human' sizes?
From: Assaf Gordon
Subject: Re: Command-line program to convert 'human' sizes?
Date: Fri, 07 Dec 2012 10:07:55 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.4) Gecko/20120510 Icedove/10.0.4
Thank you for your feedback.
I'm working on fixing those issues.
Some comments/questions:
Pádraig Brady wrote, On 12/06/2012 06:59 PM:
> I noticed this command will core dump:
> $ /bin/ls -l | src/numfmt --to-unit=1 --field=5
> <snip>
> so I'm thinking `numfmt` should support --header too.
>
I'll add --header.
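As a rough illustration of what --header would do, here is a minimal Python sketch, not the numfmt implementation: the first lines (such as the "total ..." line printed by "ls -l", which has no fifth field) are passed through untouched and only the remaining lines are converted. The function name and the `header` parameter are hypothetical.

```python
def convert_with_header(lines, convert, header=1):
    """Return `lines` with `convert` applied to every line after the header.

    The first `header` lines are copied through unchanged, so a summary line
    that lacks the requested field is never handed to the converter.
    """
    return [
        line if index < header else convert(line)
        for index, line in enumerate(lines)
    ]
```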
> The following should essentially be a noop with this data,
> but notice how the original spacing wasn't taken
> into account, and thus the alignment is broken:
>
> $ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to-unit=1 --field=5
> -rw-rw-r--. 1 padraig padraig 93787 Aug 23 2011 ABOUT-NLS
> -rw-rw-r--. 1 padraig padraig 49630 Dec 6 22:32 aclocal.m4
> -rw-rw-r--. 1 padraig padraig 3669 Dec 6 22:29 AUTHORS
I'm a bit wary of adding automatic, heuristic padding: it could lead to some
weird output, and when combined with --header it would not produce proper
output (the header would be skipped, but the remaining lines would be
re-padded?).
Wouldn't it be better either to force the user to specify '--padding', or to
switch from white-space to an explicit delimiter and then let "expand" handle
the alignment?
e.g.
===
$ cat white-space-data.txt | \
    sed 's/  */\t/g' | \
    numfmt --field=5 --delimiter=$'\t' --to=SI | \
    expand > output
===
A bit more convoluted, but more reliable?
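For comparison, the heuristic being questioned could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not numfmt code: the new text is right-aligned within the columns that the old field and its leading blanks occupied, and the line only grows when the new text does not fit. The name `replace_field` and the `convert` callback are hypothetical.

```python
import re

def replace_field(line, field, convert):
    """Replace the 1-based whitespace-delimited `field` of `line`.

    The converted text is right-aligned in the space taken by the original
    field plus the blanks in front of it, so later columns stay put.
    """
    matches = list(re.finditer(r"\s*\S+", line))
    if field < 1 or field > len(matches):
        return line  # field not present: leave the line untouched
    match = matches[field - 1]
    old = match.group()
    new = convert(old.strip())
    # Keep at least one separating blank for fields after the first.
    min_pad = 1 if field > 1 else 0
    width = max(len(old), len(new) + min_pad)
    return line[:match.start()] + new.rjust(width) + line[match.end():]
```

Note that a line with too few fields is returned unchanged here, which is one possible answer to the header problem, but the result still depends on guessing where a column starts.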
>
> With this the alignment is broken as before,
> but I also notice the differing width output of each number.
>
> $ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=SI --field=5
> -rw-rw-r--. 1 padraig padraig 94k Aug 23 2011 ABOUT-NLS
> -rw-rw-r--. 1 padraig padraig 50k Dec 6 22:32 aclocal.m4
> -rw-rw-r--. 1 padraig padraig 3.7k Dec 6 22:29 AUTHORS
>
Again, this is the automatic padding issue.
For example, "94K" vs "3.7K": should we always pad SI/IEC output to 5
characters (e.g. "  94K") even if the user didn't specify padding?
This would conflict with non-whitespace delimiters, e.g.:
Hello:94000:world
would be converted to:
Hello:<space>94K:world
which is not intuitive at all.
Or perhaps the whole 'auto' padding should be enabled only if no delimiter is
specified (so that it defaults to white-space)?
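A minimal Python sketch of that last rule, assuming a hypothetical `convert_field` helper and a fixed pad width of 5 taken from the example above: padding is applied only when no delimiter is given. For brevity the white-space branch re-joins fields with single blanks, which a real implementation would probably not want to do.

```python
def convert_field(line, field, convert, delimiter=None, pad_width=5):
    """Convert the 1-based `field`; pad only in white-space mode."""
    if delimiter is not None:
        # Explicit delimiter: replace the field verbatim, never pad.
        parts = line.split(delimiter)
        if 1 <= field <= len(parts):
            parts[field - 1] = convert(parts[field - 1])
        return delimiter.join(parts)
    # White-space mode: right-align the converted value to pad_width.
    parts = line.split()
    if 1 <= field <= len(parts):
        parts[field - 1] = convert(parts[field - 1]).rjust(pad_width)
    return " ".join(parts)
```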
>
> Notice in the above I've used capital K for SI.
> I think human() from gnulib may be using k for 1000 and K for 1024.
> That's non standard and ambiguous and I see no need to do that.
> So for IEC we'd have:
>
> $ /bin/ls -l | tail -n+2 | head -n3 | src/numfmt --to=IEC --field=5
> -rw-rw-r--. 1 padraig padraig 3.6Ki Dec 6 22:29 AUTHORS
>
I tried to use 'human_readable()' as-is, but I guess this is not sufficient.
I'll duplicate the code and modify it to avoid this issue (lower/upper case K,
and the "i" suffix).
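The intended output convention can be sketched in Python as follows. This is not gnulib's human_readable() nor the eventual numfmt code; it is a hypothetical `humanize` function that always uses an upper-case K, appends "i" only for IEC, and rounds up, which is an assumption chosen to match the figures quoted above (94, 50, 3.7 and 3.6Ki). It assumes a non-negative integer and uses integer arithmetic only.

```python
def humanize(n, scale="si"):
    """Format a non-negative integer with an SI (1000) or IEC (1024) suffix."""
    base, tail = (1000, "") if scale == "si" else (1024, "i")
    if n < base:
        return str(n)
    unit = base
    for prefix in "KMGTPE":
        # Value in tenths of a unit, rounded up.
        tenths = (n * 10 + unit - 1) // unit
        if tenths < 100:
            # Below 10 units: keep one decimal digit.
            return f"{tenths // 10}.{tenths % 10}{prefix}{tail}"
        whole = (n + unit - 1) // unit  # whole units, rounded up
        if whole < base or prefix == "E":
            return f"{whole}{prefix}{tail}"
        unit *= base  # value too large for this prefix: try the next one
```

With these assumptions, humanize(3669) gives "3.7K" and humanize(3669, "iec") gives "3.6Ki".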
> Another thing I thought of there, was it would be
> good to be able to parse number formats that it can generate:
Sounds like two separate (but related) issues:
> $ echo '1,234' | src/numfmt --from=auto
> src/numfmt: invalid suffix in input '1,234': ',234'
1. Is there already a gnulib function that can accept locale-grouped values?
Can the "xstrtoXXX" functions handle that?
> $ echo '3.7K' | src/numfmt --from=auto
> src/numfmt: invalid suffix in input '3.7K': '.7K'
2. Would you recommend switching the internal representation from the current
uintmax_t to double, or just adding special code to detect the decimal point
(which, as Bernhard mentioned, is also locale-dependent)?
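To make the two questions concrete, here is a small Python sketch of the "special code" option. It is only an illustration: the function name `parse_human` is hypothetical, the decimal point and grouping character are plain parameters rather than values read from the locale, grouping characters are simply dropped without checking they sit every three digits, and exact decimal arithmetic stands in for the uintmax_t/double choice.

```python
import re
from decimal import Decimal

SUFFIXES = "KMGTPE"

def parse_human(text, decimal_point=".", grouping=","):
    """Parse strings such as '1,234', '3.7K' or '3.6Ki' into an integer.

    A trailing 'i' after the suffix selects powers of 1024, otherwise 1000.
    Raises ValueError when the input is not a number with an optional suffix.
    """
    body, base, power = text.strip(), 1000, 0
    if len(body) >= 2 and body[-1] == "i" and body[-2] in SUFFIXES:
        body, base = body[:-1], 1024
    if body and body[-1] in SUFFIXES:
        power = SUFFIXES.index(body[-1]) + 1
        body = body[:-1]
    # Drop grouping characters, then normalise the decimal point.
    number = body.replace(grouping, "").replace(decimal_point, ".")
    if not re.fullmatch(r"[0-9]+(\.[0-9]+)?", number):
        raise ValueError(f"invalid number in input {text!r}")
    # Fractions of a byte are truncated.
    return int(Decimal(number) * base ** power)
```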
> While I said before it would be better to error rather than warn
> on parse error, on consideration it's probably best to write a
> warning to stderr on parse error, and leave the original number in place.
I'll change the code accordingly.
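In Python terms, the agreed behaviour amounts to something like the sketch below. The names are hypothetical, and the converter is assumed to signal a parse error by raising ValueError.

```python
import sys

def convert_lines(lines, convert, err=None):
    """Convert each line; on a parse error, warn and keep the original text."""
    if err is None:
        err = sys.stderr
    out = []
    for line in lines:
        text = line.rstrip("\n")
        try:
            out.append(convert(text))
        except ValueError as exc:
            # Warn on the error stream, then emit the input unchanged.
            print(f"numfmt-sketch: {exc}", file=err)
            out.append(text)
    return out
```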
Regarding Bernhard's comments (from a different email):
Bernhard Voelker wrote, On 12/07/2012 03:25 AM:
> On 12/07/2012 12:59 AM, Pádraig Brady wrote:
>
> Therefore this is my first test:
> $ echo 11505426432 | src/numfmt
> 11505426432
> Hmm, shouldn't it convert that to a human-readable
> number then? ;-)
From Pádraig's original specification (
http://lists.gnu.org/archive/html/coreutils/2012-02/msg00085.html ) I assumed
that the default for both "--from" and "--to" is not to scale, so one needs to
explicitly use "--to" or "--from".
But those defaults can be changed, if you prefer.
> Looking at scale_from_args: I'd favor lower-case arguments,
> i.e. "si" and "iec" instead of "SI" and "IEC".
> WDYT?
I'll change those.
Regarding the help text and documentation:
I copied many of the texts from previous emails (the "Reformat numbers like
11505426432 to the more human-readable 11G" comes verbatim from one of Jim
Meyering's emails) - all of them would require better phrasing later.
Thanks,
-gordon