help-gnu-utils

Re: sort - specifying sort fields/keys.


From: cga2000
Subject: Re: sort - specifying sort fields/keys.
Date: Wed, 09 Apr 2008 13:39:09 -0400
User-agent: Mutt/1.5.13 (2006-08-11)

On Tue, Apr 08, 2008 at 07:42:21PM EDT, Bob Proulx wrote:
> cga2000 wrote:
> > Bob Proulx wrote:

[..]

> The man pages are great for quick reference of major features.  But
> the primary documentation for most GNU software is in the info pages.

Sometimes I wish the humongous gnu/screen man page were available in a
more clearly structured format like texinfo.. :-)

But as to the man pages being seen as a quick reference .. I would tend
to disagree .. that's more the role of the --help option.

But in any case, for hobbyists like myself who pretty much live on
borrowed time .. it can be really frustrating to have to spend more time
and energy hunting for the adhoc doc than actually reading it (surely
you've heard of the debian/gnu wars re: doc licensing, and its impact on
the availability of recent texinfo docs in .deb format, right?)

>   info coreutils 'ls invocation'
> 
>   `--color [=WHEN]'
>        Specify whether to use color for distinguishing file types.  WHEN
>        may be omitted, or one of:
>           * none - Do not use color at all.  This is the default.
>           * auto - Only use color if standard output is a terminal.
>           * always - Always use color.
>        Specifying `--color' and no WHEN is equivalent to `--color=always'.
>        Piping a colorized listing through a pager like `more' or `less'
>        usually produces unreadable results.  However, using `more -f'
>        does seem to work.

So I guessed right .. --color=auto, I mean .. Seem to be getting the
hang of it!

> > I did a 
> > 
> > $ ls -al | sort -k1.1,1.1r -k8f
> > 
> > and the output is identical (properly sorted with dots ignored)
> 
> Based upon locale setting, right?

Yes. Still getting dot-files intermixed with non-dot files.
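
For reference, a minimal sketch of what the two keys in that command do. The
three-column listing below is made up for illustration, and field 3 stands in
for the file name field (field 8 in the `ls -al` output above). LC_ALL=C is
set so the result does not depend on the locale in effect.

```shell
# Hypothetical listing: type flag, size, name (not real ls output).
listing='- 10 foo
d 20 .bar
- 30 Baz
d 40 alpha'

# -k1.1,1.1r : compare only the first character of field 1, in reverse order
# -k3f       : then compare from field 3 to end of line, folding case
sorted=$(printf '%s\n' "$listing" | LC_ALL=C sort -k1.1,1.1r -k3f)
printf '%s\n' "$sorted"
```

In the C locale the "d" entries come first because of the reversed first key,
and within each group the dot sorts ahead of letters instead of being ignored.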
> 
> > >   
> > > http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
> > 
> > Thanks.  Good doc.
> 
> The descriptions usually talk about LC_ALL because it gets
> complicated.  Really it is intended to set LANG.  But LANG is
> overridden by LC_COLLATE and so setting LANG may have no effect.  But
> LC_COLLATE is again overridden by LC_ALL.  Saying all of that in the
> quick docs gets complicated and still doesn't really describe things
> like how it interacts with LC_CTYPE.  I have no idea what (possibly
> bad) effects setting an incompatible combination of
> LANG, LC_CTYPE and LC_COLLATE will have on some languages.  So it is
> simpler just to describe LC_ALL=C as the biggest possible lever.  But
> normally one would only set the lower priority locale vars such as
> LANG and possibly LC_COLLATE such as I have done.
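
The precedence described above (LC_ALL over LC_COLLATE over LANG) can be
sketched roughly as follows. The helper name `sample` and the three input
lines are assumptions made for this illustration, and en_US.UTF-8 is only used
as the locale that is supposed to lose; an empty value is treated the same as
an unset variable.

```shell
# Hypothetical helper: sort three sample lines under the given environment
# settings and join the result onto one line.
sample() { printf '%s\n' b .a A | env "$@" sort | tr '\n' ' '; }

# Only LANG is set, so it decides the collating order.
lang_only=$(sample LC_ALL= LC_COLLATE= LANG=C)

# LC_COLLATE overrides LANG for collation.
collate_wins=$(sample LC_ALL= LC_COLLATE=C LANG=en_US.UTF-8)

# LC_ALL overrides both LC_COLLATE and LANG.
all_wins=$(sample LC_ALL=C LC_COLLATE=en_US.UTF-8 LANG=en_US.UTF-8)

printf '%s\n' "$lang_only" "$collate_wins" "$all_wins"
```

In each case the C collating order is the one that takes effect, so all three
lines should show the dot entry first and uppercase ahead of lowercase.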

My understanding of the problem is just too limited at this point for me
to seriously consider looking into the solutions/implementation.

:-(

> > > Apparently the people who defined the collating sequence for the en_*
> > > locales confused working with data on a computer with working with
> > > text on a computer.  The locale collating sequences for en_* ignores
> > > punctuation and folds case by default!
> > 
> > Given the symptoms and the nature of sort I would probably never have
> > figured that out myself. That there may be circumstances where this
> > comes in handy I do not doubt .. But as to making it the default for one
> > of (if not the) most widely-used locales?
> 
> It certainly annoys me.  But they didn't consult me when the collating
> sequence was chosen.
> 
> > I gave  up on UTF-8 because I use mostly ELinks for browsing and afaik
> > it's not UTF-8 ready.
> 
> How does ELinks compare to Links, Lynx, or w3m for UTF-8 support?  I
> only use them for basic plain text us-ascii pages and so can't judge.

I'm no expert obviously but I believe that ELinks supports ISO8859-1 ..
and that's it. IOW, I have not had any issues with most pages that hail
from Western Europe.. That's about my level of expertise in this area.

I am under the impression that w3m (patched?) supports UTF-8 but I have
not investigated this.

As to the other two I have never used them.

As to ELinks, since I am pretty much allergic to the GUI model and have
chosen to limit my browsing to text it pretty much meets my needs. My
main concern is that it doesn't look like there's a lot going on
development-wise .. so if you're not already using it I wouldn't
recommend switching .. 

> > I tested with LC_ALL=POSIX (as recommended in your document) and the
> > "." was still being ignored.
> 
> Hmm...  Works for me.  Please double check everything.
> 
>   $ touch .baz .foo bar baz foa foo foz
> 
>   $ LC_ALL=en_US.UTF-8 ls -A1
>   bar
>   baz
>   .baz
>   foa
>   foo
>   .foo
>   foz
> 
>   $ LC_ALL=C ls -A1
>   .baz
>   .foo
>   bar
>   baz
>   foa
>   foo
>   foz

OK.  I'll test again with LC_ALL=POSIX.  
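
A rough sketch of that retest, assuming a scratch directory from `mktemp -d`
so the test files do not land anywhere that matters. The variable names are
made up for the example. POSIX and C are two names for the same locale, so
both listings should come out identical, with the dot files first.

```shell
# Scratch directory holding the same file names as in the example above.
dir=$(mktemp -d)
( cd "$dir" && touch .baz .foo bar baz foa foo foz )

# List the directory under each locale name.
c_order=$(LC_ALL=C ls -A1 "$dir")
posix_order=$(LC_ALL=POSIX ls -A1 "$dir")

# Report whether the two listings match, then show the C ordering.
if [ "$c_order" = "$posix_order" ]; then
    echo "C and POSIX agree"
fi
printf '%s\n' "$c_order"

rm -rf "$dir"
```

If the dot files still show up intermixed here, something else in the
environment or an alias for ls is probably getting in the way.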
> 
> > So I issued the above export commands and (magically) data was sorted
> > as data .. 
> 
> Oh good.
> 
> > Thank you very much for your clarification. 
> 
> Glad to help,

Thanks, Bob.

BTW .. those awk one-liners that you helped me write a while ago really
do a great job!  

Thanks for that as well.



