
Re: grep : problem with locale


From: John Cowan
Subject: Re: grep : problem with locale
Date: Thu, 20 Apr 2006 20:12:55 -0400
User-agent: Mutt/1.3.28i

Sylvain scripsit:

> There is a little problem with the man page :
> Finally, certain named classes of characters are predefined within bracket 
> expressions, as follows. Their names are self explanatory, and they are 
> [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], 
> [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, 
> [[:alnum:]] means [0-9A-Za-z], except the latter form depends upon the C 
> locale and the ASCII character encoding, whereas the former is independent 
> of locale and character set. (Note that the brackets in these class names 
> are part of the symbolic names, and must be included in addition to the 
> brackets delimiting the bracket list.) Most metacharacters lose their 
> special meaning inside lists. To include a literal ] place it first in the 
> list. Similarly, to include a literal ^ place it anywhere but first. 
> Finally, to include a literal - place it last.

I agree that the wording is confusing.  What's meant is that the form
[a-zA-Z] will match a letter only if by "letter" you mean "English
letter" and your encoding is ASCII-compatible (it matches too
much on an EBCDIC system).  [:alpha:] on the other hand will match
whatever counts as a letter on the local system and will be independent
of character encoding.  In that sense, then, using [:alpha:] is locale-
and encoding-independent assuming that what you want is to match a letter,
whereas [A-Za-z] is neither.
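
To make the difference concrete, here is a rough sketch in Python. It is an analogy, not grep itself: Python's re module has no POSIX named classes, so str.isalpha() (which follows Unicode, not the C library locale) stands in for [[:alpha:]] as it would behave in a UTF-8 locale, and the cp037 codec stands in for "an EBCDIC system". The function names are mine, made up for illustration.

```python
import re

# Stand-in for the bracket expression [A-Za-z].
ASCII_LETTER = re.compile(r"[A-Za-z]")


def is_ascii_letter(ch):
    """True only for the 26 unaccented English letters, either case."""
    return ASCII_LETTER.fullmatch(ch) is not None


def is_letter(ch):
    """Unicode-aware test; an approximation of [[:alpha:]] in a UTF-8 locale."""
    return ch.isalpha()


def ebcdic_range_extras():
    """Characters that a raw byte range a..z would also cover in cp037.

    EBCDIC letters are not contiguous, so the code points between
    'a' and 'z' include characters that are not in a-z at all.
    """
    lo = "a".encode("cp037")[0]
    hi = "z".encode("cp037")[0]
    chars = [bytes([b]).decode("cp037") for b in range(lo, hi + 1)]
    return [c for c in chars if not is_ascii_letter(c)]


if __name__ == "__main__":
    for ch in ("e", "é", "þ"):
        print(repr(ch), is_ascii_letter(ch), is_letter(ch))
    print(ebcdic_range_extras())
```

The first test accepts "e" but rejects both "é" and "þ"; the second accepts all three. The last function lists what falls between "a" and "z" in cp037 without being one of a-z, which is the "matches too much" case.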

That said, IMHO this whole business of locale-dependent letters is folly.
The letter e-acute is just as much a letter in the English word "resumé"
as in the French word "résumé"; for that matter, thorn is a letter in
both English and French contexts, though neither English nor French uses
it -- what else could it possibly be?

-- 
After fixing the Y2K bug in an application:     John Cowan
        WELCOME TO <censored>                   address@hidden
        DATE: MONDAK, JANUARK 1, 1900           http://www.ccil.org/~cowan



