bug-gnu-utils

Re: match() problem


From: Stepan Kasal
Subject: Re: match() problem
Date: Wed, 9 Apr 2003 09:34:38 +0200
User-agent: Mutt/1.2.5.1i

Hello,

On Fri, Apr 04, 2003 at 11:57:53AM +0900, KIMURA Koichi wrote:
> I use gawk 3.1.2.
> There seems to be a problem with the match() function when multi-byte characters are used.
> 
> Sample program is here:
> 
> gawk 'BEGIN {
>   str = "XXYYZZ" # In reality, X, Y, and Z are multi-byte characters.
>   match(str, /X+Y+Z/)
>   print RSTART,RLENGTH
> 
>   str = "aabbccddee"
>   match(str, /a+b+c/)
>   print RSTART,RLENGTH
> }'
> 
> Result is:
> 0 -1
> 1 5
> 
> When multi-byte characters are used, match() fails.
> Of course, multi-byte character support is enabled.

I believe that this is correct.  Regular expressions should match
_characters_, not bytes.  This is true even if the ``character''
is multibyte.

If you want regexps to match bytes instead of characters, you should
probably use LANG=C.
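
To make that suggestion concrete, here is a minimal sketch of running the
same match() call under an explicit locale. The helper name try_match is
hypothetical (it is not part of gawk), and the sketch sets LC_ALL rather
than LANG, on the assumption that LC_ALL may already be set in your
environment, in which case it would override LANG.

```shell
# Prefer gawk; fall back to whatever awk is installed (an assumption
# made so the sketch also runs on systems without gawk).
AWK=$(command -v gawk || command -v awk)

# Hypothetical helper: run match() under a given locale.
#   $1 = locale name, $2 = input string, $3 = regexp (as a string)
try_match() {
    LC_ALL=$1 "$AWK" -v str="$2" -v re="$3" 'BEGIN {
        match(str, re)          # sets RSTART and RLENGTH
        print RSTART, RLENGTH
    }'
}

# ASCII sample from the original report, matched byte-wise in the C locale.
try_match C "aabbccddee" "a+b+c"    # prints "1 5"
```

Passing your own multibyte locale name as the first argument instead of C
lets you compare the two behaviours on the same string; which locale names
are available depends on the system.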

Previous versions of gawk (<= 3.1.1) used an imperfect regex library
that didn't understand multibyte characters.

HTH,
        Stepan Kasal
