Re: match() problem
From: Stepan Kasal
Subject: Re: match() problem
Date: Wed, 9 Apr 2003 09:34:38 +0200
User-agent: Mutt/1.2.5.1i
Hello,
On Fri, Apr 04, 2003 at 11:57:53AM +0900, KIMURA Koichi wrote:
> I use gawk 3.1.2.
> It seems problem at match() function when use multi-byte character.
>
> Sample program is here:
>
> gawk 'BEGIN {
> str = "XXYYZZ" # The fact is, X, Y, Z are multi-byte char.
> match(str, /X+Y+Z/)
> print RSTART,RLENGTH
>
> str = "aabbccddee"
> match(str, /a+b+c/)
> print RSTART,RLENGTH
> }'
>
> Result is:
> 0 -1
> 1 5
>
> When used multi-byte character, match() failed.
> Of course, multi-byte character support was enabled.
I believe that this is correct. Regular expressions should match
_characters_, not bytes. This is true even if the ``character''
is multibyte.
If you want regexps to match bytes instead of characters, you should
probably use LANG=C.
Previous versions of gawk (<= 3.1.1) used an imperfect regex library
that did not understand multibyte characters.
HTH,
Stepan Kasal