Re: Gawk match() strange behaviour
From: Aharon Robbins
Subject: Re: Gawk match() strange behaviour
Date: Sat, 08 Sep 2007 23:18:16 +0300

Yes, locales definitely complicate issues. I am glad that things work
with gawk-stable from CVS; that means I have done my job correctly.
Arnold
> Date: Thu, 06 Sep 2007 23:07:01 +0200
> From: Alain Ketterlin <address@hidden>
> Subject: Re: Gawk match() strange behaviour
> To: Aharon Robbins <address@hidden>
> Cc: address@hidden
>
> Hi, thanks for your help.
>
> >> The following program:
> >>
> >> {
> >> r = match($0,/^ */,t);
> >> print "R=" r " S=" RSTART " L=" RLENGTH;
> >> }
> >>
> >> produces this (< signals input, > signals output)
> >> <
> >> > R=-1208966831 S=-1208966831 L=1208966850
> >> < random
> >> > R=1 S=1 L=34
> >> <   random
> >> > R=1 S=1 L=2
>
> > I could not reproduce this using either stock gawk 3.1.5 or the current CVS
> > sources. I suggest that you try building from scratch from the CVS archive
> > on savannah.gnu.org.
> >
> > For the empty line I get
> >
> > R=1 S=1 L=0
>
> Things are getting strange (for me, I mean :). I just noticed that
> the locale has an impact.
>
> With gawk-3.1.5 (compiled from the tarball), under en_US.utf-8 I get:
> - from an empty line: R=1 S=1 L=18
> - from a line containing "random" (no space at beginning): R=1 S=1 L=34
> - from "  random" (two spaces at beginning): R=1 S=1 L=2 (correct)
> Under en_US.iso-8859-1, everything is ok. So it seems that utf-8
> input is the problem.
>
> With gawk-stable checked out from savannah, everything is correct,
> under both locales.
>
> Thanks for your help.
>
> -- Alain.
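A minimal sketch for checking the expected values, assuming a POSIX awk is on the PATH. It feeds the three test lines from the report (an empty line, "random" and "  random") to the same match() call. It uses the two-argument form of match(), because the array argument t in the report is a gawk extension and does not change RSTART or RLENGTH. The function name check_match is hypothetical. On a correct implementation, every line reports R=1 S=1, and L equals the number of leading spaces (0, 0 and 2 here). This agrees with the "R=1 S=1 L=0" reported for the empty line.

```shell
#!/bin/sh
# Hypothetical helper: print match()'s return value, RSTART and RLENGTH
# for /^ */ on each input line, in the same format as the report.
# The two-argument match() is POSIX; the report's third argument is gawk-only.
check_match() {
    awk '{ r = match($0, /^ */); print "R=" r " S=" RSTART " L=" RLENGTH }'
}

# The three test inputs from the report: empty line, "random", "  random".
printf '\nrandom\n  random\n' | check_match
```

To reproduce the locale dependence, one could replace awk with the gawk binary under test and prefix the pipeline with LC_ALL=en_US.UTF-8, assuming that locale is installed. A fixed gawk should then print the same three lines as under en_US.iso-8859-1.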