bug-gnu-utils

Re: Match returns impossible character range


From: Cefn Hoile
Subject: Re: Match returns impossible character range
Date: Mon, 12 Jul 2010 14:18:33 +0100

Sorry for the repeat post; I forgot to include version information. I'm
running gawk on Ubuntu Karmic, with the version listed below.

/usr/bin/gawk --version
GNU Awk 3.1.6
Copyright (C) 1989, 1991-2007 Free Software Foundation.

On 12 July 2010 12:49, Cefn Hoile <address@hidden> wrote:
> match() seems to set RSTART and RLENGTH to impossible values. For
> example, my error-reporting code...
>
> match($0, nameregexp)
> if(RSTART){
>        print "Match found for: ", nameregexp, "  :at " RSTART, ",", RLENGTH
> }
>
> ...offers this result....
>
> Match found for:  <form[^>]*(name=[^[:space:]>]*)[^>]*>   :at 1 , 1
>
> I believe this is impossible, given that the fixed parts of this
> regular expression ("<form", "name=" and ">") add up to 11 characters!
>
> BACKGROUND
>
> Trying to use Awk to identify HTML forms and their controls
> (input|select|textarea) elements by parsing open and close tags.
>
> This is a multiline problem and may possibly exceed some implicit
> memory limit in gawk, although the files I'm handling are fairly small
> (max 188k).
>
> EXAMPLE FILES
>
> I'm pretty convinced that gawk should never produce the result
> reported, regardless of the files which are being processed.
>
> Although I have attached the AWK code which generates the error, the
> html source files I'm processing have some mildly sensitive data in
> them (names of people) and I cannot seem to recreate the bug with a
> minimal source file which I can share.
>
> If this bug is not already recognised (it may be a standard, known
> issue), then I'll try to generate a sanitized file which recreates the
> issue.
>
> Cefn
> http://cefn.com
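
For anyone trying to reproduce this, the sanity check implied by the report can be sketched as below. This is only an illustration under stated assumptions: the sample input line and the shell variable names (line, re, result) are made up, since the real input files were not posted, and the awk variable min_len is simply the summed length of the literal parts of the pattern. Printing substr($0, RSTART, RLENGTH) alongside the two values shows which text match() claims to have matched, which may help narrow down a report like "1 , 1".

```shell
# Hypothetical sample input; the real HTML files were not shared.
line='<form name=login action="/submit">'

# Pattern as shown in the report, passed in as a dynamic (string) regexp.
re='<form[^>]*(name=[^[:space:]>]*)[^>]*>'

result=$(printf '%s\n' "$line" | awk -v nameregexp="$re" '
BEGIN {
    # Shortest possible match: the literal parts "<form", "name=" and ">".
    min_len = length("<form") + length("name=") + length(">")
}
{
    if (match($0, nameregexp)) {
        # A match shorter than the literal parts cannot be genuine.
        status = (RLENGTH < min_len) ? "SUSPECT" : "ok"
        printf "%s %d,%d:%s\n", status, RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }
}')
echo "$result"
```

A line tagged SUSPECT would correspond to the behaviour described in the quoted message; on an awk that handles the pattern correctly, every reported match should be at least 11 characters long.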


