bug-gnu-utils

Match returns impossible character range


From: Cefn Hoile
Subject: Match returns impossible character range
Date: Mon, 12 Jul 2010 12:49:15 +0100

match() seems to set RSTART and RLENGTH to impossible values. For
example, my error reporting code...

match($0, nameregexp)
if(RSTART){
        print "Match found for: ", nameregexp, "  :at " RSTART, ",", RLENGTH
}

...produces this result...

Match found for:  <form[^>]*(name=[^[:space:]>]*)[^>]*>   :at 1 , 1

I believe this is impossible: the literal parts of this regular
expression (<form, name= and >) add up to 11 characters, so no match
can be shorter than that!
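As a sanity check on that reasoning, here is a minimal sketch that runs the same regular expression against a single line and prints RSTART and RLENGTH. The shell wrapper name form_match is hypothetical and not part of the attached script, and it assumes an awk that supports POSIX character classes such as [[:space:]] (gawk does).

```shell
# Hypothetical helper: apply the regexp from the report to one line of input
# and print RSTART and RLENGTH, or "no match" when match() returns 0.
form_match() {
    printf '%s\n' "$1" | awk '
    BEGIN { nameregexp = "<form[^>]*(name=[^[:space:]>]*)[^>]*>" }
    {
        if (match($0, nameregexp))
            print RSTART, RLENGTH
        else
            print "no match"
    }'
}

form_match '<formname=>'      # shortest possible match, prints: 1 11
form_match '<form name=a>'    # prints: 1 13
form_match '<'                # prints: no match
```

On a correctly behaving awk, RLENGTH should never be below 11 for this expression, which is why a reported value of 1 looks wrong.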

BACKGROUND

I am trying to use Awk to identify HTML forms and their control
elements (input|select|textarea) by parsing open and close tags.

This is a multiline problem, and it may exceed some implicit memory
limit in gawk, although the files I'm handling are fairly small
(188k at most).
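To illustrate what "multiline" means here, the sketch below joins all input lines into one buffer so that a tag split across lines can still be matched. This is only an assumed approach for illustration: the attached findforms.awk may work differently, and the names find_form_tags and buf are hypothetical.

```shell
# Hypothetical helper: print every <form ...> open tag, including tags that
# span several input lines. Reads the named files, or stdin if none are given.
find_form_tags() {
    awk '
    { buf = buf $0 " " }    # join all lines, replacing each newline with a space
    END {
        # Repeatedly find the next open tag, print it, and drop it from buf.
        while (match(buf, /<form[^>]*>/)) {
            print substr(buf, RSTART, RLENGTH)
            buf = substr(buf, RSTART + RLENGTH)
        }
    }' "$@"
}

printf '<form\nname=a>\n<p>x</p>\n' | find_form_tags    # prints: <form name=a>
```

The whole file ends up in a single string, so memory use grows with file size, though a 188k file should be well within what gawk can hold.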

EXAMPLE FILES

I'm pretty convinced that gawk should never produce the result
reported, regardless of the files which are being processed.

Although I have attached the AWK code which generates the error, the
HTML source files I'm processing have some mildly sensitive data in
them (names of people), and I have not been able to recreate the bug
with a minimal source file which I can share.

If this bug is not already recognised (it may be a standard, known
issue), I'll try to generate a sanitized file which recreates the
issue.
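One possible way to produce such a sanitized file is sketched below. It replaces every letter and digit outside of tags with "x", leaving markup and line lengths unchanged. The name sanitize_html is hypothetical. Attribute values inside tags are left untouched and would still need checking by hand, and there is no guarantee that the sanitized file still triggers the problem.

```shell
# Hypothetical helper: mask text content while preserving tags and layout.
# Reads the named files, or stdin if none are given.
sanitize_html() {
    awk '
    {
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "<") intag = 1                    # entering a tag
            if (!intag && c ~ /[A-Za-z0-9]/) c = "x"   # mask text outside tags
            out = out c
            if (c == ">") intag = 0                    # leaving a tag
        }
        print out
    }' "$@"
}

printf '<p>Jane Doe</p>\n' | sanitize_html    # prints: <p>xxxx xxx</p>
```

The intag flag is deliberately kept across lines, so a tag that is split over several lines is preserved as well.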

Cefn
http://cefn.com

Attachment: findforms.awk
Description: Binary data

