m4-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

branch-1_4 patsubst bug, and missing arguments


From: Eric Blake
Subject: branch-1_4 patsubst bug, and missing arguments
Date: Thu, 24 Aug 2006 08:24:14 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Thunderbird/1.5.0.5 Mnenhy/0.7.4.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

First, I noticed that Solaris produces output for eval, index, incr, decr,
and len, even when given zero arguments.  And for index, substr, and
translit, it gives output even with one argument:

$ /usr/xpg/bin/m4
eval index incr decr len
0 0 1 -1 0
index(abc) substr(abc) translit(abc)
0 abc abc
$

The documentation on the branch already states that the first five macros
are blind, and that builtin/indir just warn and expand to the empty string
when invoking a blind macro without arguments, so I didn't think that
worth changing on the branch.  I would like to do that on head, though, so
that 'm4 -G -Q' can give the same results as Solaris.  But for the
one-argument case, we were issuing a warning about a missing argument (or
not even that, with -Q), but expanding to nothing.  So I fixed those
macros, along with regexp and patsubst for consistency, to still issue the
warning but issue output like Solaris.

Second, while testing, I noticed that patsubst has had a bug, and our
testsuite had a flaw, because it had a test that expected the buggy output
from patsubst.  Here's the bug:

$ m4
patsubst(abc, ^\|$, -)
- -abc-
patsubst(a, ^\|$, -)
- -a
$

Oops, we were not detecting an empty match at the end of a string if there
was previously an empty match at the preceding character.  Classic
off-by-one bug.

Head has similar issues in both cases, but I would like to clean up
argument handling there first - it has always bothered me that min_args
and max_args are off by 1 (ie. m4_bad_argc(..., 2, 2) means a macro takes
exactly one argument); the confusion shows in modules/m4.c where some
builtins had 0 and others 1 for min_arg when a macro is not blind.  In the
process of fixing that, I can consolidate the groks_macro_args and
blind_if_no_args bools into a bitfield, making it more extensible to add
other bools to each builtin.  I envision adding a bool of whether having
too few args results in an empty expansion or not, so that macros like
regexp can still produce output with too few args without having to also
call m4_bad_argc (also fixing my current hack to make syscmd/esyscmd still
change sysval when there are too few args).  This would be a
binary-incompatible change with any existing modules built against
m4-1.4o, but I believe it is okay to change the interface right up until
2.0 is released, as we have documented all along that beta releases are
subject to change.

I am declaring this to be my last non-maintenance fix prior to 1.4.6 - I
am starting the release process after this checkin.

2006-08-24  Eric Blake  <address@hidden>

        * src/builtin.c (m4_index, m4_substr, m4_translit): Similar to
        Solaris, produce output on just one argument.
        (m4_regexp, m4_patsubst): For consistency, do likewise.
        (m4_patsubst): Allow zero-length match at end of string.
        * doc/m4.texinfo (Sysval): Fix overfull hbox.
        (Bugs, Macro Arguments): Minor fixes.
        (Other tokens): Rearrange node order.
        (Index macro, Substr, Translit, Regexp, Patsubst): Add tests.
        * NEWS: Document these fixes.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE7baN84KuGfSFAYARAugEAJ95NKxBNUibmG6jFPDv8uA8/OX8qQCePtNa
KtcuIRbKE3mt04rZi5nVknU=
=SAV1
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.55
diff -u -p -r1.1.1.1.2.55 NEWS
--- NEWS        18 Aug 2006 23:11:36 -0000      1.1.1.1.2.55
+++ NEWS        24 Aug 2006 14:20:20 -0000
@@ -34,6 +34,11 @@ Version 1.4.6 - ?? 2006, by ??  (CVS ver
   messages have been reformatted to comply with GNU Coding Standards.
 * The errprint, m4wrap, and shift macros are now recognized only with
   arguments.
+* The index, substr, translit, regexp, and patsubst macros now produce
+  output when given only one argument, but still warn about a missing
+  second argument.
+* The patsubst macro now reliably finds zero-length matches at the end
+  of a string.
 
 Version 1.4.5 - 15 July 2006, by Eric Blake  (CVS version 1.4.4c)
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.69
diff -u -p -r1.1.1.1.2.69 m4.texinfo
--- doc/m4.texinfo      22 Aug 2006 19:56:09 -0000      1.1.1.1.2.69
+++ doc/m4.texinfo      24 Aug 2006 14:20:21 -0000
@@ -151,8 +151,8 @@ Lexical and syntactic conventions
 
 * Names::                       Macro names
 * Quoted strings::              Quoting input to m4
-* Other tokens::                Other kinds of input tokens
 * Comments::                    Comments in m4 input
+* Other tokens::                Other kinds of input tokens
 * Input processing::            How m4 copies input to output
 
 How to invoke macros
@@ -604,7 +604,7 @@ options.
 @node Bugs
 @section Problems and bugs
 
-If you have problems with @acronym{GNU} @code{m4} or think you've found a bug,
+If you have problems with @acronym{GNU} M4 or think you've found a bug,
 please report it.  Before reporting a bug, make sure you've actually
 found a real bug.  Carefully reread the documentation and see if it
 really says you can do what you're trying to do.  If it's not clear
@@ -713,8 +713,8 @@ exception of the @sc{nul} character (the
 @menu
 * Names::                       Macro names
 * Quoted strings::              Quoting input to m4
-* Other tokens::                Other kinds of input tokens
 * Comments::                    Comments in m4 input
+* Other tokens::                Other kinds of input tokens
 * Input processing::            How m4 copies input to output
 @end menu
 
@@ -734,7 +734,8 @@ Examples of legal names are: @samp{foo},
 @section Quoted strings
 
 @cindex quoted string
-A quoted string is a sequence of characters surrounded by the quotes
+A quoted string is a sequence of characters surrounded by quote
+strings, defaulting to
 @kbd{`} and @kbd{'}, where the nested begin and end quotes within the
 string are balanced.  The value of a string token is the text, with one
 level of quotes stripped off.  Thus
@@ -757,12 +758,6 @@ is the empty string, and double-quoting 
 The quote characters can be changed at any time, using the builtin macro
 @code{changequote}.  @xref{Changequote}, for more information.
 
address@hidden Other tokens
address@hidden Other tokens
-
-Any character, that is neither a part of a name, nor of a quoted string,
-is a token by itself.
-
 @node Comments
 @section Comments
 
@@ -787,6 +782,17 @@ The comment delimiters can be changed to
 the builtin macro @code{changecom}.  @xref{Changecom}, for more
 information.
 
address@hidden Other tokens
address@hidden Other tokens
+
+Any character, that is neither a part of a name, nor of a quoted string,
+nor a comment, is a token by itself.  When not in the context of macro
+expansion, all of these tokens are just copied to output.  However,
+during macro expansion, whitespace characters (space, tab, newline,
+formfeed, carriage return, vertical tab), parentheses (@samp{(} and
address@hidden)}), comma (@samp{,}), and dollar (@samp{$}) have additional
+roles, explained later.
+
 @node Input processing
 @section Input Processing
 
@@ -801,7 +807,8 @@ call will be read and parsed into tokens
 
 @code{m4} expands a macro as soon as possible.  If it finds a macro call
 when collecting the arguments to another, it will expand the second
-call first.  If the input is
+call first.  For a running example, examine how @code{m4} handles this
+input:
 
 @comment ignore
 @example
@@ -809,8 +816,16 @@ format(`Result is %d', eval(`2**15'))
 @end example
 
 @noindent
address@hidden will first expand @samp{eval(2**15)} to @samp{32768}, and only
-then expand the resulting call
+First, @code{m4} sees that the token @samp{format} is a macro name, so
+it collects the tokens @samp{(}, @samp{`Result is %d'}, @samp{,},
+and @address@hidden }}, before encountering another potential macro.  Sure
+enough, @samp{eval} is a macro name, so the nested argument collection
+picks up @samp{(}, @samp{`2**15'}, and @samp{)}, invoking the eval macro
+with the lone argument of @samp{2**15}.  The expansion of
address@hidden(2**15)} is @samp{32768}, which is then rescanned as the five
+tokens @samp{3}, @samp{2}, @samp{7}, @samp{6}, and @samp{8}; and
+combined with the next @samp{)}, the format macro now has all its
+arguments, as if the user had typed:
 
 @comment ignore
 @example
@@ -818,11 +833,14 @@ format(`Result is %d', 32768)
 @end example
 
 @noindent
-which will give the output
+The format macro expands to @samp{Result is 32768}, and we have another
+round of scanning for the tokens @samp{Result}, @address@hidden }},
address@hidden, @address@hidden }}, @samp{3}, @samp{2}, @samp{7}, @samp{6}, and
address@hidden  None of these are macros, so the final output is
 
 @comment ignore
 @example
-Result is 32768
address@hidden is 32768
 @end example
 
 The order in which @code{m4} expands the macros can be explored using
@@ -911,6 +929,9 @@ by @samp{m4_} for them to be recognized.
 whatsoever on user defined macros.  For example, with this option,
 one has to write @code{m4_dnl} and even @code{m4_m4exit}.
 
+Another alternative is to redefine problematic macros to a name less
+likely to cause conflicts, @xref{Definitions}.
+
 If your version of @acronym{GNU} @code{m4} has the @code{changeword} feature
 compiled in, it offers far more flexibility in specifying the
 syntax of macro names, both builtin or user-defined.  @xref{Changeword},
@@ -1010,6 +1031,7 @@ supplied, the missing arguments are take
 However, some builtins are documented to behave differently for a
 missing optional argument than for an explicit empty string.  If
 there are too many arguments, the excess arguments are ignored.
+Unquoted leading whitespace is stripped off all arguments.
 
 Normally @code{m4} will issue warnings if a builtin macro is called
 with an inappropriate number of arguments, but it can be suppressed with
@@ -3313,6 +3335,14 @@ index(`gnus, gnats, and armadillos', `da
 @result{}-1
 @end example
 
+Omitting @var{substring} evokes a warning, but still produces output.
+
address@hidden
+index(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `index'
address@hidden
address@hidden example
+
 @node Regexp
 @section Searching for regular expressions
 
@@ -3378,6 +3408,14 @@ regexp(`abc', `\(\(d\)?\)\(c\)', `\1\2\3
 @result{}c
 @end example
 
+Omitting @var{regexp} evokes a warning, but still produces output.
+
address@hidden
+regexp(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `regexp'
address@hidden
address@hidden example
+
 @node Substr
 @section Extracting substrings
 
@@ -3403,6 +3441,17 @@ substr(`gnus, gnats, and armadillos', `6
 @result{}gnats
 @end example
 
+Omitting @var{from} evokes a warning, but still produces output.
+
address@hidden
+substr(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `substr'
address@hidden
+substr(`abc',)
address@hidden:stdin:2: empty string treated as 0 in builtin `substr'
address@hidden
address@hidden example
+
 @node Translit
 @section Translating characters
 
@@ -3447,6 +3496,14 @@ lowercase to uppercase, and the third `m
 while converting them to lowercase.  The two first cases are by far the
 most common.
 
+Omitting @var{chars} evokes a warning, but still produces output.
+
address@hidden
+translit(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `translit'
address@hidden
address@hidden example
+
 @node Patsubst
 @section Substituting text by regular expression
 
@@ -3489,7 +3546,7 @@ patsubst(`GNUs not Unix', `^', `OBS: ')
 patsubst(`GNUs not Unix', `\<', `OBS: ')
 @result{}OBS: GNUs OBS: not OBS: Unix
 patsubst(`GNUs not Unix', `\w*', `(\&)')
address@hidden(GNUs)() (not)() (Unix)
address@hidden(GNUs)() (not)() (Unix)()
 patsubst(`GNUs not Unix', `\w+', `(\&)')
 @result{}(GNUs) (not) (Unix)
 patsubst(`GNUs not Unix', `[A-Z][a-z]+')
@@ -3540,6 +3597,14 @@ patreg(`aba abb 121', `\(.\)\(.\)\1', `\
 @result{}bab
 @end example
 
+Omitting @var{regexp} evokes a warning, but still produces output.
+
address@hidden
+patsubst(`abc')
address@hidden:stdin:1: Warning: too few arguments to builtin `patsubst'
address@hidden
address@hidden example
+
 @node Format
 @section Formatted output
 
@@ -4020,7 +4085,8 @@ signal number shifted left by eight bits
 @example
 dnl This test assumes kill is a shell builtin, and that signals are
 dnl recognizable.
-ifdef(`__unix__', , `errprint(` skipping: syscmd does not have unix semantics
+ifdef(`__unix__', ,
+      `errprint(` skipping: syscmd does not have unix semantics
 ')m4exit(`77')')dnl
 syscmd(`kill -13 $$')
 @result{}
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.36
diff -u -p -r1.1.1.1.2.36 builtin.c
--- src/builtin.c       18 Aug 2006 23:11:36 -0000      1.1.1.1.2.36
+++ src/builtin.c       24 Aug 2006 14:20:21 -0000
@@ -1475,7 +1475,12 @@ m4_index (struct obstack *obs, int argc,
   int l1, l2, retval;
 
   if (bad_argc (argv[0], argc, 3, 3))
-    return;
+    {
+      /* builtin(`index') is blank, but index(`abc') is 0.  */
+      if (argc == 2)
+       shipout_int (obs, 0);
+      return;
+    }
 
   l1 = strlen (ARG (1));
   l2 = strlen (ARG (2));
@@ -1506,7 +1511,12 @@ m4_substr (struct obstack *obs, int argc
   int length, avail;
 
   if (bad_argc (argv[0], argc, 3, 4))
-    return;
+    {
+      /* builtin(`substr') is blank, but substr(`abc') is abc.  */
+      if (argc == 2)
+       obstack_grow (obs, ARG (1), strlen (ARG (1)));
+      return;
+    }
 
   length = avail = strlen (ARG (1));
   if (!numeric_arg (argv[0], ARG (2), &start))
@@ -1584,7 +1594,12 @@ m4_translit (struct obstack *obs, int ar
   int tolen;
 
   if (bad_argc (argv[0], argc, 3, 4))
-    return;
+    {
+      /* builtin(`translit') is blank, but translit(`abc') is abc.  */
+      if (argc == 2)
+       obstack_grow (obs, ARG (1), strlen (ARG (1)));
+      return;
+    }
 
   from = ARG (2);
   if (strchr (from, '-') != NULL)
@@ -1750,7 +1765,12 @@ m4_regexp (struct obstack *obs, int argc
   int length;                  /* length of first argument */
 
   if (bad_argc (argv[0], argc, 3, 4))
-    return;
+    {
+      /* builtin(`regexp') is blank, but regexp(`abc') is 0.  */
+      if (argc == 2)
+       shipout_int (obs, 0);
+      return;
+    }
 
   victim = TOKEN_DATA_TEXT (argv[1]);
   regexp = TOKEN_DATA_TEXT (argv[2]);
@@ -1806,7 +1826,12 @@ m4_patsubst (struct obstack *obs, int ar
   int length;                  /* length of first argument */
 
   if (bad_argc (argv[0], argc, 3, 4))
-    return;
+    {
+      /* builtin(`patsubst') is blank, but patsubst(`abc') is abc.  */
+      if (argc == 2)
+       obstack_grow (obs, ARG (1), strlen (ARG (1)));
+      return;
+    }
 
   regexp = TOKEN_DATA_TEXT (argv[2]);
 
@@ -1826,7 +1851,7 @@ m4_patsubst (struct obstack *obs, int ar
 
   offset = 0;
   matchpos = 0;
-  while (offset < length)
+  while (offset <= length)
     {
       matchpos = re_search (&buf, victim, length,
                            offset, length - offset, &regs);

reply via email to

[Prev in Thread] Current Thread [Next in Thread]