m4-patches

Re: M4 syntax $11 vs. ${11}


From: Eric Blake
Subject: Re: M4 syntax $11 vs. ${11}
Date: Sat, 27 Jan 2007 18:53:23 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.9) Gecko/20061207 Thunderbird/1.5.0.9 Mnenhy/0.7.4.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Paul Eggert on 1/20/2007 12:43 AM:
> Eric Blake <address@hidden> writes:
> 
>> +  /* This warning must not kill m4 -E, or it will break autoconf.  */
>> +  if (text && strstr (text, "${"))
>> +    M4ERROR ((0, 0, "Warning: raw `${' in defn of %s will change semantics",
>> +          name));
> 
> This warning will generate a lot of false positives, right?
> Most of the time, a stray ${ in an M4 file won't be followed
> by a series of digits and then a }.  So it will be treated
> as itself (for backward compatibility).

OK, I toned down my patch on the M4 side of things.  Originally, the patch
warned on the two-character sequence ${, since I was planning for even
${foo} to have meaning in M4 2.0 (as the current definition of foo), but
we can save that for M4 2.1 and a long transition period.  For M4 2.0, if
${ is followed by a non-digit, I will be sure to stick with the old
behavior of literal output.  This greatly reduces (but does not eliminate)
the number of places in autoconf that need extra quoting; I'll follow up
with a patch to autoconf along those lines.

It is also possible in 2.0 to disable ${} handling by using the
changesyntax builtin to assign { and } back to the ordinary character
category, at the expense of no longer being able to refer to more than 9
arguments of a macro.  My patch to autoconf will include an action along
those lines.  That way, no matter how fancy M4 2.0 actually becomes when
handling ${}, autoconf can ignore the new feature for the sake of the
large existing codebase of macros that use raw ${.

Meanwhile, this particular patch is only for the 1.4.x branch, and I'm
going ahead and committing it.  I hope it is the last patch prior to
1.4.9, although this week's changes in gnulib regarding <string.h> need to
stabilize first.

The patch adds the --warn-syntax option (off by default) to detect two
three-character sequences:

  - $<digit><digit>.  This will change to mean the one-digit argument
    concatenated with the second digit, rather than a multi-digit
    argument.  I doubt much code tickles this.
  - ${<digit>.  This is common when generating shell or Makefile code.
    I doubt there are many false positives where a closing } cannot be
    found, so the warning is simplified by not looking for it.

I will be using this patch to find the problem spots in autoconf.  I
already know that running autoconf 2.61 on m4 1.4.9 with --warn-syntax
will trigger the warning.  Since autom4te runs m4 -E, the warning is fatal
to autoconf, so this patch is careful to document that issue.  Hopefully,
autoconf 2.62 will be immune to this warning.

2007-01-27  Eric Blake  <address@hidden>

        * src/m4.h (warn_syntax): Declare.
        (init_pattern_buffer): Export.
        * src/m4.c (warn_syntax, usage, WARN_SYNTAX_OPTIONS)
        (long_options, main): Implement new option.
        * src/builtin.c (init_pattern_buffer): Allow NULL regs argument.
        (define_user_macro): Warn on $11 and ${1} if requested.
        * src/input.c (init_pattern_buffer): Delete duplicate method.
        * doc/m4.texinfo (Operation modes): Document it.
        (Arguments): Document future direction of ${11} vs. $11.
        (Incompatibilities): Fix wording on POSIX limitations.
        * checks/get-them: Parse @{ and @} correctly.
        * NEWS: Document this change.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFvAIS84KuGfSFAYARAj3PAJwKP/02NcMGixie5CcrW60H7qJigQCg1KzX
4JCSiWLw8Upnu5wY6UNSdjE=
=yJW3
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.90
diff -u -p -r1.1.1.1.2.90 NEWS
--- NEWS        15 Jan 2007 13:51:33 -0000      1.1.1.1.2.90
+++ NEWS        28 Jan 2007 01:50:28 -0000
@@ -15,6 +15,14 @@ Version 1.4.9 - ?? ??? 2007, by ????  (C
   of variable assignment as an extension.
 * The `include' builtin now affects exit status on failure, as required by
   POSIX.  Use `sinclude' if you need a successful exit status.
+* A new `--warn-syntax' command-line option allows detection of
+  non-portable syntax that might be broken when upgrading to M4 2.0.  For
+  example, POSIX requires a macro definition containing `$11' to expand to
+  the first argument concatenated with 1, rather than the eleventh
+  argument; and allows implementations to choose whether `${11}' is treated
+  as literal text, as in M4 1.4.x, or as the eleventh argument, as in the
+  eventual M4 2.0.  Be aware that Autoconf 2.61 will not work with this
+  option enabled.
 * Improved portability to platforms such as BSD/OS.
 
 Version 1.4.8 - 20 November 2006, by Eric Blake  (CVS version 1.4.7a)
Index: checks/get-them
===================================================================
RCS file: /sources/m4/m4/checks/Attic/get-them,v
retrieving revision 1.1.1.1.2.8
diff -u -p -r1.1.1.1.2.8 get-them
--- checks/get-them     6 Jan 2007 19:56:11 -0000       1.1.1.1.2.8
+++ checks/get-them     28 Jan 2007 01:50:28 -0000
@@ -73,6 +73,8 @@ BEGIN {
   else
     prefix = "";
   gsub("@@", "@", $0);
+  gsub("@{", "{", $0);
+  gsub("@}", "}", $0);
   gsub("@w{ }", " ", $0);
   gsub("@tabchar{}", "\t", $0);
   printf("%s%s\n", prefix, $0) >> file;
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.108
diff -u -p -r1.1.1.1.2.108 m4.texinfo
--- doc/m4.texinfo      15 Jan 2007 13:51:33 -0000      1.1.1.1.2.108
+++ doc/m4.texinfo      28 Jan 2007 01:50:29 -0000
@@ -577,6 +577,13 @@ is also specified.
 Suppress warnings, such as missing or superfluous arguments in macro
 calls, or treating the empty string as zero.
 
+@item --warn-syntax
+Issue warnings when syntax is encountered that will change semantics in
+@acronym{GNU} M4 2.0.  For now, the only semantics that will change have
+to do with how more than 9 arguments in a macro definition are handled
+(@pxref{Arguments}).  This warning is disabled by default because it
+triggers spurious failures in @acronym{GNU} Autoconf 2.61.
+
 @item -W @var{REGEXP}
 @itemx address@hidden
 Use @var{REGEXP} as an alternative syntax for macro names.  This
@@ -1354,8 +1361,8 @@ As a @acronym{GNU} extension, the first 
 not have to be a simple word.
 It can be any text string, even the empty string.  A macro with a
 non-standard name cannot be invoked in the normal way, as the name is
-not recognized.  It can only be referenced by the builtins @code{Indir}
-(@pxref{Indir}) and @code{Defn} (@pxref{Defn}).
+not recognized.  It can only be referenced by the builtins @code{indir}
+(@pxref{Indir}) and @code{defn} (@pxref{Defn}).
 
 @cindex arrays
 Arrays and associative arrays can be simulated by using this trick.
@@ -1375,7 +1382,7 @@ array(eval(`10 + 7'))
 @result{}array element no. 17
 @end example
 
-Change the @code{%d} to @code{%s} and it is an associative array.
+Change the @samp{%d} to @samp{%s} and it is an associative array.
 
 @node Arguments
 @section Arguments to macros
@@ -1412,13 +1419,6 @@ macro
 (You should try and improve this example so that clients of @code{exch}
 do not have to double quote; or @pxref{Improved exch, , Answers}).
 
-@cindex @acronym{GNU} extensions
-@acronym{GNU} @code{m4} allows the number following the @samp{$} to
-consist of one
-or more digits, allowing macros to have any number of arguments.  This
-is not so in UNIX implementations of @code{m4}, which only recognize
-one digit.
-
 As a special case, the zeroth argument, @code{$0}, is always the name
 of the macro being expanded.
 
@@ -1443,6 +1443,51 @@ foo
 The @samp{foo} in the expansion text is @emph{not} expanded, since it is
 a quoted string, and not a name.
 
+@cindex @acronym{GNU} extensions
+@cindex nine arguments, more than
+@cindex more than nine arguments
+@cindex arguments, more than nine
+@cindex positional parameters, more than nine
+@acronym{GNU} @code{m4} allows the number following the @samp{$} to
+consist of one or more digits, allowing macros to have any number of
+arguments.  The extension of accepting multiple digits is incompatible
+with @acronym{POSIX}, and is different than traditional implementations
+of @code{m4}, which only recognize one digit.  Therefore, future
+versions of @acronym{GNU} M4 will phase out this feature.
address@hidden, for an example of how to portably access the eleventh
+argument.
+
+@acronym{POSIX} also states that @samp{$} followed immediately by
+@samp{@{} in a macro definition is implementation-defined.  This version
+of M4 passes the literal characters @samp{$@{} through unchanged, but M4
+2.0 will implement an optional feature similar to @command{sh}, where
+@samp{$@{11@}} expands to the eleventh argument, to replace the current
+recognition of @samp{$11}.  Meanwhile, if you want to guarantee that you
+will get a literal @samp{$@{} in output when expanding a macro, even
+when you upgrade to M4 2.0, you can use nested quoting to your
+advantage:
+
+@example
+define(`foo', `single quoted $`'@address@hidden output')
+@result{}
+define(`bar', ``double quoted $'address@hidden@} output'')
+@result{}
+foo(`a', `b')
address@hidden quoted address@hidden@} output
+bar(`a', `b')
address@hidden quoted address@hidden@} output
+@end example
+
+To help you detect places in your M4 input files that might change in
+behavior due to the changed behavior of M4 2.0, you can use the
+@option{--warn-syntax} command-line option (@pxref{Operation modes, ,
+Invoking m4}).  This will add a warning any time a macro definition
+includes @samp{$} followed by multiple digits, or by @samp{@{} and a
+digit.  The warning is not enabled by default, because it triggers a
+number of warnings in Autoconf 2.61 (and Autoconf uses @option{-E} to
+treat warnings as errors), and because it will still be possible to
+restore traditional behavior in M4 2.0.
+
 @node Pseudo Arguments
 @section Special arguments to macros
 
@@ -2588,7 +2633,7 @@ foo
 @result{}blah
 @end example
 
-Tracing even works on builtins.  However, @command{defn} (@pxref{Defn})
+Tracing even works on builtins.  However, @code{defn} (@pxref{Defn})
 does not transfer tracing status.
 
 @example
@@ -4721,10 +4766,10 @@ There are a few builtin macros in @code{
 commands from within @code{m4}.
 
 Note that the definition of a valid shell command is system dependent.
-On UNIX systems, this is the typical @code{/bin/sh}.  But on other
+On UNIX systems, this is the typical @command{/bin/sh}.  But on other
 systems, such as native Windows, the shell has a different syntax of
 commands that it understands.  Some examples in this chapter assume
-@code{/bin/sh}, and also demonstrate how to quit early with a known
+@command{/bin/sh}, and also demonstrate how to quit early with a known
 exit value if this is not the case.
 
 @menu
@@ -4934,7 +4979,7 @@ sysval
 @result{}0
 @end example
 
address@hidden results in 127 if there was a problem executing the
address@hidden results in 127 if there was a problem executing the
 command, for example, if the system-imposed argument length is exceeded,
 or if there were not enough resources to fork.  It is not possible to
 distinguish between failed execution and successful execution that had
@@ -5262,8 +5307,8 @@ which files are listed on each @code{m4}
 user's input file, or else each input file uses @code{include}.
 
 Reading the common base of a big application, over and over again, may
-be time consuming.  @acronym{GNU} @code{m4} offers some machinery to speed up
-the start of an application using lengthy common bases.
+be time consuming.  @acronym{GNU} @code{m4} offers some machinery to
+speed up the start of an application using lengthy common bases.
 
 @menu
 * Using frozen files::          Using frozen files
@@ -5311,7 +5356,7 @@ with the varying input.  The first call,
 option, only reads and executes file @file{base.m4}, defining
 various application macros and computing other initializations.
 Once the input file @file{base.m4} has been completely processed, @acronym{GNU}
-@code{m4} produces on @file{base.m4f} a @dfn{frozen} file, that is, a
+@code{m4} produces in @file{base.m4f} a @dfn{frozen} file, that is, a
 file which contains a kind of snapshot of the @code{m4} internal state.
 
 Later calls, containing the @option{-R} option, are able to reload
@@ -5466,7 +5511,7 @@ Invoking m4}), unless overridden by othe
 
 @itemize @bullet
 @item
-In the @address@hidden notation for macro arguments, @var{n} can contain
+In the @address@hidden notation for macro arguments, @var{n} can contain
 several digits, while the System V @code{m4} only accepts one digit.
 This allows macros in @acronym{GNU} @code{m4} to take any number of
 arguments, and not only nine (@pxref{Arguments}).
@@ -5623,10 +5668,11 @@ m4wrap(`a`'m4wrap(`c
 @end example
 
 @item
-@acronym{POSIX} requires that all builtins that require arguments, but
-are called without arguments, behave as though empty strings had been
-passed.  For example, @code{a`'define`'b} would expand to @code{ab}.
-But @acronym{GNU} @code{m4} ignores certain builtins if they have missing
+@acronym{POSIX} states that builtins that require arguments, but are
+called without arguments, have undefined behavior.  Traditional
+implementations simply behave as though empty strings had been passed.
+For example, @code{a`'define`'b} would expand to @code{ab}.  But
+@acronym{GNU} @code{m4} ignores certain builtins if they have missing
 arguments, giving @code{adefineb} for the above example.
 
 @item
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.55
diff -u -p -r1.1.1.1.2.55 builtin.c
--- src/builtin.c       27 Jan 2007 00:25:33 -0000      1.1.1.1.2.55
+++ src/builtin.c       28 Jan 2007 01:50:30 -0000
@@ -231,6 +231,7 @@ void
 define_user_macro (const char *name, const char *text, symbol_lookup mode)
 {
   symbol *s;
+  size_t len;
 
   s = lookup_symbol (name, mode);
   if (SYMBOL_TYPE (s) == TOKEN_TEXT)
@@ -238,6 +239,43 @@ define_user_macro (const char *name, con
 
   SYMBOL_TYPE (s) = TOKEN_TEXT;
   SYMBOL_TEXT (s) = xstrdup (text ? text : "");
+
+  /* In M4 2.0, $11 will mean the first argument concatenated with 1,
+     not the eleventh argument.  Also, ${1} will mean the first
+     argument, rather than literal text (although for compatibility
+     sake, it will be possible to restore the traditional meaning of
+     ${1} using changesyntax).  Needing more than 9 arguments is
+     somewhat rare, but using M4 to process shell code is quite
+     common; either way, warn on usages that will change in
+     semantics.  */
+  if (warn_syntax && text && (len = strlen (text)) >= 3)
+    {
+      static struct re_pattern_buffer buf;
+      static bool init = false;
+      regoff_t offset = 0;
+
+      if (! init)
+       {
+         const char *msg = "\\$[{0-9][0-9]";
+         init_pattern_buffer (&buf, NULL);
+         msg = re_compile_pattern (msg, strlen (msg), &buf);
+         if (msg != NULL)
+           {
+             M4ERROR ((EXIT_FAILURE, 0,
+                       "unable to check --warn-syntax: %s", msg));
+           }
+         init = true;
+       }
+      while ((offset = re_search (&buf, text, len, offset, len - offset,
+                                 NULL)) >= 0)
+       {
+         M4ERROR ((warning_status, 0,
+                   "Warning: semantics of `$%c%c%s' in `%s' will change",
+                   text[offset + 1], text[offset + 2],
+                   text[offset + 1] == '{' ? "...}" : "", name));
+         offset += 3;
+       }
+    }
 }
 
 /*-----------------------------------------------.
@@ -1828,15 +1866,18 @@ Warning: trailing \\ ignored in replacem
 | Initialize regular expression variables.  |
 `------------------------------------------*/
 
-static void
+void
 init_pattern_buffer (struct re_pattern_buffer *buf, struct re_registers *regs)
 {
   buf->translate = NULL;
   buf->fastmap = NULL;
   buf->buffer = NULL;
   buf->allocated = 0;
-  regs->start = NULL;
-  regs->end = NULL;
+  if (regs)
+    {
+      regs->start = NULL;
+      regs->end = NULL;
+    }
 }
 
 /*----------------------------------------.
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.32
diff -u -p -r1.1.1.1.2.32 input.c
--- src/input.c 1 Nov 2006 22:29:08 -0000       1.1.1.1.2.32
+++ src/input.c 28 Jan 2007 01:50:30 -0000
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
 
-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006, 2007
    Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
@@ -752,15 +752,6 @@ set_comment (const char *bc, const char 
 
 #ifdef ENABLE_CHANGEWORD
 
-static void
-init_pattern_buffer (struct re_pattern_buffer *buf)
-{
-  buf->translate = NULL;
-  buf->fastmap = NULL;
-  buf->buffer = NULL;
-  buf->allocated = 0;
-}
-
 void
 set_word_regexp (const char *regexp)
 {
@@ -776,7 +767,7 @@ set_word_regexp (const char *regexp)
     }
 
   /* Dry run to see whether the new expression is compilable.  */
-  init_pattern_buffer (&new_word_regexp);
+  init_pattern_buffer (&new_word_regexp, NULL);
   msg = re_compile_pattern (regexp, strlen (regexp), &new_word_regexp);
   regfree (&new_word_regexp);
 
Index: src/m4.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/m4.c,v
retrieving revision 1.1.1.1.2.41
diff -u -p -r1.1.1.1.2.41 m4.c
--- src/m4.c    5 Jan 2007 02:58:32 -0000       1.1.1.1.2.41
+++ src/m4.c    28 Jan 2007 01:50:30 -0000
@@ -55,6 +55,9 @@ int suppress_warnings = 0;
 /* If not zero, then value of exit status for warning diagnostics.  */
 int warning_status = 0;
 
+/* If true, then warn about usage of $11 and ${1} in macro definitions.  */
+bool warn_syntax = false;
+
 /* Artificial limit for expansion_level in macro.c.  */
 int nesting_limit = 1024;
 
@@ -142,10 +145,13 @@ for short options too.\n\
 Operation modes:\n\
       --help                   display this help and exit\n\
       --version                output version information and exit\n\
+", stdout);
+      fputs ("\
   -E, --fatal-warnings         stop execution after first warning\n\
   -i, --interactive            unbuffer output, ignore interrupts\n\
   -P, --prefix-builtins        force a `m4_' prefix to all builtins\n\
   -Q, --quiet, --silent        suppress some warnings for builtins\n\
+      --warn-syntax            warn on syntax that will change in future\n\
 ", stdout);
 #ifdef ENABLE_CHANGEWORD
       fputs ("\
@@ -221,6 +227,7 @@ enum
 {
   DEBUGFILE_OPTION = CHAR_MAX + 1,     /* no short opt */
   DIVERSIONS_OPTION,                   /* not quite -N, because of message */
+  WARN_SYNTAX_OPTION,                  /* no short opt */
 
   HELP_OPTION,                         /* no short opt */
   VERSION_OPTION                       /* no short opt */
@@ -250,6 +257,7 @@ static const struct option long_options[
 
   {"debugfile", required_argument, NULL, DEBUGFILE_OPTION},
   {"diversions", required_argument, NULL, DIVERSIONS_OPTION},
+  {"warn-syntax", no_argument, NULL, WARN_SYNTAX_OPTION},
 
   {"help", no_argument, NULL, HELP_OPTION},
   {"version", no_argument, NULL, VERSION_OPTION},
@@ -455,6 +463,10 @@ main (int argc, char *const *argv, char 
        debugfile = optarg;
        break;
 
+      case WARN_SYNTAX_OPTION:
+       warn_syntax = true;
+       break;
+
       case VERSION_OPTION:
        version_etc (stdout, PACKAGE, PACKAGE_NAME, VERSION, AUTHORS, NULL);
        exit (EXIT_SUCCESS);
Index: src/m4.h
===================================================================
RCS file: /sources/m4/m4/src/m4.h,v
retrieving revision 1.1.1.1.2.36
diff -u -p -r1.1.1.1.2.36 m4.h
--- src/m4.h    6 Jan 2007 19:56:11 -0000       1.1.1.1.2.36
+++ src/m4.h    28 Jan 2007 01:50:30 -0000
@@ -110,6 +110,7 @@ extern int max_debug_argument_length;       /*
 extern int suppress_warnings;          /* -Q */
 extern int warning_status;             /* -E */
 extern int nesting_limit;              /* -L */
+extern bool warn_syntax;               /* --warn-syntax */
 #ifdef ENABLE_CHANGEWORD
 extern const char *user_word_regexp;   /* -W */
 #endif
@@ -396,6 +397,8 @@ struct predefined
 
 typedef struct builtin builtin;
 typedef struct predefined predefined;
+struct re_pattern_buffer;
+struct re_registers;
 
 void builtin_init (void);
 void define_builtin (const char *, const builtin *, symbol_lookup);
@@ -403,6 +406,7 @@ void define_user_macro (const char *, co
 void undivert_all (void);
 void expand_user_macro (struct obstack *, symbol *, int, token_data **);
 void m4_placeholder (struct obstack *, int, token_data **);
+void init_pattern_buffer (struct re_pattern_buffer *, struct re_registers *);
 
 const builtin *find_builtin_by_addr (builtin_func *);
 const builtin *find_builtin_by_name (const char *);
