m4-patches

comment precedence [was: branch-1_4 off-by-one in line reporting]


From: Eric Blake
Subject: comment precedence [was: branch-1_4 off-by-one in line reporting]
Date: Sat, 28 Oct 2006 17:20:32 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Thunderbird/1.5.0.7 Mnenhy/0.7.4.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Gary V. Vaughan on 10/17/2006 10:28 PM:
>>>>> I still want CVS head to follow Solaris' parsing precedence
>>>>> rules (macros, then quotes, then comments), rather than the current
>>>>> behavior (comments, macros, quotes).
>>>>
>>>> Can you remind me why that is?
>>>
> 
>>> If anything, the reason I am proposing delaying
>>> the recognition of comments until after macro names and quote starts
>>> have been
>>> recognized is to match historical behavior, and so that GNU M4 parsing
>>> at least
>>> follows the order that the three token types are mentioned in POSIX.
> 
> Thanks for the nice explanation.  Yes, I definitely agree with
> you now; can you add it to the TODO list, please?

I got started on this, but in the process found another bug in POSIX.
1.4.7 is strictly POSIX compliant on changecom(`#',): it takes the second
argument literally, but in so doing makes it impossible to recognize
the end of a comment.  All other implementations treat an empty second
argument specially (Solaris defaults it to newline, and BSD leaves the
previous end-comment delimiter unchanged), so they can never get into a
situation where a started comment cannot be ended.  So I filed an
aardvark with the Austin Group, and am implementing the following change
on the branch first.
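
As a rough illustration of the rule described above, here is a minimal
standalone sketch of how missing and empty arguments could be normalized.
The names sketch_set_comment, sketch_comments_enabled, struct
comment_delims, and SKETCH_DEF_ECOMM are hypothetical and exist only for
this example; the real code in src/input.c stores into globals instead of
returning a struct, and the newline default is assumed to match DEF_ECOMM.

```c
#include <assert.h>
#include <string.h>

/* Assumed default, standing in for DEF_ECOMM in the m4 sources.  */
#define SKETCH_DEF_ECOMM "\n"

/* Hypothetical result type; the real code stores into globals.  */
struct comment_delims
{
  const char *begin;
  const char *end;
};

/* NULL means the argument was missing; "" means present but empty.  */
struct comment_delims
sketch_set_comment (const char *bc, const char *ec)
{
  struct comment_delims d;

  if (!bc)
    bc = ec = "";               /* no arguments at all: disable comments */
  else if (!ec || (*bc && !*ec))
    ec = SKETCH_DEF_ECOMM;      /* missing or empty end: use newline */

  /* Postcondition: a non-empty begin never pairs with an empty end.  */
  assert (!*bc || *ec);
  d.begin = bc;
  d.end = ec;
  return d;
}

/* Comments are active only when the begin delimiter is non-empty.  */
int
sketch_comments_enabled (struct comment_delims d)
{
  return strlen (d.begin) > 0;
}
```

With this sketch, sketch_set_comment ("#", "") and sketch_set_comment
("#", NULL) both end comments at newline, while passing NULL or "" as the
first argument leaves comments disabled.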

2006-10-28  Eric Blake  <address@hidden>

        * src/input.c (set_quotes): Don't allow empty end-quote with
        non-empty start-quote.
        (set_comment): Likewise for end-comment.
        * src/builtin.c (m4_changecom): Adjust caller.
        * doc/m4.texinfo (Changequote, Changecom): Update documentation to
        match behavior.
        (Incompatibilities): Document another POSIX bug.
        * NEWS: Mention this change.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFQ+XA84KuGfSFAYARAvOqAKC9uBY4SPDI2L6eMIp/+LXrAPODtQCeOowW
tebNcXaVnHQGk1K83Up6Y/Q=
=xsTi
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.76
diff -u -p -r1.1.1.1.2.76 NEWS
--- NEWS        26 Oct 2006 04:56:32 -0000      1.1.1.1.2.76
+++ NEWS        28 Oct 2006 23:19:36 -0000
@@ -40,6 +40,9 @@ Version 1.4.8 - ?? ??? 2006, by ??  (CVS
   characters.
 * The manual has been improved, including a new section on a composite
   macro `foreach'.
+* The `changecom' and `changequote' macros now treat an empty second
+  argument the same as if it were missing, rather than using the empty
+  string and making it impossible to end a comment or quote.
 
 Version 1.4.7 - 25 September 2006, by Eric Blake  (CVS version 1.4.6a)
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.94
diff -u -p -r1.1.1.1.2.94 m4.texinfo
--- doc/m4.texinfo      26 Oct 2006 04:45:31 -0000      1.1.1.1.2.94
+++ doc/m4.texinfo      28 Oct 2006 23:19:37 -0000
@@ -2795,11 +2795,12 @@ The default quote delimiters can be chan
 @code{changequote}:
 
 @deffn Builtin changequote (@dvar{start, `}, @dvar{end, '})
-This sets @var{start} as the new begin-quote delimiter and @var{end} as the
-new end-quote delimiter.  If any of the arguments are missing, the default
-quotes (@code{`} and @code{'}) are used instead of the void arguments.
-@c FIXME POSIX requires that with one argument, the closing quote
-@c be set to newline, not '.
+This sets @var{start} as the new begin-quote delimiter and @var{end} as
+the new end-quote delimiter.  If both arguments are missing, the default
+quotes (@code{`} and @code{'}) are used.  If @var{start} is void, then
+quoting is disabled.  Otherwise, if @var{end} is missing or void, the
+default end-quote delimiter (@code{'}) is used.  The quote delimiters
+can be of any length.
 
 The expansion of @code{changequote} is void.
 @end deffn
@@ -2831,7 +2832,8 @@ changequote(`«', `»')
 @end example
 @end ignore
 If no single character is appropriate, @var{start} and @var{end} can be
-of any length.
+of any length.  Other implementations cap the delimiter length to five
+characters, but @acronym{GNU} has no inherent limit.
 
 @example
 changequote(`[[[', `]]]')
@@ -2842,18 +2844,32 @@ foo
 @result{}Macro [[foo]].
 @end example
 
-Changing the quotes to the empty strings will effectively disable the
-quoting mechanism, leaving no way to quote text.
+Calling @code{changequote} with @var{start} as the empty string will
+effectively disable the quoting mechanism, leaving no way to quote text.
+However, using an empty string is not portable, as some other
+implementations of @code{m4} revert to the default quoting, while others
+preserve the prior non-empty delimiter.  If @var{start} is not empty,
+then an empty @var{end} will use the default end-quote delimiter of
+@samp{'}, as otherwise, it would be impossible to end a quoted string.
+Again, this is not portable, as some other @code{m4} implementations
+reuse @var{start} as the end-quote delimiter, while others preserve the
+previous non-empty value.  Omitting both arguments restores the default
+begin-quote and end-quote delimiters; fortunately this behavior is
+portable to all implementations of @code{m4}.
 
 @example
 define(`foo', `Macro `FOO'.')
 @result{}
-changequote(, )
+changequote(`', `')
 @result{}
 foo
 @result{}Macro `FOO'.
 `foo'
 @result{}`Macro `FOO'.'
+changequote(`,)
+@result{}
+foo
+@result{}Macro FOO.
 @end example
 
 There is no way in @code{m4} to quote a string containing an unmatched
@@ -2866,24 +2882,39 @@ calls of @code{changequote} must be made
 and one for the new quotes.
 
 Macros are recognized in preference to the begin-quote string, so if a
-prefix of @var{start} can be recognized as a potential macro name, the
-quoting mechanism is effectively disabled.  Unless you use
+prefix of @var{start} can be recognized as part of a potential macro
+name, the quoting mechanism is effectively disabled.  Unless you use
 @code{changeword} (@pxref{Changeword}), this means that @var{start}
-should not begin with a letter or @samp{_} (underscore).
+should not begin with a letter, digit, or @samp{_} (underscore).
+However, even though quoted strings are not recognized, the quote
+characters can still be discerned in macro expansion and in trace
+output.
 
 @example
+define(`echo', `$@@')
+@result{}
 define(`hi', `HI')
 @result{}
 changequote(`q', `Q')
 @result{}
 q hi Q hi
 @result{}q HI Q HI
+echo(hi)
+@result{}qHIQ
 changequote
 @result{}
 changequote(`-', `EOF')
 @result{}
 - hi EOF hi
 @result{} hi  HI
+changequote
+@result{}
+changequote(`1', `2')
+@result{}
+hi1hi2
+@result{}hi1hi2
+hi 1hi2
+@result{}HI hi
 @end example
 
 Quotes are recognized in preference to argument collection.  In
@@ -2963,12 +2994,12 @@ It is an error if the end of file occurs
 The default comment delimiters can be changed with the builtin
 macro @code{changecom}:
 
-@deffn Builtin changecom (@ovar{start}, @ovar{end})
-This sets @var{start} as the new begin-comment delimiter and @var{end} as
-the new end-comment delimiter.  If only one argument is provided,
-newline becomes the new end-comment delimiter.  The comment delimiters
-can be of any length.  Omitting the first argument, or using the empty
-string as the first argument, disables comments.
+@deffn Builtin changecom (@ovar{start}, @dvar{end, @key{NL}})
+This sets @var{start} as the new begin-comment delimiter and @var{end}
+as the new end-comment delimiter.  If both arguments are missing, or
+@var{start} is void, then comments are disabled.  Otherwise, if
+@var{end} is missing or void, the default end-comment delimiter of
+newline is used.  The comment delimiters can be of any length.
 
 The expansion of @code{changecom} is void.
 @end deffn
@@ -2991,10 +3022,14 @@ Note how comments are copied to the outp
 strings.  If you want the text inside a comment expanded, quote the
 begin-comment delimiter.
 
-Calling @code{changecom} without any arguments, or with an empty string
-for the first argument, disables the commenting mechanism completely.
-To restore the original comment start of @samp{#}, you must explicitly
-ask for it.
+Calling @code{changecom} without any arguments, or with @var{start} as
+the empty string, will effectively disable the commenting mechanism.  To
+restore the original comment start of @samp{#}, you must explicitly ask
+for it.  If @var{start} is not empty, then an empty @var{end} will use
+the default end-comment delimiter of newline, as otherwise, it would be
+impossible to end a comment.  However, this is not portable, as some
+other @code{m4} implementations preserve the previous non-empty
+delimiters instead.
 
 @example
 define(`comment', `COMMENT')
@@ -3003,7 +3038,7 @@ changecom
 @result{}
 # Not a comment anymore
 @result{}# Not a COMMENT anymore
-changecom(`#')
+changecom(`#', `')
 @result{}
 # comment again
 @result{}# comment again
@@ -3026,21 +3061,33 @@ changecom(`«', `»')
 @result{}«a»
 @end example
 @end ignore
+If no single character is appropriate, @var{start} and @var{end} can be
+of any length.  Other implementations cap the delimiter length to five
+characters, but @acronym{GNU} has no inherent limit.
 
 Comments are recognized in preference to macros.  However, this is not
 compatible with other implementations, where macros and even quoting
 takes precedence over comments, so it may change in a future release.
 For portability, this means that @var{start} should not begin with a
-letter or @samp{_} (underscore), and that neither the start-quote nor
-the start-comment string should be a prefix of the other.
+letter, digit, or @samp{_} (underscore), and that neither the
+start-quote nor the start-comment string should be a prefix of the
+other.
 
 @example
 define(`hi', `HI')
 @result{}
+define(`hi1hi2', `hello')
+@result{}
 changecom(`q', `Q')
 @result{}
 q hi Q hi
 @result{}q hi Q HI
+changecom(`1', `2')
+@result{}
+hi1hi2
+@result{}hello
+hi 1hi2
+@result{}HI 1hi2
 @end example
 
 Comments are recognized in preference to argument collection.  In
@@ -5368,20 +5415,36 @@ provides the extension @code{esyscmd} th
 semantics.
 
 @item
-@acronym{POSIX} requires @code{changequote(@var{arg})}
-(@pxref{Changequote}) to use newline as the close quote, but @acronym{GNU}
-@code{m4} uses @samp{'} as the close quote.  Meanwhile, some
-traditional implementations use @var{arg} as the close quote, making it
-impossible to nest quotes.  For predictable results, never call
-changequote with just one argument.
+At one point, @acronym{POSIX} required @code{changequote(@var{arg})}
+(@pxref{Changequote}) to use newline as the close quote, but this was a
+bug, and the next version of @acronym{POSIX} is anticipated to state
+that using empty strings or just one argument is unspecified.
+Meanwhile, the @acronym{GNU} @code{m4} behavior of treating an empty
+end-quote delimiter as @samp{'} is not portable, as Solaris treats it as
+repeating the start-quote delimiter, and BSD treats it as leaving the
+previous end-quote delimiter unchanged.  For predictable results, never
+call changequote with just one argument, or with empty strings for
+arguments.
+
+@item
+At one point, @acronym{POSIX} required @code{changecom(@var{arg},)}
+(@pxref{Changecom}) to make it impossible to end a comment, but this is
+a bug, and the next version of @acronym{POSIX} is anticipated to state
+that using empty strings is unspecified.  Meanwhile, the @acronym{GNU}
+@code{m4} behavior of treating an empty end-comment delimiter as newline
+is not portable, as BSD treats it as leaving the previous end-comment
+delimiter unchanged.  It is also impossible in BSD implementations to
+disable comments, even though that is required by @acronym{POSIX}.  For
+predictable results, never call changecom with empty strings for
+arguments.
 
 @item
-Some implementations of @code{m4} give macros a higher precedence than
+Most implementations of @code{m4} give macros a higher precedence than
 comments when parsing, meaning that if the start delimiter given to
 @code{changecom} (@pxref{Changecom}) starts with a macro name, comments
 are effectively disabled.  @acronym{POSIX} does not specify what the
-precedence is, so the @acronym{GNU} @code{m4} parser recognizes comments, then
-macros, then quoted strings.
+precedence is, so the @acronym{GNU} @code{m4} parser recognizes
+comments, then macros, then quoted strings.
 
 @item
 Traditional implementations allow argument collection, but not string
Index: src/builtin.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/builtin.c,v
retrieving revision 1.1.1.1.2.46
diff -u -p -r1.1.1.1.2.46 builtin.c
--- src/builtin.c       26 Oct 2006 21:11:56 -0000      1.1.1.1.2.46
+++ src/builtin.c       28 Oct 2006 23:19:37 -0000
@@ -1146,6 +1146,7 @@ m4_changequote (struct obstack *obs, int
   if (bad_argc (argv[0], argc, 1, 3))
     return;
 
+  /* Explicit NULL distinguishes between empty and missing argument.  */
   set_quotes ((argc >= 2) ? TOKEN_DATA_TEXT (argv[1]) : NULL,
             (argc >= 3) ? TOKEN_DATA_TEXT (argv[2]) : NULL);
 }
@@ -1161,11 +1162,9 @@ m4_changecom (struct obstack *obs, int a
   if (bad_argc (argv[0], argc, 1, 3))
     return;
 
-  if (argc == 1)
-    set_comment ("", "");      /* disable comments */
-  else
-    set_comment (TOKEN_DATA_TEXT (argv[1]),
-               (argc >= 3) ? TOKEN_DATA_TEXT (argv[2]) : NULL);
+  /* Explicit NULL distinguishes between empty and missing argument.  */
+  set_comment ((argc >= 2) ? TOKEN_DATA_TEXT (argv[1]) : NULL,
+              (argc >= 3) ? TOKEN_DATA_TEXT (argv[2]) : NULL);
 }
 
 #ifdef ENABLE_CHANGEWORD
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.30
diff -u -p -r1.1.1.1.2.30 input.c
--- src/input.c 26 Oct 2006 14:54:23 -0000      1.1.1.1.2.30
+++ src/input.c 28 Oct 2006 23:19:37 -0000
@@ -693,10 +693,11 @@ input_init (void)
 }
 
 
-/*--------------------------------------------------------------.
-| Functions for setting quotes and comment delimiters.  Used by |
-| m4_changecom () and m4_changequote ().                       |
-`--------------------------------------------------------------*/
+/*------------------------------------------------------------------.
+| Functions for setting quotes and comment delimiters.  Used by            |
+| m4_changecom () and m4_changequote ().  Pass NULL if the argument |
+| was not present, to distinguish from an explicit empty string.    |
+`------------------------------------------------------------------*/
 
 void
 set_quotes (const char *lq, const char *rq)
@@ -704,9 +705,24 @@ set_quotes (const char *lq, const char *
   free (lquote.string);
   free (rquote.string);
 
-  lquote.string = xstrdup (lq ? lq : DEF_LQUOTE);
+  /* POSIX states that with 0 arguments, the default quotes are used.
+     POSIX XCU ERN 112 states that behavior is implementation-defined
+     if there was only one argument, or if there is an empty string in
+     either position when there are two arguments.  We allow an empty
+     left quote to disable quoting, but a non-empty left quote will
+     always create a non-empty right quote.  See the texinfo for what
+     some other implementations do.  */
+  if (!lq)
+    {
+      lq = DEF_LQUOTE;
+      rq = DEF_RQUOTE;
+    }
+  else if (!rq || (*lq && !*rq))
+    rq = DEF_RQUOTE;
+
+  lquote.string = xstrdup (lq);
   lquote.length = strlen (lquote.string);
-  rquote.string = xstrdup (rq ? rq : DEF_RQUOTE);
+  rquote.string = xstrdup (rq);
   rquote.length = strlen (rquote.string);
 }
 
@@ -716,9 +732,21 @@ set_comment (const char *bc, const char 
   free (bcomm.string);
   free (ecomm.string);
 
-  bcomm.string = xstrdup (bc ? bc : DEF_BCOMM);
+  /* POSIX requires no arguments to disable comments.  It requires
+     empty arguments to be used as-is, but this is counter to
+     traditional behavior, because a non-null begin and null end makes
+     it impossible to end a comment.  An aardvark has been filed:
+     http://www.opengroup.org/austin/mailarchives/ag-review/msg02168.html
+     This implementation assumes the aardvark will be approved.  See
+     the texinfo for what some other implementations do.  */
+  if (!bc)
+    bc = ec = "";
+  else if (!ec || (*bc && !*ec))
+    ec = DEF_ECOMM;
+
+  bcomm.string = xstrdup (bc);
   bcomm.length = strlen (bcomm.string);
-  ecomm.string = xstrdup (ec ? ec : DEF_ECOMM);
+  ecomm.string = xstrdup (ec);
   ecomm.length = strlen (ecomm.string);
 }
 
