m4-patches

change* cleanup


From: Eric Blake
Subject: change* cleanup
Date: Sat, 23 Dec 2006 00:00:17 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

I've been slowly developing this patch for a couple of months now.  It cleans up
syntax.c, and in the process merges the changequote and changecom behavior from
the branch, makes changesyntax a little more usable, and changes the input
engine to recognize macros and quotes in preference to comments, the way other
implementations behave.  I didn't really see any way to break this into smaller
commits, and it has undergone several tweaks as other patches have been brewing,
so I think it is relatively stable now.
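
For readers who want a picture of the new recognition order, here is a minimal sketch in C. It is illustrative only and is not code from this patch: the `classify` function, the `kind` enum, and the delimiter parameters are hypothetical names, and the sketch only looks at the start of a token (it ignores escapes, active characters, and the syntax table).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds; the real code uses M4_TOKEN_* values.  */
enum kind { KIND_WORD, KIND_QUOTE, KIND_COMMENT, KIND_OTHER };

/* Classify the token starting at INPUT, testing in the new order:
   macro name first, then begin-quote, then begin-comment.  BQUOTE and
   BCOMM are the current begin delimiters; an empty string disables
   that mechanism.  */
enum kind
classify (const char *input, const char *bquote, const char *bcomm)
{
  unsigned char ch = (unsigned char) input[0];

  if (isalpha (ch) || ch == '_')
    return KIND_WORD;
  if (*bquote && strncmp (input, bquote, strlen (bquote)) == 0)
    return KIND_QUOTE;
  if (*bcomm && strncmp (input, bcomm, strlen (bcomm)) == 0)
    return KIND_COMMENT;
  return KIND_OTHER;
}
```

Under this ordering, a begin-comment string that starts with a letter never gets a chance to match, and a begin-comment string that is identical to the begin-quote string always loses to it, which is the sense in which comments become "effectively disabled".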

2006-12-22  Eric Blake  <address@hidden>

        * m4/m4module.h (m4_set_syntax): Change signature.
        * modules/gnu.c (m4_resyntax_encode_safe): Reduce error to
        warning.
        (changesyntax): Likewise, and update caller.
        * m4/m4private.h (m4_syntax_table): Add orig member.
        * m4/syntax.c (m4_set_quotes, m4_set_comment): Merge from branch.
        Don't set is_single_quotes and is_single_comments when the begin
        character is shadowed by another syntax type.
        (m4_syntax_create): Populate default syntax table.
        (add_syntax_attribute): Don't lose quote assignment.
        (remove_syntax_attribute): Only allow removing rquote or ecomm.
        (add_syntax_set, subtract_syntax_set, set_syntax_set)
        (reset_syntax_set): New helper routines.
        (m4_set_syntax): Alter semantics - NUL key reverts entire syntax
        to default, and empty chars reverts that key to default.
        (check_is_single_quotes, check_is_single_comments): New helper
        routines.
        * modules/m4.c (changecom): Merge from branch.
        * m4/input.c (m4__next_token): Rearrange token recognition order
        to macro, quote, comment, in order to match traditional
        implementations.
        * src/freeze.c (reload_frozen_state): Update caller.
        * doc/m4.texinfo (Changequote, Changecom): Merge from branch, with
        modifications.
        (Changeresyntax): Revise to match style of surrounding sections
        and add more examples.
        (Changesyntax): Likewise, and update to new semantics.
        * NEWS: Document this change.
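
The add/subtract/set actions that changesyntax gains can be modeled roughly as follows. This is a toy sketch under stated assumptions, not the helper routines from syntax.c: `struct category` and `apply_action` are hypothetical names, each category is modeled as an independent membership array, and the sketch does not expand character ranges or reassign displaced characters to `Other'.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of one syntax category: member[c] is true when byte c
   belongs to the category.  */
struct category { bool member[256]; };

/* Apply ACTION to CAT using the characters in CHARS:
   '+' adds them, '-' removes them, and '=' replaces the whole
   category with exactly those characters.  */
void
apply_action (struct category *cat, char action, const char *chars)
{
  if (action == '=')
    memset (cat->member, 0, sizeof cat->member);
  for (; *chars; chars++)
    cat->member[(unsigned char) *chars] = (action != '-');
}
```

In this model, a call such as changesyntax(`W+.', `W-_') corresponds to one '+' action followed by one '-' action on the same category, leaving every other member untouched.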

Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.32
diff -u -r1.32 NEWS
--- NEWS        14 Nov 2006 05:58:01 -0000      1.32
+++ NEWS        22 Dec 2006 23:54:59 -0000
@@ -141,6 +141,14 @@
     efficient mapping directly to a builtin function, rather than through
     textual indirection through further expansions of `builtin'.
 
+*** The `changecom' builtin semantics now match traditional
+    implementations; if the start-comment string resembles a macro name or
+    the start-quote string, comments are effectively disabled.
+
+*** The `changesyntax' builtin has been improved, to make it easier to add
+    and remove characters from a syntax class without having to specify the
+    entire set of characters in that class.
+
 *** New `m' flag to `-d'/`--debug' option or `debugmode' macro traces
     actions related to module loading and unloading, and affects `dumpdef'
     and trace output to show where builtins come from.  New `s' flag shows
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.84
diff -u -r1.84 m4.texinfo
--- doc/m4.texinfo      22 Dec 2006 21:42:42 -0000      1.84
+++ doc/m4.texinfo      22 Dec 2006 23:54:59 -0000
@@ -3663,18 +3663,22 @@
 
 @cindex changing the quote delimiters
 @cindex quote delimiters, changing the
-@deffn {Builtin (m4)} changequote (@ovar{start}, @ovar{end})
 The default quote delimiters can be changed with the builtin
-@code{changequote}, where @var{start} is the new start-quote delimiter
-and @var{end} is the new end-quote delimiter.  If any of the arguments
-are missing, the default quotes (@code{`} and @code{'}) are used instead
-of the void arguments.
+@code{changequote}:
+
+@deffn {Builtin (m4)} changequote (@dvar{start, `}, @dvar{end, '})
+This sets @var{start} as the new begin-quote delimiter and @var{end} as
+the new end-quote delimiter.  If both arguments are missing, the default
+quotes (@code{`} and @code{'}) are used.  If @var{start} is void, then
+quoting is disabled.  Otherwise, if @var{end} is missing or void, the
+default end-quote delimiter (@code{'}) is used.  The quote delimiters
+can be of any length.
 
 The expansion of @code{changequote} is void.
 @end deffn
 
 @example
-changequote([, ])
+changequote(`[', `]')
 @result{}
 define([foo], [Macro [foo].])
 @result{}
@@ -3682,34 +3686,50 @@
 @result{}Macro foo.
 @end example
 
+The quotation strings can safely contain eight-bit characters.
 If no single character is appropriate, @var{start} and @var{end} can be
-of any length.
+of any length.  Other implementations cap the delimiter length to five
+characters, but @acronym{GNU} has no inherent limit.
 
 @example
-changequote([[, ]])
+changequote(`[[[', `]]]')
 @result{}
-define([[foo]], [[Macro [[[foo]]].]])
+define([[[foo]]], [[[Macro [[[[[foo]]]]].]]])
 @result{}
 foo
address@hidden [foo].
address@hidden [[foo]].
 @end example
 
-Changing the quotes to the empty strings will effectively disable the
-quoting mechanism, leaving no way to quote text.
+Calling @code{changequote} with @var{start} as the empty string will
+effectively disable the quoting mechanism, leaving no way to quote text.
+However, using an empty string is not portable, as some other
+implementations of @code{m4} revert to the default quoting, while others
+preserve the prior non-empty delimiter.  If @var{start} is not empty,
+then an empty @var{end} will use the default end-quote delimiter of
+@code{'}, as otherwise, it would be impossible to end a quoted string.
+Again, this is not portable, as some other @code{m4} implementations
+reuse @var{start} as the end-quote delimiter, while others preserve the
+previous non-empty value.  Omitting both arguments restores the default
+begin-quote and end-quote delimiters; fortunately this behavior is
+portable to all implementations of @code{m4}.
 
 @example
 define(`foo', `Macro `FOO'.')
 @result{}
-changequote(, )
+changequote(`', `')
 @result{}
 foo
 @result{}Macro `FOO'.
 `foo'
 @result{}`Macro `FOO'.'
+changequote(`,)
address@hidden
+foo
address@hidden FOO.
 @end example
 
 There is no way in @code{m4} to quote a string containing an unmatched
-left quote, except using @code{changequote} to change the current
+begin-quote, except using @code{changequote} to change the current
 quotes.
 
 If the quotes should be changed from, say, @samp{[} to @samp{[[},
@@ -3717,25 +3737,126 @@
 calls of @code{changequote} must be made, one for the temporary quotes
 and one for the new quotes.
 
-Neither quote string should start with a letter or @samp{_} (underscore),
-as they will be confused with names in the input.  Doing so disables
-the quoting mechanism.
+Macros are recognized in preference to the begin-quote string, so if a
+prefix of @var{start} can be recognized as part of a potential macro
+name, the quoting mechanism is effectively disabled.  Unless you use
+@code{changesyntax} (@pxref{Changesyntax}), this means that @var{start}
+should not begin with a letter, digit, or @samp{_} (underscore).
+However, even though quoted strings are not recognized, the quote
+characters can still be discerned in macro expansion and in trace
+output.
+
address@hidden
+define(`echo', `$@@')
address@hidden
+define(`hi', `HI')
address@hidden
+changequote(`q', `Q')
address@hidden
+q hi Q hi
address@hidden HI Q HI
+echo(hi)
address@hidden
+changequote
address@hidden
+changequote(`-', `EOF')
address@hidden
+- hi EOF hi
address@hidden hi  HI
+changequote
address@hidden
+changequote(`1', `2')
address@hidden
+hi1hi2
address@hidden
+hi 1hi2
address@hidden hi
address@hidden example
+
+Quotes are recognized in preference to argument collection.  In
+particular, if @var{start} is a single @samp{(}, then argument
+collection is effectively disabled.  For portability with other
+implementations, it is a good idea to avoid @samp{(}, @samp{,}, and
+@samp{)} as the first character in @var{start}.
+
address@hidden
+define(`echo', `$#:$@@:')
address@hidden
+define(`hi', `HI')
address@hidden
+changequote(`(',`)')
address@hidden
+echo(hi)
address@hidden::hi
+changequote
address@hidden
+changequote(`((', `))')
address@hidden
+echo(hi)
address@hidden:HI:
+echo((hi))
address@hidden::hi
+changequote
address@hidden
+changequote(`,', `)')
address@hidden
+echo(hi,hi)bye)
address@hidden:HIhibye:
address@hidden example
+
+If @var{end} is a prefix of @var{start}, the end-quote will be
+recognized in preference to a nested begin-quote.  In particular,
+changing the quotes to have the same string for @var{start} and
+@var{end} disables nesting of quotes.  When quote nesting is disabled,
+it is impossible to double-quote strings across macro expansions, so
+using the same string is not done very often.
+
address@hidden
+define(`hi', `HI')
address@hidden
+changequote(`""', `"')
address@hidden
+""hi"""hi"
address@hidden
+""hi" ""hi"
address@hidden hi
+""hi"" "hi"
address@hidden" "HI"
+changequote
address@hidden
+`hi`hi'hi'
address@hidden'hi
+changequote(`"', `"')
address@hidden
+"hi"hi"hi"
address@hidden
address@hidden example
+
+It is an error if the end of file occurs within a quoted string.
+
+@comment status: 1
+@example
+`hello world'
+@result{}hello world
+`dangling quote
+^D
+@error{}m4:stdin:2: end of file in string
+@end example
 
 @node Changecom
 @section Changing the comment delimiters
 
 @cindex changing comment delimiters
 @cindex comment delimiters, changing
-@deffn {Builtin (m4)} changecom (@ovar{start}, @ovar{end})
 The default comment delimiters can be changed with the builtin
-macro @code{changecom}, where @var{start} is the new start-comment
-delimiter and @var{end} is the new end-comment delimiter.  If any of the
-arguments are void, the default comment delimiters (@code{#} and
-newline) are used instead of the void arguments.  The comment delimiters
-can be of any length.
+macro @code{changecom}:
 
-Calling @code{changecom} without any arguments disables the commenting
-mechanism completely.
+@deffn {Builtin (m4)} changecom (@ovar{start}, @dvar{end, @key{NL}})
+This sets @var{start} as the new begin-comment delimiter and @var{end}
+as the new end-comment delimiter.  If both arguments are missing, or
+@var{start} is void, then comments are disabled.  Otherwise, if
+@var{end} is missing or void, the default end-comment delimiter of
+newline is used.  The comment delimiters can be of any length.
 
 The expansion of @code{changecom} is void.
 @end deffn
@@ -3758,6 +3879,15 @@
 strings.  If you want the text inside a comment expanded, quote the
 start comment delimiter.
 
+Calling @code{changecom} without any arguments, or with @var{start} as
+the empty string, will effectively disable the commenting mechanism.  To
+restore the original comment start of @samp{#}, you must explicitly ask
+for it.  If @var{start} is not empty, then an empty @var{end} will use
+the default end-comment delimiter of newline, as otherwise, it would be
+impossible to end a comment.  However, this is not portable, as some
+other @code{m4} implementations preserve the previous non-empty
+delimiters instead.
+
 @example
 define(`comment', `COMMENT')
 @result{}
@@ -3765,41 +3895,126 @@
 @result{}
 # Not a comment anymore
 @result{}# Not a COMMENT anymore
+changecom(`#', `')
address@hidden
+# comment again
address@hidden comment again
 @end example
 
address@hidden Changeresyntax
address@hidden Changing the regular expression syntax
+The comment strings can safely contain eight-bit characters.
+If no single character is appropriate, @var{start} and @var{end} can be
+of any length.  Other implementations cap the delimiter length to five
+characters, but @acronym{GNU} has no inherent limit.
 
address@hidden regular expression syntax, changing
address@hidden GNU extensions
address@hidden {Builtin (gnu)} changeresyntax (@var{resyntax})
-By default, the @acronym{GNU} extensions @code{patsubst}, @code{regexp} and
-more recently @code{renamesyms} continue to use emacs style regular
-expression syntax (@pxref{Regular expression syntax}).
-
-The @code{changeresyntax} macro expands to nothing, but changes the
-default regular expression syntax used by M4 according to the value of
address@hidden, equivalent to passing @var{resyntax} as the argument to
address@hidden when invoking @code{m4}.  @xref{Operation
-modes, , Invoking m4}, for more details.  If @var{resyntax} is empty or
-omitted the default settings are reverted to emacs style.
address@hidden deffn
-
-Any one of the values below, case is not important, and optionally
-with @kbd{-} or @kbd{ } substituted for @kbd{_} in the given names,
-will set the default regular expression syntax as described in the
-table below.  For example the following are all equivalent to
address@hidden:
+Macros and quotes are recognized in preference to comments, so if a
+prefix of @var{start} can be recognized as part of a potential macro
+name, or confused with a quoted string, the comment mechanism is
+effectively disabled.  Unless you use @code{changesyntax}
+(@pxref{Changesyntax}), this means that @var{start} should not begin
+with a letter, digit, or @samp{_} (underscore), and that neither the
+start-quote nor the start-comment string should be a prefix of the
+other.
 
 @example
-changeresyntax(`gnu m4')
+define(`hi', `HI')
 @result{}
-changeresyntax(`GNU-m4')
+define(`hi1hi2', `hello')
 @result{}
-changeresyntax(`Gnu_M4')
+changecom(`q', `Q')
address@hidden
+q hi Q hi
address@hidden HI Q HI
+changecom(`1', `2')
address@hidden
+hi1hi2
address@hidden
+hi 1hi2
address@hidden 1hi2
+changecom(`[[', `]]')
address@hidden
+changequote(`[[[', `]]]')
address@hidden
+[hi]
address@hidden
+[[hi]]
address@hidden
+[[[hi]]]
address@hidden
+changequote
address@hidden
+changecom(`[[[', `]]]')
address@hidden
+changequote(`[[', `]]')
address@hidden
+[[hi]]
address@hidden
+[[[hi]]]
address@hidden
address@hidden example
+
+Comments are recognized in preference to argument collection.  In
+particular, if @var{start} is a single @samp{(}, then argument
+collection is effectively disabled.  For portability with other
+implementations, it is a good idea to avoid @samp{(}, @samp{,}, and
+@samp{)} as the first character in @var{start}.
+
address@hidden
+define(`echo', `$#:$@@:')
address@hidden
+define(`hi', `HI')
address@hidden
+changecom(`(',`)')
address@hidden
+echo(hi)
address@hidden::(hi)
+changecom
 @result{}
+changecom(`((', `))')
address@hidden
+echo(hi)
address@hidden:HI:
+echo((hi))
address@hidden::((hi))
+changecom(`,', `)')
address@hidden
+echo(hi,hi)bye)
address@hidden:HI,hi)bye:
 @end example
 
+It is an error if the end of file occurs within a comment.
+
+@comment status: 1
+@example
+changecom(`/*', `*/')
+@result{}
+/*dangling comment
+^D
+@error{}m4:stdin:2: end of file in comment
+@end example
+
+@node Changeresyntax
+@section Changing the regular expression syntax
+
+@cindex regular expression syntax, changing
+@cindex @acronym{GNU} extensions
+The @acronym{GNU} extensions @code{patsubst}, @code{regexp}, and more
+recently, @code{renamesyms} each deal with regular expressions.  There
+are multiple flavors of regular expressions, so the
+@code{changeresyntax} builtin exists to allow choosing the default
+flavor:
+
+@deffn {Builtin (gnu)} changeresyntax (@var{resyntax})
+Changes the default regular expression syntax used by M4 according to
+the value of @var{resyntax}, equivalent to passing @var{resyntax} as the
+argument to the command line option @option{--regexp-syntax}
+(@pxref{Operation modes, , Invoking m4}).  If @var{resyntax} is empty,
+the default flavor is reverted to emacs style.
+
+@var{resyntax} can be any one of the values in the table below.  Case is
+not important, and @kbd{-} or @kbd{ } can be substituted for @kbd{_} in
+the given names.  If @var{resyntax} is unrecognized, a warning is
+issued and the default flavor is not changed.
+
 @table @dfn
 @item AWK
 @xref{awk regular expression syntax}, for details.
@@ -3811,12 +4026,6 @@
 @xref{posix-basic regular expression syntax}, for details.
 
 @item BSD_M4
address@hidden regular expression syntax}, for details.
-
address@hidden EMACS
address@hidden GNU_EMACS
address@hidden regular expression syntax}, for details.
-
 @item EXTENDED
 @itemx POSIX_EXTENDED
 @xref{posix-extended regular expression syntax}, for details.
@@ -3830,7 +4039,10 @@
 @xref{egrep regular expression syntax}, for details.
 
 @item GNU_M4
address@hidden regular expression syntax}, for details.
address@hidden EMACS
address@hidden GNU_EMACS
address@hidden regular expression syntax}, for details.  This is the
+default regular expression flavor.
 
 @item GREP
 @xref{grep regular expression syntax}, for details.
@@ -3847,13 +4059,52 @@
 @xref{posix-egrep regular expression syntax}, for details.
 @end table
 
+The expansion of @code{changeresyntax} is void.
+The macro @code{changeresyntax} is recognized only with parameters.
+This macro was added in M4 2.0.
+@end deffn
+
+For an example of how @var{resyntax} is recognized, the first three
+usages select the @samp{GNU_M4} regular expression flavor:
+
+@example
+changeresyntax(`gnu m4')
+@result{}
+changeresyntax(`GNU-m4')
+@result{}
+changeresyntax(`Gnu_M4')
+@result{}
+changeresyntax(`unknown')
+@error{}m4:stdin:4: Warning: changeresyntax: bad syntax-spec: `unknown'
+@result{}
+@end example
+
+Using @code{changeresyntax} makes it possible to omit the optional
+@var{resyntax} parameter to other macros, while still using a different
+regular expression flavor.
+
address@hidden
+patsubst(`ab', `a|b', `c')
address@hidden
+patsubst(`ab', `a\|b', `c')
address@hidden
+patsubst(`ab', `a|b', `c', `EXTENDED')
address@hidden
+changeresyntax(`EXTENDED')
address@hidden
+patsubst(`ab', `a|b', `c')
address@hidden
+patsubst(`ab', `a\|b', `c')
address@hidden
address@hidden example
+
 @node Changesyntax
 @section Changing the lexical structure of the input
 
 @cindex lexical structure of the input
 @cindex input, lexical structure of the
 @cindex syntax table
-@cindex GNU extensions
+@cindex @acronym{GNU} extensions
 @quotation
 The macro @code{changesyntax} and all associated functionality is
 experimental (@pxref{Experiments}).  The functionality might change in
@@ -3861,13 +4112,13 @@
 do for bugs.
 @end quotation
 
-The input to @code{m4} is read character per character, and these
+The input to @code{m4} is read character by character, and these
 characters are grouped together to form input tokens (such as macro
 names, strings, comments, etc.).
 
 Each token is parsed according to certain rules.  For example, a macro
-name starts with a letter or @kbd{_} and consists of the longest
-possible string of letters, @kbd{_} and digits.  But who is to decide
+name starts with a letter or @samp{_} and consists of the longest
+possible string of letters, @samp{_} and digits.  But who is to decide
 what characters are letters, digits, quotes, white space?  Earlier the
 operating system decided, now you do.
 
@@ -3875,81 +4126,107 @@
 
 @table @dfn
 @item Letters
-Characters that start a macro name.  The default is the letters as
-defined by the operating system and the character @kbd{_}.
+Characters that start a macro name.  Defaults to the letters as defined
+by the locale, and the character @samp{_}.
 
 @item Digits
 Characters that, together with the letters, form the remainder of a
-macro name.  The default is the ten digits @address@hidden@kbd{9}.
+macro name.  Defaults to the ten digits @samp{0}@dots{}@samp{9}, and any
+other digits defined by the locale.
 
 @item White space
-Characters that should be trimmed from the beginning of each
-argument to a macro call.  The default is @kbd{SPC}, @kbd{TAB},
address@hidden and possibly others as defined by the operating system.
+Characters that should be trimmed from the beginning of each argument to
+a macro call.  The defaults are space, tab, newline, carriage return,
+form feed, and vertical tab, and any others as defined by the locale.
 
 @item Open parenthesis
-Characters that open the argument list of a macro call.  Default
-is @kbd{(}.
+Characters that open the argument list of a macro call.  The default is
+the single character @samp{(}.
 
 @item Close parenthesis
-Characters that close the argument list of a macro call.  Default is
address@hidden)}.
+Characters that close the argument list of a macro call.  The default
+is the single character @samp{)}.
 
 @item Argument separator
-Characters that separate the arguments of a macro call.  Default
-is @kbd{,}.
+Characters that separate the arguments of a macro call.  The default is
+the single character @samp{,}.
 
 @item Dollar
 Characters that can introduce an argument reference in the body of a
-macro.  Default is @kbd{$}.
+macro.  The default is the single character @samp{$}.
+
+@item Left quote
+The set of characters that can start a single-character quoted string.
+The default is the single character @samp{`}.  For multiple-character
+quote delimiters, use @code{changequote} (@pxref{Changequote}).
+
+@item Begin comment
+The set of characters that can start a single-character comment.  The
+default is the single character @samp{#}.  For multiple-character
+comment delimiters, use @code{changecom} (@pxref{Changecom}).
 
 @item Other
 Characters that have no special syntactical meaning to @code{m4}.
-Default is all characters except those in the categories above.
+Defaults to all characters except those in the categories above.
 
 @item Active
-Characters that themselves, alone, form macro names.  No default.
+Characters that themselves, alone, form macro names.  This is a
+@acronym{GNU} extension, and active characters have lower precedence
+than comments.  By default, no characters are active.
 
 @item Escape
-Characters that must precede macro names for them to be recognized.  No
-default.
-
+Characters that must precede macro names for them to be recognized.
+This is a @acronym{GNU} extension.  When an escape character is defined,
+then macros are not recognized unless the escape character is present;
+however, the macro name, visible by @samp{$0} in macro definitions, does
+not include the escape character.  By default, no characters are
+escapes.
+
address@hidden FIXME - we should also consider supporting:
address@hidden @item Ignore - characters that are ignored if they appear in
address@hidden the input; perhaps defaulting to '\0', category 'I'.
address@hidden @item Assign -character used in macro definitions for default
address@hidden variables, category '='.
 @end table
 
 @noindent
 Each character can, besides the basic syntax category, have some syntax
-attributes.  These are:
+attributes.  One reason these are attributes rather than categories is
+that end delimiters are never recognized except when searching for the
+end of a token triggered by a start delimiter; the end delimiter can
+have syntax properties of its own when it appears in isolation.  These
+attributes are:
 
 @table @dfn
address@hidden Left quote
-The characters that start a quoted string.  Default is @kbd{`}.  Basic
-syntax category is `Other'.
-
 @item Right quote
-The characters that end a quoted string.  Default is @kbd{'}.  Basic
-syntax category is `Other'.
-
address@hidden Begin comment
-The characters that begin a comment.  Default is @kbd{#}.  Basic syntax
-category is `Other'.
+The set of characters that can end a single-character quoted string.
+The default is the single character @samp{'}.  For multiple-character
+quote delimiters, use @code{changequote} (@pxref{Changequote}).  Note
+that @samp{'} also defaults to the syntax category `Other', when it
+appears in isolation.
 
 @item End comment
-The characters that end a comment.  Default is @kbd{newline}.  Basic
-syntax category is `White space'.
+The set of characters that can end a single-character comment.  The
+default is the single character @kbd{newline}.  For multiple-character
+comment delimiters, use @code{changecom} (@pxref{Changecom}).  Note that
+newline also defaults to the syntax category `White space', when it
+appears in isolation.
 @end table
 
address@hidden
-
 The builtin macro @code{changesyntax} is used to change the way
 @code{m4} parses the input stream into tokens.
 
 @deffn {Builtin (gnu)} changesyntax (@var{syntax-spec}, @dots{})
-The @var{syntax-spec} is a string, whose first character determines the
-syntax category of the other characters.   Character ranges are expanded
-as for @ref{Translit}.  If there are no other
-characters, @emph{all} characters are given the syntax code.
-
-The characters for the syntax categories are:
+Each @var{syntax-spec} is a two-part string.  The first part is a
+command, consisting of a single character describing a syntax category,
+and an optional one-character action.  The action can be @samp{-} to
+remove the listed characters from that category and reassign them to the
+`Other' category, @samp{=} to set the category to the listed characters
+and reassign all other characters previously in that category to
+`Other', or @samp{+} to add the listed characters to the category
+without affecting other characters.  If an action is not specified, but
+additional characters are present, then @samp{=} is assumed.  The
+case-insensitive characters for the syntax categories are:
 
 @table @kbd
 @item W
@@ -3980,32 +4257,54 @@
 Begin comment
 @item E
 End comment
address@hidden
address@hidden @item I
address@hidden Ignore
address@hidden @item =
address@hidden Assign
 @end table
 
-The expansion of @code{changesyntax} is void.
+The remaining characters of each @var{syntax-spec} form the set of
+characters to perform the action on for that syntax category.  Character
+ranges are expanded as for @code{translit} (@pxref{Translit}).  To start
+the character set with @samp{-}, @samp{+}, or @samp{=}, an action must
+be specified.
+
+If @var{syntax-spec} is just a category, and no action or characters
+were specified, then all characters in that category are reset to their
+default state.  A warning is issued if the category character is not
+valid.  If @var{syntax-spec} is the empty string, then all categories
+are reset to their default state.
 
-The builtin macro @code{changesyntax} is recognized only when given
-arguments.
+The expansion of @code{changesyntax} is void.
+The macro @code{changesyntax} is recognized only with parameters.  Use
+this macro with caution, as it is possible to change the syntax in such
+a way that no further macros can be recognized by @code{m4}.
+This macro was added in M4 2.0.
 @end deffn
 
address@hidden
-With @code{changesyntax} we can modify the meaning of a word.
+With @code{changesyntax} we can modify what characters form a word.
 
 @example
 define(`test.1', `TEST ONE')
 @result{}
 __file__
 @result{}stdin
-changesyntax(`O_', `W.')
+test.1
address@hidden
+changesyntax(`W+.', `W-_')
 @result{}
 __file__
 @result{}__file__
 test.1
 @result{}TEST ONE
+changesyntax(`W')
address@hidden
+__file__
address@hidden
+test.1
address@hidden
 @end example
 
address@hidden
 Another possibility is to change the syntax of a macro call.
 
 @example
@@ -4013,7 +4312,7 @@
 @result{}
 test(a, b, c)
 @result{}3
-changesyntax(`(<', `,|', `)>', `O(,)')
+changesyntax(`(<', `,|', `)>')
 @result{}
 test(a, b, c)
 @result{}0(a, b, c)
@@ -4021,22 +4320,21 @@
 @result{}3
 @end example
 
address@hidden
 Leading spaces are always removed from macro arguments in @code{m4}, but
-by changing the syntax categories we can avoid it.
+by changing the syntax categories we can avoid it.  The use of
+@code{format} is an alternative to using a literal tab character.
 
 @example
 define(`test', `$1$2$3')
 @result{}
 test(`a', `b', `c')
 @result{}abc
-changesyntax(`O         ')
+changesyntax(`O 'format(`%c', `9'))
 @result{}
 test(a, b, c)
 @result{}a b c
 @end example
 
address@hidden
 It is possible to redefine the @samp{$} used to indicate macro arguments
 in user defined macros.
 
@@ -4065,7 +4363,7 @@
 @end example
 
 Macro calls can be given a @TeX{} or Texinfo like syntax using an
-escape.  If one or more characters are defined as escapes macro names
+escape.  If one or more characters are defined as escapes, macro names
 are only recognized if preceded by an escape character.
 
 If the escape is not followed by what is normally a word (a letter
@@ -4118,35 +4416,37 @@
 There is obviously an overlap with @code{changecom} and
 @code{changequote}.  Comment delimiters and quotes can now be defined in
 two different ways.  To avoid incompatibilities, if the quotes are set
-with @code{changequote}, all characters marked in the syntax table as
-quotes will be unmarked, leaving only one set of defined quotes as
-before.  Since the quotes are syntax attributes rather than syntax
-categories, the old quotes simply revert to their old category.  If the
-quotes are set with @code{changesyntax}, other characters marked as
-quotes are left untouched, resulting in at least two sets of quotes.
-This applies to comment delimiters as well, @emph{mutatis mutandis}.
+with @code{changequote}, all other characters marked in the syntax table
+as quotes will revert to their normal syntax categories, leaving only
+one set of defined quotes as before.  If the quotes are set with
+@code{changesyntax}, it is possible to end up with multiple sets of
+quotes.  This applies to comment delimiters as well, @emph{mutatis
+mutandis}.
 
 @example
 define(`test', `TEST')
 @result{}
-changesyntax(`L<', `R>')
+changesyntax(`L+<', `R+>')
 @result{}
 <test>
 @result{}test
-`test>
+`test'
 @result{}test
+[test]
address@hidden
 changequote(<[>, `]')
 @result{}
 <test>
 @result{}<TEST>
+`test'
address@hidden'
 [test]
 @result{}test
 @end example
 
address@hidden
-If categories, that form single character tokens, contain several
-characters, all are treated as equal.  Any open parenthesis will match
-any close parenthesis, etc.
+If several characters are assigned to a category that forms single
+character tokens, all such characters are treated as equal.  Any open
+parenthesis will match any close parenthesis, etc.
 
 @example
 changesyntax(`(@{<', `)@}>', `,;:', `O(,)')
@@ -4155,9 +4455,9 @@
 @result{}00001111
 @end example
 
address@hidden
-This is not so for long quotes, which cannot be matched by single
-character quote and vice versa.  The same goes for comment delimiters.
+On the other hand, a multi-character start-quote sequence, which can
+only be created by @code{changequote}, will only be matched by the
+corresponding end-quote sequence.  The same goes for comment delimiters.
 
 @example
 define(`test', `==$1==')
Index: m4/input.c
===================================================================
RCS file: /sources/m4/m4/m4/input.c,v
retrieving revision 1.58
diff -u -r1.58 input.c
--- m4/input.c  11 Nov 2006 16:21:25 -0000      1.58
+++ m4/input.c  22 Dec 2006 23:54:59 -0000
@@ -1037,41 +1037,7 @@
     file = m4_get_current_file (context);
     line = m4_get_current_line (context);
 
-    /* FIXME - other implementations, such as Solaris, parse macro
-       names, then quotes, then comments.  We should probably
-       rearrange this to match.  */
-    if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_BCOMM))
-      {                                        /* COMMENT, SHORT DELIM */
-       obstack_1grow (&token_stack, ch);
-       while ((ch = next_char (context, true)) != CHAR_EOF
-              && !m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ECOMM))
-         obstack_1grow (&token_stack, ch);
-       if (ch != CHAR_EOF)
-         obstack_1grow (&token_stack, ch);
-       else
-         m4_error_at_line (context, EXIT_FAILURE, 0, file, line,
-                           _("end of file in comment"));
-       type = (m4_get_discard_comments_opt (context)
-               ? M4_TOKEN_NONE : M4_TOKEN_STRING);
-      }
-    else if (!m4_is_syntax_single_comments (M4SYNTAX)
-            && MATCH (context, ch, context->syntax->bcomm.string, true))
-      {                                        /* COMMENT, LONGER DELIM */
-       obstack_grow (&token_stack, context->syntax->bcomm.string,
-                     context->syntax->bcomm.length);
-       while ((ch = next_char (context, true)) != CHAR_EOF
-              && !MATCH (context, ch, context->syntax->ecomm.string, true))
-         obstack_1grow (&token_stack, ch);
-       if (ch != CHAR_EOF)
-         obstack_grow (&token_stack, context->syntax->ecomm.string,
-                       context->syntax->ecomm.length);
-       else
-         m4_error_at_line (context, EXIT_FAILURE, 0, file, line,
-                           _("end of file in comment"));
-       type = (m4_get_discard_comments_opt (context)
-               ? M4_TOKEN_NONE : M4_TOKEN_STRING);
-      }
-    else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ESCAPE))
+    if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ESCAPE))
       {                                        /* ESCAPED WORD */
        obstack_1grow (&token_stack, ch);
        if ((ch = next_char (context, true)) != CHAR_EOF)
@@ -1147,6 +1113,37 @@
          }
        type = M4_TOKEN_STRING;
       }
+    else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_BCOMM))
+      {                                        /* COMMENT, SHORT DELIM */
+       obstack_1grow (&token_stack, ch);
+       while ((ch = next_char (context, true)) != CHAR_EOF
+              && !m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ECOMM))
+         obstack_1grow (&token_stack, ch);
+       if (ch != CHAR_EOF)
+         obstack_1grow (&token_stack, ch);
+       else
+         m4_error_at_line (context, EXIT_FAILURE, 0, file, line,
+                           _("end of file in comment"));
+       type = (m4_get_discard_comments_opt (context)
+               ? M4_TOKEN_NONE : M4_TOKEN_STRING);
+      }
+    else if (!m4_is_syntax_single_comments (M4SYNTAX)
+            && MATCH (context, ch, context->syntax->bcomm.string, true))
+      {                                        /* COMMENT, LONGER DELIM */
+       obstack_grow (&token_stack, context->syntax->bcomm.string,
+                     context->syntax->bcomm.length);
+       while ((ch = next_char (context, true)) != CHAR_EOF
+              && !MATCH (context, ch, context->syntax->ecomm.string, true))
+         obstack_1grow (&token_stack, ch);
+       if (ch != CHAR_EOF)
+         obstack_grow (&token_stack, context->syntax->ecomm.string,
+                       context->syntax->ecomm.length);
+       else
+         m4_error_at_line (context, EXIT_FAILURE, 0, file, line,
+                           _("end of file in comment"));
+       type = (m4_get_discard_comments_opt (context)
+               ? M4_TOKEN_NONE : M4_TOKEN_STRING);
+      }
     else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ACTIVE))
       {                                        /* ACTIVE CHARACTER */
        obstack_1grow (&token_stack, ch);
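The reordering in m4__next_token above can be pictured with a much smaller model. The following is only a sketch under stated assumptions: toy_token, toy_syntax and toy_classify are hypothetical names invented for illustration, and the real tokenizer also handles escapes, multi-character delimiters and active characters, which are left out here. It shows only the precedence of macro name, then quote, then comment.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token classes for this sketch; not the m4 token enum.  */
enum toy_token { TOY_WORD, TOY_QUOTE, TOY_COMMENT, TOY_OTHER };

/* Toy single-character delimiters.  lquote and bcomm may name the same
   character, or a character that can also start a word; the order of
   the checks below decides which role wins.  */
struct toy_syntax
{
  char lquote;
  char bcomm;
};

/* Classify the first character of a token, checking for a macro name
   first, then a quote, then a comment.  */
static enum toy_token
toy_classify (const struct toy_syntax *s, char ch)
{
  assert (s);
  if (isalpha ((unsigned char) ch) || ch == '_')
    return TOY_WORD;
  if (ch == s->lquote)
    return TOY_QUOTE;
  if (ch == s->bcomm)
    return TOY_COMMENT;
  return TOY_OTHER;
}
```

With the default delimiters, '#' still starts a comment. If the same character is installed as both start-quote and start-comment, the quote wins, and a letter chosen as a comment delimiter is still read as part of a word.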
Index: m4/m4module.h
===================================================================
RCS file: /sources/m4/m4/m4/m4module.h,v
retrieving revision 1.101
diff -u -r1.101 m4module.h
--- m4/m4module.h       14 Nov 2006 05:58:01 -0000      1.101
+++ m4/m4module.h       22 Dec 2006 23:54:59 -0000
@@ -390,7 +390,7 @@
 
 extern void    m4_set_quotes   (m4_syntax_table*, const char*, const char*);
 extern void    m4_set_comment  (m4_syntax_table*, const char*, const char*);
-extern int     m4_set_syntax   (m4_syntax_table*, const char, const char*);
+extern int     m4_set_syntax   (m4_syntax_table*, char, char, const char*);
 
 
 
Index: m4/m4private.h
===================================================================
RCS file: /sources/m4/m4/m4/m4private.h,v
retrieving revision 1.73
diff -u -r1.73 m4private.h
--- m4/m4private.h      14 Nov 2006 05:58:01 -0000      1.73
+++ m4/m4private.h      22 Dec 2006 23:54:59 -0000
@@ -284,21 +284,25 @@
 } m4_string;
 
 struct m4_syntax_table {
-  /* Please read the comment at the top of input.c for details */
+  /* Please read the comment at the top of input.c for details.  table
+     holds the current syntax, and orig holds the default syntax.  */
   unsigned short table[CHAR_RETRY];
+  unsigned short orig[CHAR_RETRY];
 
   m4_string lquote;
   m4_string rquote;
   m4_string bcomm;
   m4_string ecomm;
 
-  /* true iff strlen(rquote) == strlen(lquote) == 1 */
+  /* True iff strlen(lquote) == strlen(rquote) == 1 and lquote is not
+     interfering with macro names.  */
   bool is_single_quotes;
 
-  /* true iff strlen(bcomm) == strlen(ecomm) == 1 */
+  /* True iff strlen(bcomm) == strlen(ecomm) == 1 and bcomm is not
+     interfering with macros or quotes.  */
   bool is_single_comments;
 
-  /* true iff some character has M4_SYNTAX_ESCAPE */
+  /* True iff some character has M4_SYNTAX_ESCAPE.  */
   bool is_macro_escaped;
 };
 
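The new orig member keeps an unchanging copy of the default categories next to the live table, so reverting is a plain copy. A minimal sketch of that idea follows, assuming hypothetical category values and the made-up names toy_table, toy_init and toy_reset; the real table uses the M4_SYNTAX_* codes and CHAR_RETRY entries.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical category values, for illustration only.  */
enum { TOY_OTHER = 1, TOY_ALPHA = 2, TOY_NUM = 3, TOY_SPACE = 4 };

struct toy_table
{
  unsigned short table[256];	/* current syntax */
  unsigned short orig[256];	/* default syntax, never modified */
};

/* Revert the current table to the defaults.  */
static void
toy_reset (struct toy_table *t)
{
  assert (t);
  memcpy (t->table, t->orig, sizeof t->orig);
}

/* Populate the defaults once, then copy them into the live table.
   The loop runs down to and including 0, so every entry is set.  */
static void
toy_init (struct toy_table *t)
{
  int ch;

  assert (t);
  for (ch = 256; --ch >= 0;)
    {
      if (isspace (ch))
	t->orig[ch] = TOY_SPACE;
      else if (isalpha (ch) || ch == '_')
	t->orig[ch] = TOY_ALPHA;
      else if (isdigit (ch))
	t->orig[ch] = TOY_NUM;
      else
	t->orig[ch] = TOY_OTHER;
    }
  toy_reset (t);
}
```

After toy_init, any entry of table can be overwritten freely; a later toy_reset restores it from orig without recomputing the classification.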
Index: m4/syntax.c
===================================================================
RCS file: /sources/m4/m4/m4/syntax.c,v
retrieving revision 1.18
diff -u -r1.18 syntax.c
--- m4/syntax.c 11 Nov 2006 16:21:25 -0000      1.18
+++ m4/syntax.c 22 Dec 2006 23:54:59 -0000
@@ -35,14 +35,13 @@
    M4_SYNTAX_OPEN      Open list of macro arguments
    M4_SYNTAX_CLOSE     Close list of macro arguments
    M4_SYNTAX_COMMA     Separates macro arguments
-   M4_SYNTAX_DOLLAR    *Indicates macro argument in user macros
+   M4_SYNTAX_DOLLAR    Indicates macro argument in user macros
    M4_SYNTAX_ACTIVE    This character is a macro name by itself
    M4_SYNTAX_ESCAPE    Use this character to prefix all macro names
-   M4_SYNTAX_ASSIGN    Used to assign defaults in parameter lists
+   M4_SYNTAX_ASSIGN    *Used to assign defaults in parameter lists
 
    M4_SYNTAX_ALPHA     Alphabetic characters (can start macro names)
-   M4_SYNTAX_NUM       Numeric characters
-   M4_SYNTAX_ALNUM     Alphanumeric characters (can form macro names)
+   M4_SYNTAX_NUM       Numeric characters (can form macro names)
 
    M4_SYNTAX_LQUOTE    A single characters left quote
    M4_SYNTAX_BCOMM     A single characters begin comment delimiter
@@ -76,10 +75,10 @@
    The precedence as implemented by next_token () is:
 
    M4_SYNTAX_IGNORE    *Filtered out below next_token ()
-   M4_SYNTAX_BCOMM     Reads all until M4_SYNTAX_ECOMM
    M4_SYNTAX_ESCAPE    Reads macro name iff set, else next
    M4_SYNTAX_ALPHA     Reads macro name
    M4_SYNTAX_LQUOTE    Reads all until balanced M4_SYNTAX_RQUOTE
+   M4_SYNTAX_BCOMM     Reads all until M4_SYNTAX_ECOMM
 
    M4_SYNTAX_OTHER  }  Reads all M4_SYNTAX_OTHER, M4_SYNTAX_NUM
    M4_SYNTAX_NUM    }  and M4_SYNTAX_DOLLAR
@@ -93,9 +92,11 @@
    string is parsed equally whether there is a $ or not.  The character
    $ is used by convention in user macros.  */
 
-static bool check_is_macro_escaped (m4_syntax_table *syntax);
-static int add_syntax_attribute           (m4_syntax_table *syntax, int ch, int code);
-static int remove_syntax_attribute (m4_syntax_table *syntax, int ch, int code);
+static bool    check_is_single_quotes          (m4_syntax_table *);
+static bool    check_is_single_comments        (m4_syntax_table *);
+static bool    check_is_macro_escaped          (m4_syntax_table *);
+static int     add_syntax_attribute            (m4_syntax_table *, int, int);
+static int     remove_syntax_attribute         (m4_syntax_table *, int, int);
 
 m4_syntax_table *
 m4_syntax_create (void)
@@ -103,52 +104,49 @@
   m4_syntax_table *syntax = xzalloc (sizeof *syntax);
   int ch;
 
-  for (ch = 256; --ch > 0;)
-    {
-      if (ch == '(')
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_OPEN);
-      else if (ch == ')')
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_CLOSE);
-      else if (ch == ',')
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_COMMA);
-      else if (ch == '$')
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_DOLLAR);
-      else if (ch == '=')
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_ASSIGN);
-      else if (isspace (ch))
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_SPACE);
-      else if (isalpha (ch) || ch == '_')
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_ALPHA);
-      else if (isdigit (ch))
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_NUM);
-      else
-       add_syntax_attribute (syntax, ch, M4_SYNTAX_OTHER);
-    }
-  /* add_syntax_attribute(syntax, 0, M4_SYNTAX_IGNORE); */
-
-  /* Default quotes and comment delimiters are always one char */
-  syntax->lquote.string                = xstrdup (DEF_LQUOTE);
-  syntax->lquote.length                = strlen (syntax->lquote.string);
-  syntax->rquote.string                = xstrdup (DEF_RQUOTE);
-  syntax->rquote.length                = strlen (syntax->rquote.string);
-  syntax->bcomm.string         = xstrdup (DEF_BCOMM);
-  syntax->bcomm.length         = strlen (syntax->bcomm.string);
-  syntax->ecomm.string         = xstrdup (DEF_ECOMM);
-  syntax->ecomm.length         = strlen (syntax->ecomm.string);
-
-  syntax->is_single_quotes     = true;
-  syntax->is_single_comments   = true;
-  syntax->is_macro_escaped     = false;
-
-  add_syntax_attribute (syntax, to_uchar (syntax->lquote.string[0]),
-                       M4_SYNTAX_LQUOTE);
-  add_syntax_attribute (syntax, to_uchar (syntax->rquote.string[0]),
-                       M4_SYNTAX_RQUOTE);
-  add_syntax_attribute (syntax, to_uchar (syntax->bcomm.string[0]),
-                       M4_SYNTAX_BCOMM);
-  add_syntax_attribute (syntax, to_uchar (syntax->ecomm.string[0]),
-                       M4_SYNTAX_ECOMM);
+  /* Set up default table.  This table never changes during operation.  */
+  for (ch = 256; --ch >= 0;)
+    switch (ch)
+      {
+      case '(':
+       syntax->orig[ch] = M4_SYNTAX_OPEN;
+       break;
+      case ')':
+       syntax->orig[ch] = M4_SYNTAX_CLOSE;
+       break;
+      case ',':
+       syntax->orig[ch] = M4_SYNTAX_COMMA;
+       break;
+      case '$':
+       syntax->orig[ch] = M4_SYNTAX_DOLLAR;
+       break;
+      case '`':
+       syntax->orig[ch] = M4_SYNTAX_LQUOTE;
+       break;
+      case '#':
+       syntax->orig[ch] = M4_SYNTAX_BCOMM;
+       break;
+      case '=':
+       /* FIXME - revisit the assign syntax attribute.  */
+       /* syntax->orig[ch] = M4_SYNTAX_ASSIGN; */
+       /* break; */
+      case '\0':
+       /* FIXME - revisit the ignore syntax attribute.  */
+       /* syntax->orig[ch] = M4_SYNTAX_IGNORE; */
+       /* break; */
+      default:
+       if (isspace (ch))
+         syntax->orig[ch] = M4_SYNTAX_SPACE;
+       else if (isalpha (ch) || ch == '_')
+         syntax->orig[ch] = M4_SYNTAX_ALPHA;
+       else if (isdigit (ch))
+         syntax->orig[ch] = M4_SYNTAX_NUM;
+       else
+         syntax->orig[ch] = M4_SYNTAX_OTHER;
+      }
 
+  /* Set up current table to match default.  */
+  m4_set_syntax (syntax, '\0', '\0', NULL);
   return syntax;
 }
 
@@ -171,6 +169,7 @@
 
   switch (ch)
     {
+    /* FIXME - revisit the ignore syntax attribute.  */
     case 'I': case 'i': code = M4_SYNTAX_IGNORE; break;
     case 'O': case 'o': code = M4_SYNTAX_OTHER;  break;
     case 'S': case 's': code = M4_SYNTAX_SPACE;  break;
@@ -180,6 +179,7 @@
     case '(': code = M4_SYNTAX_OPEN;   break;
     case ')': code = M4_SYNTAX_CLOSE;  break;
     case ',': code = M4_SYNTAX_COMMA;  break;
+    /* FIXME - revisit the assign syntax attribute.  */
     case '=': code = M4_SYNTAX_ASSIGN; break;
     case '@': code = M4_SYNTAX_ESCAPE; break;
     case '$': code = M4_SYNTAX_DOLLAR; break;
@@ -205,7 +205,7 @@
   if (code & M4_SYNTAX_MASKS)
     syntax->table[ch] |= code;
   else
-    syntax->table[ch] = code;
+    syntax->table[ch] = (syntax->table[ch] & M4_SYNTAX_MASKS) | code;
 
 #ifdef DEBUG_SYNTAX
   fprintf(stderr, "Set syntax %o %c = %04X\n",
@@ -219,8 +219,8 @@
 static int
 remove_syntax_attribute (m4_syntax_table *syntax, int ch, int code)
 {
-  if (code & M4_SYNTAX_MASKS)
-    syntax->table[ch] &= ~code;
+  assert (code & M4_SYNTAX_MASKS);
+  syntax->table[ch] &= ~code;
 
 #ifdef DEBUG_SYNTAX
   fprintf(stderr, "Unset syntax %o %c = %04X\n",
@@ -231,31 +231,291 @@
   return syntax->table[ch];
 }
 
+static void
+add_syntax_set (m4_syntax_table *syntax, const char *chars, int code)
+{
+  int ch;
+
+  if (*chars == '\0')
+    return;
+
+  if (code == M4_SYNTAX_ESCAPE)
+    syntax->is_macro_escaped = true;
+
+  /* Adding doesn't affect single-quote or single-comment.  */
+
+  while ((ch = to_uchar (*chars++)))
+    add_syntax_attribute (syntax, ch, code);
+}
+
+static void
+subtract_syntax_set (m4_syntax_table *syntax, const char *chars, int code)
+{
+  int ch;
+
+  if (*chars == '\0')
+    return;
+
+  while ((ch = to_uchar (*chars++)))
+    {
+      if ((code & M4_SYNTAX_MASKS) != 0)
+       remove_syntax_attribute (syntax, ch, code);
+      else if (m4_has_syntax (syntax, ch, code))
+       add_syntax_attribute (syntax, ch, M4_SYNTAX_OTHER);
+    }
+
+  /* Check for any cleanup needed.  */
+  switch (code)
+    {
+    case M4_SYNTAX_ESCAPE:
+      if (syntax->is_macro_escaped)
+       check_is_macro_escaped (syntax);
+      break;
+
+    case M4_SYNTAX_LQUOTE:
+    case M4_SYNTAX_RQUOTE:
+      if (syntax->is_single_quotes)
+       check_is_single_quotes (syntax);
+      break;
+
+    case M4_SYNTAX_BCOMM:
+    case M4_SYNTAX_ECOMM:
+      if (syntax->is_single_comments)
+       check_is_single_comments (syntax);
+      break;
+
+    default:
+      break;
+    }
+}
+
+static void
+set_syntax_set (m4_syntax_table *syntax, const char *chars, int code)
+{
+  int ch;
+  /* Explicit set of characters to install with this category; all
+     other characters that used to have the category get reset to
+     OTHER.  */
+  for (ch = 256; --ch >= 0; )
+    {
+      if (code == M4_SYNTAX_RQUOTE || code == M4_SYNTAX_ECOMM)
+       remove_syntax_attribute (syntax, ch, code);
+      else if (m4_has_syntax (syntax, ch, code))
+       add_syntax_attribute (syntax, ch, M4_SYNTAX_OTHER);
+    }
+  while ((ch = to_uchar (*chars++)))
+    add_syntax_attribute (syntax, ch, code);
+
+  /* Check for any cleanup needed.  */
+  check_is_macro_escaped (syntax);
+  check_is_single_quotes (syntax);
+  check_is_single_comments (syntax);
+}
+
+static void
+reset_syntax_set (m4_syntax_table *syntax, int code)
+{
+  int ch;
+  for (ch = 256; --ch >= 0; )
+    {
+      /* Reset the category back to its default state.  All other
+        characters that used to have this category get reset to
+        their default state as well.  */
+      if (code == M4_SYNTAX_RQUOTE)
+       {
+         if (ch == '\'')
+           add_syntax_attribute (syntax, ch, code);
+         else
+           remove_syntax_attribute (syntax, ch, code);
+       }
+      else if (code == M4_SYNTAX_ECOMM)
+       {
+         if (ch == '\n')
+           add_syntax_attribute (syntax, ch, code);
+         else
+           remove_syntax_attribute (syntax, ch, code);
+       }
+      else if (syntax->orig[ch] == code || m4_has_syntax (syntax, ch, code))
+       add_syntax_attribute (syntax, ch, syntax->orig[ch]);
+    }
+  check_is_macro_escaped (syntax);
+  check_is_single_quotes (syntax);
+  check_is_single_comments (syntax);
+}
+
 int
-m4_set_syntax (m4_syntax_table *syntax, const char key, const char *chars)
+m4_set_syntax (m4_syntax_table *syntax, char key, char action,
+              const char *chars)
 {
-  int ch, code;
+  int code;
 
   assert (syntax);
+  assert (chars || key == '\0');
 
-  code = m4_syntax_code (key);
+  if (key == '\0')
+    {
+      /* Restore the default syntax, which has known quote and comment
+        properties.  */
+      memcpy (syntax->table, syntax->orig, sizeof syntax->orig);
+
+      free (syntax->lquote.string);
+      free (syntax->rquote.string);
+      free (syntax->bcomm.string);
+      free (syntax->ecomm.string);
+
+      syntax->lquote.string    = xstrdup (DEF_LQUOTE);
+      syntax->lquote.length    = strlen (syntax->lquote.string);
+      syntax->rquote.string    = xstrdup (DEF_RQUOTE);
+      syntax->rquote.length    = strlen (syntax->rquote.string);
+      syntax->bcomm.string     = xstrdup (DEF_BCOMM);
+      syntax->bcomm.length     = strlen (syntax->bcomm.string);
+      syntax->ecomm.string     = xstrdup (DEF_ECOMM);
+      syntax->ecomm.length     = strlen (syntax->ecomm.string);
 
-  if ((code < 0) && (key != '\0'))
+      add_syntax_attribute (syntax, to_uchar (syntax->rquote.string[0]),
+                           M4_SYNTAX_RQUOTE);
+      add_syntax_attribute (syntax, to_uchar (syntax->ecomm.string[0]),
+                           M4_SYNTAX_ECOMM);
+
+      syntax->is_single_quotes         = true;
+      syntax->is_single_comments       = true;
+      syntax->is_macro_escaped         = false;
+      return 0;
+    }
+
+  code = m4_syntax_code (key);
+  if (code < 0)
     {
       return -1;
     }
+  switch (action)
+    {
+    case '+':
+      add_syntax_set (syntax, chars, code);
+      break;
+    case '-':
+      subtract_syntax_set (syntax, chars, code);
+      break;
+    case '=':
+      set_syntax_set (syntax, chars, code);
+      break;
+    case '\0':
+      reset_syntax_set (syntax, code);
+      break;
+    default:
+      assert (false);
+    }
+  return code;
+}
 
-  if (*chars != '\0')
-    while ((ch = to_uchar (*chars++)))
-      add_syntax_attribute (syntax, ch, code);
-  else
-    for (ch = 256; --ch > 0; )
-      add_syntax_attribute (syntax, ch, code);
+static bool
+check_is_single_quotes (m4_syntax_table *syntax)
+{
+  int ch;
+  int lquote = -1;
+  int rquote = -1;
 
-  if (syntax->is_macro_escaped || code == M4_SYNTAX_ESCAPE)
-    check_is_macro_escaped (syntax);
+  if (! syntax->is_single_quotes)
+    return false;
+  assert (syntax->lquote.length == 1 && syntax->rquote.length == 1);
+
+  if (m4_has_syntax (syntax, to_uchar (*syntax->lquote.string),
+                    M4_SYNTAX_LQUOTE)
+      && m4_has_syntax (syntax, to_uchar (*syntax->rquote.string),
+                       M4_SYNTAX_RQUOTE))
+    return true;
+
+  /* The most recent action invalidated our current lquote/rquote.  If
+     we still have exactly one character performing those roles based
+     on the syntax table, then update lquote/rquote accordingly.
+     Otherwise, keep lquote/rquote, but we no longer have single
+     quotes.  */
+  for (ch = 256; --ch >= 0; )
+    {
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_LQUOTE))
+       {
+         if (lquote == -1)
+           lquote = ch;
+         else
+           {
+             syntax->is_single_quotes = false;
+             break;
+           }
+       }
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_RQUOTE))
+       {
+         if (rquote == -1)
+           rquote = ch;
+         else
+           {
+             syntax->is_single_quotes = false;
+             break;
+           }
+       }
+    }
+  if (lquote == -1 || rquote == -1)
+    syntax->is_single_quotes = false;
+  else if (syntax->is_single_quotes)
+    {
+      *syntax->lquote.string = lquote;
+      *syntax->rquote.string = rquote;
+    }
+  return syntax->is_single_quotes;
+}
 
-  return code;
+static bool
+check_is_single_comments (m4_syntax_table *syntax)
+{
+  int ch;
+  int bcomm = -1;
+  int ecomm = -1;
+
+  if (! syntax->is_single_comments)
+    return false;
+  assert (syntax->bcomm.length == 1 && syntax->ecomm.length == 1);
+
+  if (m4_has_syntax (syntax, to_uchar (*syntax->bcomm.string),
+                    M4_SYNTAX_BCOMM)
+      && m4_has_syntax (syntax, to_uchar (*syntax->ecomm.string),
+                       M4_SYNTAX_ECOMM))
+    return true;
+
+  /* The most recent action invalidated our current bcomm/ecomm.  If
+     we still have exactly one character performing those roles based
+     on the syntax table, then update bcomm/ecomm accordingly.
+     Otherwise, keep bcomm/ecomm, but we no longer have single
+     comments.  */
+  for (ch = 256; --ch >= 0; )
+    {
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM))
+       {
+         if (bcomm == -1)
+           bcomm = ch;
+         else
+           {
+             syntax->is_single_comments = false;
+             break;
+           }
+       }
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_ECOMM))
+       {
+         if (ecomm == -1)
+           ecomm = ch;
+         else
+           {
+             syntax->is_single_comments = false;
+             break;
+           }
+       }
+    }
+  if (bcomm == -1 || ecomm == -1)
+    syntax->is_single_comments = false;
+  else if (syntax->is_single_comments)
+    {
+      *syntax->bcomm.string = bcomm;
+      *syntax->ecomm.string = ecomm;
+    }
+  return syntax->is_single_comments;
 }
 
 static bool
@@ -277,7 +537,7 @@
 
 
 /* Functions for setting quotes and comment delimiters.  Used by
-   m4_changecom () and m4_changequote ().  Both functions overrides the
+   m4_changecom () and m4_changequote ().  Both functions override the
    syntax table to maintain compatibility.  */
 void
 m4_set_quotes (m4_syntax_table *syntax, const char *lq, const char *rq)
@@ -286,20 +546,48 @@
 
   assert (syntax);
 
-  for (ch = 256; --ch >= 0;)   /* changequote overrides syntax_table */
-    if (m4_has_syntax (syntax, ch, M4_SYNTAX_LQUOTE|M4_SYNTAX_RQUOTE))
-      remove_syntax_attribute (syntax, ch, M4_SYNTAX_LQUOTE|M4_SYNTAX_RQUOTE);
-
   free (syntax->lquote.string);
   free (syntax->rquote.string);
 
-  syntax->lquote.string = xstrdup (lq ? lq : DEF_LQUOTE);
+  /* POSIX states that with 0 arguments, the default quotes are used.
+     POSIX XCU ERN 112 states that behavior is implementation-defined
+     if there was only one argument, or if there is an empty string in
+     either position when there are two arguments.  We allow an empty
+     left quote to disable quoting, but a non-empty left quote will
+     always create a non-empty right quote.  See the texinfo for what
+     some other implementations do.  */
+  if (!lq)
+    {
+      lq = DEF_LQUOTE;
+      rq = DEF_RQUOTE;
+    }
+  else if (!rq || (*lq && !*rq))
+    rq = DEF_RQUOTE;
+
+  syntax->lquote.string = xstrdup (lq);
   syntax->lquote.length = strlen (syntax->lquote.string);
-  syntax->rquote.string = xstrdup (rq ? rq : DEF_RQUOTE);
+  syntax->rquote.string = xstrdup (rq);
   syntax->rquote.length = strlen (syntax->rquote.string);
 
-  syntax->is_single_quotes = (syntax->lquote.length == 1
-                             && syntax->rquote.length == 1);
+  /* changequote overrides syntax_table, but be careful when it is
+     used to select a start-quote sequence that is effectively
+     disabled.  */
+
+  syntax->is_single_quotes
+    = (syntax->lquote.length == 1 && syntax->rquote.length == 1
+       && !m4_has_syntax (syntax, to_uchar (*syntax->lquote.string),
+                         (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE
+                          | M4_SYNTAX_ALPHA | M4_SYNTAX_NUM)));
+
+  for (ch = 256; --ch >= 0;)
+    {
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_LQUOTE))
+       add_syntax_attribute (syntax, ch,
+                             (syntax->orig[ch] == M4_SYNTAX_LQUOTE
+                              ? M4_SYNTAX_OTHER : syntax->orig[ch]));
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_RQUOTE))
+       remove_syntax_attribute (syntax, ch, M4_SYNTAX_RQUOTE);
+    }
 
   if (syntax->is_single_quotes)
     {
@@ -320,21 +608,46 @@
 
   assert (syntax);
 
-  for (ch = 256; --ch >= 0;)   /* changecom overrides syntax_table */
-    if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM|M4_SYNTAX_ECOMM))
-      remove_syntax_attribute (syntax, ch, M4_SYNTAX_BCOMM|M4_SYNTAX_ECOMM);
-
   free (syntax->bcomm.string);
   free (syntax->ecomm.string);
 
-  syntax->bcomm.string = xstrdup (bc ? bc : DEF_BCOMM);
+  /* POSIX requires no arguments to disable comments.  It requires
+     empty arguments to be used as-is, but this is counter to
+     traditional behavior, because a non-null begin and null end makes
+     it impossible to end a comment.  An aardvark has been filed:
+     http://www.opengroup.org/austin/mailarchives/ag-review/msg02168.html
+     This implementation assumes the aardvark will be approved.  See
+     the texinfo for what some other implementations do.  */
+  if (!bc)
+    bc = ec = "";
+  else if (!ec || (*bc && !*ec))
+    ec = DEF_ECOMM;
+
+  syntax->bcomm.string = xstrdup (bc);
   syntax->bcomm.length = strlen (syntax->bcomm.string);
-  syntax->ecomm.string = xstrdup (ec ? ec : DEF_ECOMM);
+  syntax->ecomm.string = xstrdup (ec);
   syntax->ecomm.length = strlen (syntax->ecomm.string);
 
-  syntax->is_single_comments = (syntax->bcomm.length == 1
-                               && syntax->ecomm.length == 1);
+  /* changecom overrides syntax_table, but be careful when it is used
+     to select a start-comment sequence that is effectively
+     disabled.  */
+
+  syntax->is_single_comments
+    = (syntax->bcomm.length == 1 && syntax->ecomm.length == 1
+       && !m4_has_syntax (syntax, to_uchar (*syntax->bcomm.string),
+                         (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE
+                          | M4_SYNTAX_ALPHA | M4_SYNTAX_NUM
+                          | M4_SYNTAX_LQUOTE)));
 
+  for (ch = 256; --ch >= 0;)
+    {
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM))
+       add_syntax_attribute (syntax, ch,
+                             (syntax->orig[ch] == M4_SYNTAX_BCOMM
+                              ? M4_SYNTAX_OTHER : syntax->orig[ch]));
+      if (m4_has_syntax (syntax, ch, M4_SYNTAX_ECOMM))
+       remove_syntax_attribute (syntax, ch, M4_SYNTAX_ECOMM);
+    }
   if (syntax->is_single_comments)
     {
       add_syntax_attribute (syntax, to_uchar (syntax->bcomm.string[0]),
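The add_syntax_attribute change ("Don't lose quote assignment") depends on treating the end-quote and end-comment attributes as mask bits that sit alongside a character's basic category. The sketch below illustrates that, assuming hypothetical bit values and the made-up helpers toy_add and toy_remove; the real M4_SYNTAX_* encodings are not shown in this patch and may differ.

```c
#include <assert.h>

/* Hypothetical encodings: a basic category in the low bits, plus
   independent mask bits for end-quote and end-comment.  */
enum
{
  TOY_OTHER = 0x0001,
  TOY_LQUOTE = 0x0002,
  TOY_RQUOTE = 0x0100,
  TOY_ECOMM = 0x0200,
  TOY_MASKS = TOY_RQUOTE | TOY_ECOMM
};

/* Add CODE to ENTRY.  A mask bit is OR'd in; a basic category
   replaces the old category but keeps any mask bits already set.  */
static unsigned short
toy_add (unsigned short entry, int code)
{
  if (code & TOY_MASKS)
    return (unsigned short) (entry | code);
  return (unsigned short) ((entry & TOY_MASKS) | code);
}

/* Remove CODE from ENTRY.  Only mask bits may be removed.  */
static unsigned short
toy_remove (unsigned short entry, int code)
{
  assert (code & TOY_MASKS);
  return (unsigned short) (entry & ~code);
}
```

In this model, marking a character as an end-quote and then giving it a new basic category leaves the end-quote bit set, whereas a plain assignment of the category would have dropped it.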
Index: modules/gnu.c
===================================================================
RCS file: /sources/m4/m4/modules/gnu.c,v
retrieving revision 1.69
diff -u -r1.69 gnu.c
--- modules/gnu.c       19 Dec 2006 17:23:46 -0000      1.69
+++ modules/gnu.c       22 Dec 2006 23:54:59 -0000
@@ -399,7 +399,7 @@
   int resyntax = m4_regexp_syntax_encode (spec);
 
   if (resyntax < 0)
-    m4_error (context, 0, 0, _("%s: bad syntax-spec: `%s'"), caller, spec);
+    m4_warn (context, 0, _("%s: bad syntax-spec: `%s'"), caller, spec);
 
   return resyntax;
 }
@@ -434,14 +434,26 @@
       int i;
       for (i = 1; i < argc; i++)
        {
-         char key = *M4ARG (i);
-         if (key != '\0'
-             && (m4_set_syntax (M4SYNTAX, key,
-                                m4_expand_ranges (M4ARG (i) + 1, obs)) < 0))
+         const char *spec = M4ARG (i);
+         char key = *spec++;
+         char action = key ? *spec : '\0';
+         switch (action)
            {
-             m4_error (context, 0, 0, _("%s: undefined syntax code: `%c'"),
-                       M4ARG (0), key);
+           case '-':
+           case '+':
+           case '=':
+             spec++;
+             break;
+           case '\0':
+             break;
+           default:
+             action = '=';
+             break;
            }
+         if (m4_set_syntax (M4SYNTAX, key, action,
+                            key ? m4_expand_ranges (spec, obs) : "") < 0)
+           m4_warn (context, 0, _("%s: undefined syntax code: `%c'"),
+                    M4ARG (0), key);
        }
     }
   else
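The way changesyntax now splits each argument into a key, an action and a character set can be sketched in isolation. This is an illustration only: toy_spec and toy_parse_spec are hypothetical names, and range expansion (done by m4_expand_ranges in the real code) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch.  */
struct toy_spec
{
  char key;		/* syntax category letter, or NUL */
  char action;		/* '+', '-', '=', or NUL for reset */
  const char *chars;	/* characters the action applies to */
};

/* Split SPEC the way the changesyntax loop above does: the first
   character is the key; an optional '+', '-' or '=' follows; anything
   else after the key implies '='; a bare key means reset.  */
static struct toy_spec
toy_parse_spec (const char *spec)
{
  struct toy_spec r;

  assert (spec);
  r.key = *spec++;
  r.action = r.key ? *spec : '\0';
  switch (r.action)
    {
    case '-':
    case '+':
    case '=':
      spec++;
      break;
    case '\0':
      break;
    default:
      r.action = '=';
      break;
    }
  r.chars = r.key ? spec : "";
  return r;
}
```

Under this reading, `L+<' adds `<' to the left-quote category, `O(,)' sets the category to exactly `(,)', a bare `W' asks for that category to be reset, and an empty argument yields a NUL key, which m4_set_syntax treats as a request to restore the whole default table.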
Index: modules/m4.c
===================================================================
RCS file: /sources/m4/m4/modules/m4.c,v
retrieving revision 1.96
diff -u -r1.96 m4.c
--- modules/m4.c        19 Dec 2006 17:23:46 -0000      1.96
+++ modules/m4.c        22 Dec 2006 23:54:59 -0000
@@ -665,7 +665,8 @@
   m4_dump_args (context, obs, argc - 1, argv + 1, ",", true);
 }
 
-/* Change the current quotes.  The function set_quotes () lives in input.c.  */
+/* Change the current quotes.  The function set_quotes () lives in
+   syntax.c.  */
 M4BUILTIN_HANDLER (changequote)
 {
   m4_set_quotes (M4SYNTAX,
@@ -674,13 +675,12 @@
 }
 
 /* Change the current comment delimiters.  The function set_comment ()
-   lives in input.c.  */
+   lives in syntax.c.  */
 M4BUILTIN_HANDLER (changecom)
 {
-  if (argc == 1)
-    m4_set_comment (M4SYNTAX, "", ""); /* disable comments */
-  else
-    m4_set_comment (M4SYNTAX, M4ARG (1), (argc >= 3) ? M4ARG (2) : NULL);
+  m4_set_comment (M4SYNTAX,
+                 (argc >= 2) ? M4ARG (1) : NULL,
+                 (argc >= 3) ? M4ARG (2) : NULL);
 }
 
 
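Since changecom now passes NULL for missing arguments, the defaulting rules live entirely in m4_set_comment, alongside the matching rules in m4_set_quotes. The sketch below restates just those rules as two standalone helpers. The TOY_DEF_* macros and function names are hypothetical stand-ins; the default values are taken from the characters used elsewhere in this patch (`, ', # and newline).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for DEF_LQUOTE, DEF_RQUOTE and DEF_ECOMM.  */
#define TOY_DEF_LQUOTE "`"
#define TOY_DEF_RQUOTE "'"
#define TOY_DEF_ECOMM "\n"

/* Quotes: no arguments restores the defaults; an empty left quote
   disables quoting; a non-empty left quote always ends up with a
   non-empty right quote.  */
static void
toy_resolve_quotes (const char **lq, const char **rq)
{
  assert (lq && rq);
  if (!*lq)
    {
      *lq = TOY_DEF_LQUOTE;
      *rq = TOY_DEF_RQUOTE;
    }
  else if (!*rq || (**lq && !**rq))
    *rq = TOY_DEF_RQUOTE;
}

/* Comments: no arguments disables comments; a non-empty begin
   delimiter always ends up with a non-empty end delimiter.  */
static void
toy_resolve_comment (const char **bc, const char **ec)
{
  assert (bc && ec);
  if (!*bc)
    *bc = *ec = "";
  else if (!*ec || (**bc && !**ec))
    *ec = TOY_DEF_ECOMM;
}
```

The asymmetry is the point: with no arguments, quotes fall back to the defaults while comments are turned off, and in both cases an empty second argument after a non-empty first one is replaced by the default closing delimiter.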
Index: src/freeze.c
===================================================================
RCS file: /sources/m4/m4/src/freeze.c,v
retrieving revision 1.56
diff -u -r1.56 freeze.c
--- src/freeze.c        11 Nov 2006 16:21:25 -0000      1.56
+++ src/freeze.c        22 Dec 2006 23:54:59 -0000
@@ -632,7 +632,7 @@
            }
          string[0][number[0]] = '\0';
 
-         if ((m4_set_syntax (context->syntax, syntax, string[0]) < 0)
+         if ((m4_set_syntax (context->syntax, syntax, '=', string[0]) < 0)
              && (syntax != '\0'))
            {
              m4_error (context, 0, 0,





