Re: speed up input parsing

m4-patches

From:	Eric Blake
Subject:	Re: speed up input parsing
Date:	Mon, 16 Feb 2009 18:14:01 +0000 (UTC)
User-agent:	Loom/3.14 (http://gmane.org/)
Eric Blake <ebb9 <at> byu.net> writes:

> Master is still about 50% slower than branch-1.6 because of
> its heavy use of indirect function calls, although I'm hoping that porting
> argv_ref patch 29 will improve the situation.

Indeed, I'm seeing an improvement of more than 20% in execution speed with my 
preliminary version of the argv_ref patch 29 ported to master (11.0s down to 
8.5s for autoconf on coreutils).  But there are some preliminary steps needed 
to make this possible.  These three patches clean up changesyntax with respect 
to changequote/changecom.  Simultaneously supporting a multi-character quote 
delimiter from changequote and a second single-character quote delimiter from 
changesyntax, where the multi-byte quote start only matched a multi-byte end, 
but where any single-byte start matches any other single-byte end, was just too 
confusing, and had too much code duplication.  This series tries to clean 
things up, adding tests along the way, so that changesyntax and changequote 
have a more predictable effect on one another, with a net reduction in lines of 
code.  Although this is yet another semantic change to changesyntax, we've 
already documented that changesyntax is not yet set in stone.
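
As a rough illustration of the behavior this series aims for (this is not
code from the patches), here is a minimal C sketch of a syntax table in which
every byte has exactly one basic category plus optional context bits, and in
which a changequote-style call first demotes any quote characters that were
added through changesyntax.  All names (toy_syntax, toy_set_quotes, and so on)
are hypothetical, and only single-byte delimiters are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified categories; the real ones live in m4module.h.  */
enum {
  TOY_OTHER  = 1 << 0,  /* basic: no special meaning */
  TOY_LQUOTE = 1 << 1,  /* basic: single-byte start quote */
  TOY_RQUOTE = 1 << 2   /* context: single-byte end quote */
};
#define TOY_BASIC_MASK (TOY_OTHER | TOY_LQUOTE)

typedef struct { unsigned table[256]; } toy_syntax;

/* True if byte C carries any of the bits in MASK.  */
bool toy_has (const toy_syntax *s, unsigned char c, unsigned mask)
{
  return (s->table[c] & mask) != 0;
}

/* Default state: ` starts a quote; ' ends one but is Other in isolation.  */
void toy_reset (toy_syntax *s)
{
  for (int i = 0; i < 256; i++)
    s->table[i] = TOY_OTHER;
  s->table['`'] = TOY_LQUOTE;
  s->table['\''] |= TOY_RQUOTE;
}

/* Like changesyntax(`L+c'): swap the basic category, keep context bits.  */
void toy_add_lquote (toy_syntax *s, unsigned char c)
{
  s->table[c] = (s->table[c] & ~(unsigned) TOY_BASIC_MASK) | TOY_LQUOTE;
}

/* Like changesyntax(`R+c'): add the context bit, keep the basic category.  */
void toy_add_rquote (toy_syntax *s, unsigned char c)
{
  s->table[c] |= TOY_RQUOTE;
}

/* Like changequote with 1-byte delimiters: demote every existing quote
   character to Other, then install exactly one pair.  */
void toy_set_quotes (toy_syntax *s, unsigned char lq, unsigned char rq)
{
  for (int i = 0; i < 256; i++)
    {
      if (s->table[i] & TOY_LQUOTE)
        s->table[i] = (s->table[i] & ~(unsigned) TOY_BASIC_MASK) | TOY_OTHER;
      s->table[i] &= ~(unsigned) TOY_RQUOTE;
    }
  toy_add_lquote (s, lq);
  toy_add_rquote (s, rq);
}

/* Invariant check: every byte has exactly one basic category.  */
void toy_check (const toy_syntax *s)
{
  for (int i = 0; i < 256; i++)
    {
      unsigned basic = s->table[i] & TOY_BASIC_MASK;
      assert (basic == TOY_OTHER || basic == TOY_LQUOTE);
    }
}
```

In this sketch, calling toy_reset, then toy_add_lquote (&s, '<') and
toy_add_rquote (&s, '>'), leaves two start quotes active; a later
toy_set_quotes (&s, '[', ']') leaves only [ and ] as delimiters, which is the
"single set of quotes" outcome that patch 2 documents for changequote.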

From 1e2cb352077020f928c9e6c700880276ea79d729 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 14 Feb 2009 06:58:08 -0700
Subject: [PATCH 1/3] Improve changesyntax documentation.

* doc/m4.texinfo (Changesyntax): Merge two tables into one
multitable.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |    4 +
 doc/m4.texinfo |  261 +++++++++++++++++++++++++++-----------------------------
 2 files changed, 131 insertions(+), 134 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 90957fd..796c720 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
 2009-02-16  Eric Blake  <address@hidden>

+       Improve changesyntax documentation.
+       * doc/m4.texinfo (Changesyntax): Merge two tables into one
+       multitable.
+
        Fix regression in multicharacter quotes, from 2008-01-26.
        * m4/input.c (m4__next_token): Fix typo.
        * tests/builtins.at (changequote): Enhance test.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index e574bd5..3d20d74 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -5401,71 +5401,125 @@ Changesyntax
 name starts with a letter or @samp{_} and consists of the longest
 possible string of letters, @samp{_} and digits.  But who is to decide
 what characters are letters, digits, quotes, white space?  Earlier the
-operating system decided, now you do.
+operating system decided, now you do.  The builtin macro
address@hidden is used to change the way @code{m4} parses the input
+stream into tokens.

-Input characters belong to different categories:
address@hidden {Builtin (gnu)} changesyntax (@var{syntax-spec}, @dots{})
+Each @var{syntax-spec} is a two-part string.  The first part is a
+command, consisting of a single character describing a syntax category,
+and an optional one-character action.  The action can be @samp{-} to
+remove the listed characters from that category and reassign them to the
+`Other' category, @samp{=} to set the category to the listed characters
+and reassign all other characters previously in that category to
+`Other', or @samp{+} to add the listed characters to the category
+without affecting other characters.  If an action is not specified, but
+additional characters are present, then @samp{=} is assumed.

address@hidden @dfn
address@hidden Letters
-Characters that start a macro name.  Defaults to the letters as defined
-by the locale, and the character @samp{_}.
+The remaining characters of each @var{syntax-spec} form the set of
+characters to perform the action on for that syntax category.  Character
+ranges are expanded as for @code{translit} (@pxref{Translit}).  To start
+the character set with @samp{-}, @samp{+}, or @samp{=}, an action must
+be specified.
+
+If @var{syntax-spec} is just a category, and no action or characters
+were specified, then all characters in that category are reset to their
+default state.  A warning is issued if the category character is not
+valid.  If @var{syntax-spec} is the empty string, then all categories
+are reset to their default state.
+
+Syntax categories are divided into basic and context.  Every input
+byte belongs to exactly one basic syntax category.  Additionally, any
+byte can be assigned to a context category regardless of its current
+basic category.  Context categories exist because a character can
+behave differently when parsed in isolation than when it occurs in
+context to close out a token started by another basic category (for
+example, @kbd{newline} defaults to the basic category `Whitespace' as
+well as the context category `End comment').
+
+The following table describes the case-insensitive designation for each
+syntax category (the first byte in @var{syntax-spec}), and a description
+of what each category controls.
+
address@hidden @columnfractions .06 .20 .13 .55
address@hidden Code @tab Category @tab Type @tab Description

address@hidden Digits
-Characters that, together with the letters, form the remainder of a
address@hidden @kbd{W} @tab @dfn{Words} @tab Basic
address@hidden Characters that can start a macro name.  Defaults to the letters as
+defined by the locale, and the character @samp{_}.
+
address@hidden @kbd{D} @tab @dfn{Digits} @tab Basic
address@hidden Characters that, together with the letters, form the remainder of a
 macro name.  Defaults to the ten digits @address@hidden@samp{9}, and any
 other digits defined by the locale.

address@hidden White space
-Characters that should be trimmed from the beginning of each argument to
address@hidden @kbd{S} @tab @dfn{White space} @tab Basic
address@hidden Characters that should be trimmed from the beginning of each argument to
 a macro call.  The defaults are space, tab, newline, carriage return,
 form feed, and vertical tab, and any others as defined by the locale.

address@hidden Open parenthesis
-Characters that open the argument list of a macro call.  The default is
address@hidden @kbd{(} @tab @dfn{Open parenthesis} @tab Basic
address@hidden Characters that open the argument list of a macro call.  The default is
 the single character @samp{(}.

address@hidden Close parenthesis
-Characters that close the argument list of a macro call.  The default
address@hidden @kbd{)} @tab @dfn{Close parenthesis} @tab Basic
address@hidden Characters that close the argument list of a macro call.  The default
 is the single character @samp{)}.

address@hidden Argument separator
-Characters that separate the arguments of a macro call.  The default is
address@hidden @kbd{,} @tab @dfn{Argument separator} @tab Basic
address@hidden Characters that separate the arguments of a macro call.  The default is
 the single character @samp{,}.

address@hidden Dollar
-Characters that can introduce an argument reference in the body of a
address@hidden @kbd{L} @tab @dfn{Left quote} @tab Basic
address@hidden The set of characters that can start a single-character quoted string.
+The default is the single character @samp{`}.  For multiple-character
+quote delimiters, use @code{changequote} (@pxref{Changequote}).
+
address@hidden @kbd{R} @tab @dfn{Right quote} @tab Context
address@hidden The set of characters that can end a single-character quoted string.
+The default is the single character @samp{'}.  For multiple-character
+quote delimiters, use @code{changequote} (@pxref{Changequote}).  Note
+that @samp{'} also defaults to the syntax category `Other', when it
+appears in isolation.
+
address@hidden @kbd{B} @tab @dfn{Begin comment} @tab Basic
address@hidden The set of characters that can start a single-character comment.  The
+default is the single character @samp{#}.  For multiple-character
+comment delimiters, use @code{changecom} (@pxref{Changecom}).
+
address@hidden @kbd{E} @tab @dfn{End comment} @tab Context
address@hidden The set of characters that can end a single-character comment.  The
+default is the single character @kbd{newline}.  For multiple-character
+comment delimiters, use @code{changecom} (@pxref{Changecom}).  Note that
+newline also defaults to the syntax category `White space', when it
+appears in isolation.
+
address@hidden FIXME - make ${} context, not basic
address@hidden @kbd{$} @tab @dfn{Dollar} @tab Basic
address@hidden Characters that can introduce an argument reference in the body of a
 macro.  The default is the single character @samp{$}.

address@hidden Left brace
-Characters that introduce an extended argument reference in the body of
address@hidden FIXME - implement ${10} argument parsing.
address@hidden @address@hidden @tab @dfn{Left brace} @tab Basic
address@hidden Characters that introduce an extended argument reference in the body of
 a macro immediately after a character in the Dollar category.  The
 default is the single character @address@hidden

address@hidden Right brace
-Characters that conclude an extended argument reference in the body of a
address@hidden @address@hidden @tab @dfn{Right brace} @tab Basic
address@hidden Characters that conclude an extended argument reference in the body of a
 macro.  The default is the single character @address@hidden

address@hidden Left quote
-The set of characters that can start a single-character quoted string.
-The default is the single character @samp{`}.  For multiple-character
-quote delimiters, use @code{changequote} (@pxref{Changequote}).
-
address@hidden Begin comment
-The set of characters that can start a single-character comment.  The
-default is the single character @samp{#}.  For multiple-character
-comment delimiters, use @code{changecom} (@pxref{Changecom}).
-
address@hidden Other
-Characters that have no special syntactical meaning to @code{m4}.
address@hidden @kbd{O} @tab @dfn{Other} @tab Basic
address@hidden Characters that have no special syntactical meaning to @code{m4}.
 Defaults to all characters except those in the categories above.

address@hidden Active
-Characters that themselves, alone, form macro names.  This is a
address@hidden @kbd{A} @tab @dfn{Active} @tab Basic
address@hidden Characters that themselves, alone, form macro names.  This is a
 @acronym{GNU} extension, and active characters have lower precedence
 than comments.  By default, no characters are active.

address@hidden Escape
-Characters that must precede macro names for them to be recognized.
address@hidden @kbd{@@} @tab @dfn{Escape} @tab Basic
address@hidden Characters that must precede macro names for them to be recognized.
 This is a @acronym{GNU} extension.  When an escape character is defined,
 then macros are not recognized unless the escape character is present;
 however, the macro name, visible by @samp{$0} in macro definitions, does
@@ -5473,97 +5527,10 @@ Changesyntax
 escapes.

 @comment FIXME - we should also consider supporting:
address@hidden @item Ignore - characters that are ignored if they appear in
address@hidden the input; perhaps defaulting to '\0', category 'I'.
address@hidden table
-
address@hidden
-Each character can, besides the basic syntax category, have some syntax
-attributes.  One reason these are attributes rather than categories is
-that end delimiters are never recognized except when searching for the
-end of a token triggered by a start delimiter; the end delimiter can
-have syntax properties of its own when it appears in isolation.  These
-attributes are:
-
address@hidden @dfn
address@hidden Right quote
-The set of characters that can end a single-character quoted string.
-The default is the single character @samp{'}.  For multiple-character
-quote delimiters, use @code{changequote} (@pxref{Changequote}).  Note
-that @samp{'} also defaults to the syntax category `Other', when it
-appears in isolation.
-
address@hidden End comment
-The set of characters that can end a single-character comment.  The
-default is the single character @kbd{newline}.  For multiple-character
-comment delimiters, use @code{changecom} (@pxref{Changecom}).  Note that
-newline also defaults to the syntax category `White space', when it
-appears in isolation.
address@hidden table
-
-The builtin macro @code{changesyntax} is used to change the way
address@hidden parses the input stream into tokens.
-
address@hidden {Builtin (gnu)} changesyntax (@var{syntax-spec}, @dots{})
-Each @var{syntax-spec} is a two-part string.  The first part is a
-command, consisting of a single character describing a syntax category,
-and an optional one-character action.  The action can be @samp{-} to
-remove the listed characters from that category and reassign them to the
-`Other' category, @samp{=} to set the category to the listed characters
-and reassign all other characters previously in that category to
-`Other', or @samp{+} to add the listed characters to the category
-without affecting other characters.  If an action is not specified, but
-additional characters are present, then @samp{=} is assumed.  The
-case-insensitive characters for the syntax categories are:
-
address@hidden @kbd
address@hidden W
-Letters
address@hidden D
-Digits
address@hidden S
-White space
address@hidden (
-Open parenthesis
address@hidden )
-Close parenthesis
address@hidden ,
-Argument separator
address@hidden $
-Dollar
address@hidden @{
-Left brace
address@hidden @}
-Right brace
address@hidden O
-Other
address@hidden @@
-Escape
address@hidden A
-Active
address@hidden L
-Left quote
address@hidden R
-Right quote
address@hidden B
-Begin comment
address@hidden E
-End comment
address@hidden @item I
address@hidden Ignore
address@hidden table
-
-The remaining characters of each @var{syntax-spec} form the set of
-characters to perform the action on for that syntax category.  Character
-ranges are expanded as for @code{translit} (@pxref{Translit}).  To start
-the character set with @samp{-}, @samp{+}, or @samp{=}, an action must
-be specified.
-
-If @var{syntax-spec} is just a category, and no action or characters
-were specified, then all characters in that category are reset to their
-default state.  A warning is issued if the category character is not
-valid.  If @var{syntax-spec} is the empty string, then all categories
-are reset to their default state.
address@hidden @item @kbd{I} @tab @dfn{Ignore} @tab Basic
address@hidden @tab Characters that are ignored if they appear in
address@hidden the input; perhaps defaulting to '\0'.
address@hidden multitable

 The expansion of @code{changesyntax} is void.
 The macro @code{changesyntax} is recognized only with parameters.  Use
@@ -5572,7 +5539,9 @@ Changesyntax
 This macro was added in M4 2.0.
 @end deffn

-With @code{changesyntax} we can modify what characters form a word.
+With @code{changesyntax} we can modify what characters form a word.  For
+example, we can make @samp{.} a valid character in a macro name, or even
+start a macro name with a number.

 @example
 define(`test.1', `TEST ONE')
@@ -5583,18 +5552,21 @@ Changesyntax
 @result{}stdin
 test.1
 @result{}test.1
+dnl Add `.' and remove `_'.
 changesyntax(`W+.', `W-_')
 @result{}
 __file__
 @result{}__file__
 test.1
 @result{}TEST ONE
+dnl Set words to include numbers.
 changesyntax(`W=a-zA-Z0-9_')
 @result{}
 __file__
 @result{}stdin
 test.1
 @result{}test.one
+dnl Reset words to default (a-zA-Z_).
 changesyntax(`W')
 @result{}
 __file__
@@ -5610,6 +5582,7 @@ Changesyntax
 @result{}
 test(a, b, c)
 @result{}3
+dnl Change macro syntax.
 changesyntax(`(<', `,|', `)>')
 @result{}
 test(a, b, c)
@@ -5627,10 +5600,14 @@ Changesyntax
 @result{}
 test(`a', `b', `c')
 @result{}abc
-changesyntax(`O 'format(`%c', `9'))
+dnl Don't ignore whitespace.
+changesyntax(`O 'format(``%c'', `9')`
+')
 @result{}
-test(a, b, c)
address@hidden b c
+test(a, b,
+c)
address@hidden b
address@hidden
 @end example

 It is possible to redefine the @samp{$} used to indicate macro arguments
@@ -5641,6 +5618,7 @@ Changesyntax
 @result{}
 argref(1, 2, 3)
 @result{}Dollar: 3, Question: ?#
+dnl Change argument identifier.
 changesyntax(`$?', `O$')
 @result{}
 argref(1,2,3)
@@ -5654,6 +5632,7 @@ Changesyntax
 @example
 define(`escape', `$?`'1$?1?')
 @result{}
+dnl Change argument identifier.
 changesyntax(`$?')
 @result{}
 escape(foo)
@@ -5674,6 +5653,7 @@ Changesyntax
 @example
 define(`foo', `bar')
 @result{}
+dnl Require @@ escape before any macro.
 changesyntax(`@@@@')
 @result{}
 foo
@@ -5682,6 +5662,7 @@ Changesyntax
 @result{}bar
 @@bar
 @result{}@@bar
+@@dnl Change escape character.
 @@changesyntax(`@@\', `O@@')
 @result{}
 foo
@@ -5705,14 +5686,24 @@ Changesyntax
 @example
 define(`@@', `TEST')
 @result{}
+define(`a@@a', `hello')
address@hidden
+define(`a', `A')
address@hidden
 @@
 @result{}@@
+a@@a
address@hidden@@A
+dnl Make @@ active.
 changesyntax(`A@@')
 @result{}
 @@
 @result{}TEST
+a@@a
address@hidden
 @end example

address@hidden FIXME - improve this wording
 There is obviously an overlap with @code{changecom} and
 @code{changequote}.  Comment delimiters and quotes can now be defined in
 two different ways.  To avoid incompatibilities, if the quotes are set
@@ -5720,12 +5711,13 @@ Changesyntax
 as quotes will revert to their normal syntax categories, leaving only
 one set of defined quotes as before.  If the quotes are set with
 @code{changesyntax}, it is possible to result in multiple sets of
-quotes.  This applies to comment delimiters as well, @emph{mutatis
+quotes.  This applies to comment delimiters as well, @i{mutatis
 mutandis}.

 @example
 define(`test', `TEST')
 @result{}
+dnl Add additional single-byte delimiters.
 changesyntax(`L+<', `R+>')
 @result{}
 <test>
@@ -5749,6 +5741,7 @@ Changesyntax
 parenthesis will match any close parenthesis, etc.

 @example
+dnl Go crazy with symbols.
 changesyntax(`(@{<', `)@}>', `,;:', `O(,)')
 @result{}
 address@hidden; 2: 8>
-- 
1.6.1.2


From e5632a42071a39b1e6988533aeb2aeab16188b85 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 14 Feb 2009 10:14:34 -0700
Subject: [PATCH 2/3] Revamp changesyntax vs. changequote interactions.

* m4/m4module.h (M4_SYNTAX_VALUE): Delete unused macro.
(M4_SYNTAX_SUSPECT): New macro.
* m4/m4private.h (struct m4_syntax_table): Add suspect field.
* m4/syntax.c (check_is_single_quotes, check_is_single_comments)
(check_is_macro_escaped): Delete, by inlining body...
(m4_set_syntax): ...into here.  Improves handling between
changesyntax and changequote/changecom.
(add_syntax_set, subtract_syntax_set, set_syntax_set): Simplify,
and let suspect field track needed cleanup.
(m4_set_quotes, m4_set_comment): Adjust meaning of
is_single_quotes and is_single_comment flags to always be true if
only one delimiter exists, regardless of its length.  Ensure that
the syntax categories M4_SYNTAX_LQUOTE and M4_SYNTAX_BCOMM are
only used on 1-byte delimiters.
(add_syntax_attribute, remove_syntax_attribute): Change signature
to allow the use of fewer casts.  Adjust the suspect field when
necessary.
(m4_reset_syntax, set_quote_age): Adjust callers.
* m4/input.c (m4__next_token, m4__next_token_is_open): Simplify
callers.
* doc/m4.texinfo (Changesyntax): Update documentation and tests.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   23 +++
 doc/m4.texinfo |   88 ++++++------
 m4/input.c     |   23 ++--
 m4/m4module.h  |    6 +-
 m4/m4private.h |   12 +-
 m4/syntax.c    |  438 +++++++++++++++++++++++++++-----------------------------
 6 files changed, 298 insertions(+), 292 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 796c720..8f77619 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,28 @@
 2009-02-16  Eric Blake  <address@hidden>

+       Revamp changesyntax vs. changequote interactions.
+       * m4/m4module.h (M4_SYNTAX_VALUE): Delete unused macro.
+       (M4_SYNTAX_SUSPECT): New macro.
+       * m4/m4private.h (struct m4_syntax_table): Add suspect field.
+       * m4/syntax.c (check_is_single_quotes, check_is_single_comments)
+       (check_is_macro_escaped): Delete, by inlining body...
+       (m4_set_syntax): ...into here.  Improves handling between
+       changesyntax and changequote/changecom.
+       (add_syntax_set, subtract_syntax_set, set_syntax_set): Simplify,
+       and let suspect field track needed cleanup.
+       (m4_set_quotes, m4_set_comment): Adjust meaning of
+       is_single_quotes and is_single_comment flags to always be true if
+       only one delimiter exists, regardless of its length.  Ensure that
+       the syntax categories M4_SYNTAX_LQUOTE and M4_SYNTAX_BCOMM are
+       only used on 1-byte delimiters.
+       (add_syntax_attribute, remove_syntax_attribute): Change signature
+       to allow the use of fewer casts.  Adjust the suspect field when
+       necessary.
+       (m4_reset_syntax, set_quote_age): Adjust callers.
+       * m4/input.c (m4__next_token, m4__next_token_is_open): Simplify
+       callers.
+       * doc/m4.texinfo (Changesyntax): Update documentation and tests.
+
        Improve changesyntax documentation.
        * doc/m4.texinfo (Changesyntax): Merge two tables into one
        multitable.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 3d20d74..5c09838 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -5703,16 +5703,24 @@ Changesyntax
 @result{}ATESTa
 @end example

address@hidden FIXME - improve this wording
-There is obviously an overlap with @code{changecom} and
address@hidden  Comment delimiters and quotes can now be defined in
-two different ways.  To avoid incompatibilities, if the quotes are set
-with @code{changequote}, all other characters marked in the syntax table
-as quotes will revert to their normal syntax categories, leaving only
-one set of defined quotes as before.  If the quotes are set with
address@hidden, it is possible to result in multiple sets of
-quotes.  This applies to comment delimiters as well, @i{mutatis
-mutandis}.
+There is obviously an overlap between @code{changesyntax} and
address@hidden, since there are now two ways to modify quote
+delimiters.  To avoid incompatibilities, if the quotes are modified by
address@hidden, any characters previously set to either quote
+delimiter by @code{changesyntax} are first demoted to the other category
+(@samp{O}), so the result is only a single set of quotes.  In the other
+direction, if quotes were already disabled, or if both the start and end
+delimiter set by @code{changequote} are single bytes, then
address@hidden preserves those settings.  But if either delimiter
+occupies multiple bytes, @code{changesyntax} first disables both
+delimiters.  Quotes can be disabled via @code{changesyntax} by emptying
+the left quote basic category (@samp{L}).  Meanwhile, the right quote
+context category (@samp{R}) will never be empty; if a
address@hidden action would otherwise leave that category empty,
+then the default end delimiter from @code{changequote} (@samp{'}) is
+used; thus, it is never possible to get @code{m4} in a state where a
+quoted string cannot be terminated.  These interactions apply to comment
+delimiters as well, @i{mutatis mutandis} with @code{changecom}.

 @example
 define(`test', `TEST')
@@ -5720,20 +5728,33 @@ Changesyntax
 dnl Add additional single-byte delimiters.
 changesyntax(`L+<', `R+>')
 @result{}
-<test>
address@hidden
-`test'
address@hidden
-[test]
address@hidden
+<test> `test' [test] <<test>>
address@hidden test [TEST] <test>
+dnl Use standard interface, overriding changesyntax settings.
 changequote(<[>, `]')
 @result{}
-<test>
address@hidden<TEST>
-`test'
address@hidden'
-[test]
address@hidden
+<test> `test' [test] <<test>>
address@hidden<TEST> `TEST' test <<TEST>>
+dnl Introduce multi-byte delimiters.
+changequote([<<], [>>])
address@hidden
+<test> `test' [test] <<test>>
address@hidden<TEST> `TEST' [TEST] test
+dnl Change end quote, effectively disabling quotes.
+changesyntax(<<R]>>)
address@hidden
+<test> `test' [test] <<test>>
address@hidden<TEST> `TEST' [TEST] <<TEST>>
+dnl Change beginning quote, make ] normal, thus making ' end quote.
+changesyntax(L`, R-])
address@hidden
+<test> `test' [test] <<test>>
address@hidden<TEST> test [TEST] <<TEST>>
+dnl Set multi-byte quote; unrelated changes don't impact it.
+changequote(`<<', `>>')changesyntax(<<@@\>>)
address@hidden
+<\test> `\test' [\test] <<\test>>
address@hidden<TEST> `TEST' [TEST] \test
 @end example

 If several characters are assigned to a category that forms single
@@ -5748,29 +5769,6 @@ Changesyntax
 @result{}00001111
 @end example

-On the other hand, a multi-character start-quote sequence, which can
-only be created by @code{changequote}, will only be matched by the
-corresponding end-quote sequence.  The same goes for comment delimiters.
-
address@hidden
-define(`test', `==$1==')
address@hidden
-changequote(`<<', `>>')
address@hidden
-changesyntax(<<L[>>, <<R]>>)
address@hidden
-test(<<testing]>>)
address@hidden
-test([testing>>])
address@hidden>>==
-test([<<testing>>])
address@hidden
address@hidden example
-
address@hidden
-Note how it is possible to have both long and short quotes, if
address@hidden is used before @code{changesyntax}.
-
 The syntax table is initialized to be backwards compatible, so if you
 never call @code{changesyntax}, nothing will have changed.

diff --git a/m4/input.c b/m4/input.c
index ba2e467..6c761d0 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -1640,9 +1640,8 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
              obstack_1grow (obs_safe, ch);
          }
       }
-    else if (!m4_is_syntax_single_quotes (M4SYNTAX)
-            && MATCH (context, ch, context->syntax->quote.str1,
-                      context->syntax->quote.len1, true))
+    else if (MATCH (context, ch, context->syntax->quote.str1,
+                   context->syntax->quote.len1, true))
       {                                        /* QUOTED STRING, LONGER QUOTES */
        if (obs)
          obs_safe = obs;
@@ -1719,9 +1718,8 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        type = (m4_get_discard_comments_opt (context)
                ? M4_TOKEN_NONE : M4_TOKEN_COMMENT);
       }
-    else if (!m4_is_syntax_single_comments (M4SYNTAX)
-            && MATCH (context, ch, context->syntax->comm.str1,
-                      context->syntax->comm.len1, true))
+    else if (MATCH (context, ch, context->syntax->comm.str1,
+                   context->syntax->comm.len1, true))
       {                                        /* COMMENT, LONGER DELIM */
        if (obs && !m4_get_discard_comments_opt (context))
          obs_safe = obs;
@@ -1779,8 +1777,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        obstack_1grow (&token_stack, ch);
        type = M4_TOKEN_CLOSE;
       }
-    else if (m4_is_syntax_single_quotes (M4SYNTAX)
-            && m4_is_syntax_single_comments (M4SYNTAX))
+    else if (m4__safe_quotes (M4SYNTAX))
       {                        /* EVERYTHING ELSE (SHORT QUOTES AND COMMENTS) */
        assert (ch < CHAR_EOF);
        obstack_1grow (&token_stack, ch);
@@ -1882,12 +1879,10 @@ m4__next_token_is_open (m4 *context)
       || m4_has_syntax (M4SYNTAX, ch, (M4_SYNTAX_BCOMM | M4_SYNTAX_ESCAPE
                                       | M4_SYNTAX_ALPHA | M4_SYNTAX_LQUOTE
                                       | M4_SYNTAX_ACTIVE))
-      || (!m4_is_syntax_single_comments (M4SYNTAX)
-         && MATCH (context, ch, context->syntax->comm.str1,
-                   context->syntax->comm.len1, false))
-      || (!m4_is_syntax_single_quotes (M4SYNTAX)
-         && MATCH (context, ch, context->syntax->quote.str1,
-                   context->syntax->quote.len1, false)))
+      || (MATCH (context, ch, context->syntax->comm.str1,
+                context->syntax->comm.len1, false))
+      || (MATCH (context, ch, context->syntax->quote.str1,
+                context->syntax->quote.len1, false)))
     return false;
   return m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_OPEN);
 }
diff --git a/m4/m4module.h b/m4/m4module.h
index 07f8c1a..c94f56a 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -484,8 +484,12 @@ enum {
   M4_SYNTAX_ECOMM              = 1 << 15
 };

+/* Mask of attribute syntax categories.  */
 #define M4_SYNTAX_MASKS                (M4_SYNTAX_RQUOTE | M4_SYNTAX_ECOMM)
-#define M4_SYNTAX_VALUE                (~(M4_SYNTAX_RQUOTE | M4_SYNTAX_ECOMM))
+/* Mask of basic syntax categories where any change requires a
+   recomputation of the overall syntax characteristics.  */
+#define M4_SYNTAX_SUSPECT      (M4_SYNTAX_LQUOTE | M4_SYNTAX_BCOMM     \
+                                | M4_SYNTAX_ESCAPE)

 #define m4_syntab(S, C)                ((S)->table[(C)])
 /* Determine if character C matches any of the bitwise-or'd syntax
diff --git a/m4/m4private.h b/m4/m4private.h
index 49fba3b..4f26979 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
    Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 1998, 1999, 2004,
-   2005, 2006, 2007, 2008 Free Software Foundation, Inc.
+   2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.

    This file is part of GNU M4.

@@ -472,17 +472,19 @@ struct m4_syntax_table {
   m4_string_pair quote;        /* Quote delimiters.  */
   m4_string_pair comm; /* Comment delimiters.  */

-  /* True iff strlen(lquote) == strlen(rquote) == 1 and lquote is not
-     interfering with macro names.  */
+  /* True iff only one start and end quote delimiter exist.  */
   bool_bitfield is_single_quotes : 1;

-  /* True iff strlen(bcomm) == strlen(ecomm) == 1 and bcomm is not
-     interfering with macros or quotes.  */
+  /* True iff only one start and end comment delimiter exist.  */
   bool_bitfield is_single_comments : 1;

   /* True iff some character has M4_SYNTAX_ESCAPE.  */
   bool_bitfield is_macro_escaped : 1;

+  /* True iff a changesyntax call has impacted something that requires
+     cleanup at the end.  */
+  bool_bitfield suspect : 1;
+
   /* Track the number of changesyntax calls.  This saturates at
      0xffff, so the idea is that most users won't be changing the
      syntax that frequently; perhaps in the future we will cache
diff --git a/m4/syntax.c b/m4/syntax.c
index 1fb4815..213d790 100644
--- a/m4/syntax.c
+++ b/m4/syntax.c
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
    Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2002, 2004, 2006,
-   2007, 2008 Free Software Foundation, Inc.
+   2007, 2008, 2009 Free Software Foundation, Inc.

    This file is part of GNU M4.

@@ -31,6 +31,7 @@
    according to a syntax table.  The character groups are (definitions
    are all in m4.h, those marked with a * are not yet in use):

+   Basic (all characters fall in one of these mutually exclusive bins)
    M4_SYNTAX_IGNORE    *Character to be deleted from input as if not present
    M4_SYNTAX_OTHER     Any character with no special meaning to m4
    M4_SYNTAX_SPACE     Whitespace (ignored when leading macro arguments)
@@ -46,12 +47,12 @@
    M4_SYNTAX_ALPHA     Alphabetic characters (can start macro names)
    M4_SYNTAX_NUM       Numeric characters (can form macro names)

-   M4_SYNTAX_LQUOTE    A single characters left quote
-   M4_SYNTAX_BCOMM     A single characters begin comment delimiter
+   M4_SYNTAX_LQUOTE    A single character left quote
+   M4_SYNTAX_BCOMM     A single character begin comment delimiter

-   (These are bit masks)
-   M4_SYNTAX_RQUOTE    A single characters right quote
-   M4_SYNTAX_ECOMM     A single characters end comment delimiter
+   Attribute (these are context sensitive, and exist in addition to basic)
+   M4_SYNTAX_RQUOTE    A single character right quote
+   M4_SYNTAX_ECOMM     A single character end comment delimiter

    Besides adding new facilities, the use of a syntax table will reduce
    the number of calls to next_token ().  Now groups of OTHER, NUM and
@@ -65,15 +66,10 @@
    "changesyntax" allows the the user to change the category of any
    character.

-   Default '\n' is both ECOMM and SPACE, depending on the context.  To
-   solve the problem of quotes and comments that have diffent syntax
-   code based on the context, the RQUOTE and ECOMM codes are bit
-   masks to add to an ordinary code.  If a character is made a quote it
-   will be recognised if the basis code does not have precedence.
-
-   When changing quotes and comment delimiters only the bits are
-   removed, and the characters are therefore reverted to its old
-   category code.
+   By default, '\n' is both ECOMM and SPACE, depending on the context.
+   Hence we have basic categories (mutually exclusive, can introduce a
+   context, and can be empty sets), and attribute categories
+   (additive, only recognized in context, and will never be empty).

    The precedence as implemented by next_token () is:

@@ -100,13 +96,27 @@
    a string is parsed equally whether there is a $ or not.  These characters
    are instead used during user macro expansion.

-   M4_SYNTAX_RQUOTE and M4_SYNTAX_ECOMM do not start tokens.  */

-static bool check_is_single_quotes     (m4_syntax_table *);
-static bool check_is_single_comments   (m4_syntax_table *);
-static bool check_is_macro_escaped     (m4_syntax_table *);
-static int add_syntax_attribute                (m4_syntax_table *, int, int);
-static int remove_syntax_attribute     (m4_syntax_table *, int, int);
+   M4_SYNTAX_RQUOTE and M4_SYNTAX_ECOMM do not start tokens.
+
+   There are several optimizations that can be performed depending on
+   known states of the syntax table.  For example, when searching for
+   quotes, if there is only a single start quote and end quote
+   delimiter, we can use memchr2 and search a word at a time, instead
+   of performing a table lookup a byte at a time.  The is_single_*
+   flags track whether quotes and comments have a single delimiter
+   (always the case if changequote/changecom were used, and
+   potentially the case after changesyntax).  Since we frequently need
+   to access quotes, we store the oldest valid quote outside the
+   lookup table; the suspect flag tracks whether a cleanup pass is
+   needed to restore our invariants.  On the other hand, coalescing
+   multiple M4_SYNTAX_OTHER bytes could form a delimiter, so many
+   optimizations must be disabled if a multi-byte delimiter exists;
+   this is handled by m4__safe_quotes.  Meanwhile, quotes and comments
+   can be disabled if the leading delimiter is length 0.  */
+
+static int add_syntax_attribute                (m4_syntax_table *, char, int);
+static int remove_syntax_attribute     (m4_syntax_table *, char, int);
 static void set_quote_age              (m4_syntax_table *, bool, bool);

 m4_syntax_table *
@@ -217,35 +227,44 @@ m4_syntax_code (char ch)
 
 /* Functions to manipulate the syntax table.  */
 static int
-add_syntax_attribute (m4_syntax_table *syntax, int ch, int code)
+add_syntax_attribute (m4_syntax_table *syntax, char ch, int code)
 {
+  int c = to_uchar (ch);
   if (code & M4_SYNTAX_MASKS)
-    syntax->table[ch] |= code;
+    {
+      syntax->table[c] |= code;
+      syntax->suspect = true;
+    }
   else
-    syntax->table[ch] = (syntax->table[ch] & M4_SYNTAX_MASKS) | code;
+    {
+      if ((code & (M4_SYNTAX_SUSPECT)) != 0
+         || m4_has_syntax (syntax, c, M4_SYNTAX_SUSPECT))
+       syntax->suspect = true;
+      syntax->table[c] = ((syntax->table[c] & M4_SYNTAX_MASKS) | code);
+    }

 #ifdef DEBUG_SYNTAX
-  xfprintf(stderr, "Set syntax %o %c = %04X\n",
-          ch, isprint(ch) ? ch : '-',
-          syntax->table[ch]);
+  xfprintf(stderr, "Set syntax %o %c = %04X\n", c, isprint(c) ? c : '-',
+          syntax->table[c]);
 #endif

-  return syntax->table[ch];
+  return syntax->table[c];
 }

 static int
-remove_syntax_attribute (m4_syntax_table *syntax, int ch, int code)
+remove_syntax_attribute (m4_syntax_table *syntax, char ch, int code)
 {
+  int c = to_uchar (ch);
   assert (code & M4_SYNTAX_MASKS);
-  syntax->table[ch] &= ~code;
+  syntax->table[c] &= ~code;
+  syntax->suspect = true;

 #ifdef DEBUG_SYNTAX
-  xfprintf(stderr, "Unset syntax %o %c = %04X\n",
-          ch, isprint(ch) ? ch : '-',
-          syntax->table[ch]);
+  xfprintf(stderr, "Unset syntax %o %c = %04X\n", c, isprint(c) ? c : '-',
+          syntax->table[c]);
 #endif

-  return syntax->table[ch];
+  return syntax->table[c];
 }

 /* Add the set CHARS of length LEN to syntax category CODE, removing
@@ -254,21 +273,8 @@ static void
 add_syntax_set (m4_syntax_table *syntax, const char *chars, size_t len,
                int code)
 {
-  int ch;
-
-  if (!len)
-    return;
-
-  if (code == M4_SYNTAX_ESCAPE)
-    syntax->is_macro_escaped = true;
-
-  /* Adding doesn't affect single-quote or single-comment.  */
-
   while (len--)
-    {
-      ch = to_uchar (*chars++);
-      add_syntax_attribute (syntax, ch, code);
-    }
+    add_syntax_attribute (syntax, *chars++, code);
 }

 /* Remove the set CHARS of length LEN from syntax category CODE,
@@ -277,43 +283,14 @@ static void
 subtract_syntax_set (m4_syntax_table *syntax, const char *chars, size_t len,
                     int code)
 {
-  int ch;
-
-  if (!len)
-    return;
-
   while (len--)
     {
-      ch = to_uchar (*chars++);
+      char ch = *chars++;
       if ((code & M4_SYNTAX_MASKS) != 0)
        remove_syntax_attribute (syntax, ch, code);
       else if (m4_has_syntax (syntax, ch, code))
        add_syntax_attribute (syntax, ch, M4_SYNTAX_OTHER);
     }
-
-  /* Check for any cleanup needed.  */
-  switch (code)
-    {
-    case M4_SYNTAX_ESCAPE:
-      if (syntax->is_macro_escaped)
-       check_is_macro_escaped (syntax);
-      break;
-
-    case M4_SYNTAX_LQUOTE:
-    case M4_SYNTAX_RQUOTE:
-      if (syntax->is_single_quotes)
-       check_is_single_quotes (syntax);
-      break;
-
-    case M4_SYNTAX_BCOMM:
-    case M4_SYNTAX_ECOMM:
-      if (syntax->is_single_comments)
-       check_is_single_comments (syntax);
-      break;
-
-    default:
-      break;
-    }
 }

 /* Make the set CHARS of length LEN become syntax category CODE,
@@ -330,21 +307,16 @@ set_syntax_set (m4_syntax_table *syntax, const char *chars, size_t len,
      OTHER.  */
   for (ch = UCHAR_MAX + 1; --ch >= 0; )
     {
-      if (code == M4_SYNTAX_RQUOTE || code == M4_SYNTAX_ECOMM)
+      if ((code & M4_SYNTAX_MASKS) != 0)
        remove_syntax_attribute (syntax, ch, code);
       else if (m4_has_syntax (syntax, ch, code))
        add_syntax_attribute (syntax, ch, M4_SYNTAX_OTHER);
     }
   while (len--)
     {
-      ch = to_uchar (*chars++);
+      ch = *chars++;
       add_syntax_attribute (syntax, ch, code);
     }
-
-  /* Check for any cleanup needed.  */
-  check_is_macro_escaped (syntax);
-  check_is_single_quotes (syntax);
-  check_is_single_comments (syntax);
 }

 /* Reset syntax category CODE to its default state, sending all other
@@ -375,9 +347,6 @@ reset_syntax_set (m4_syntax_table *syntax, int code)
       else if (syntax->orig[ch] == code || m4_has_syntax (syntax, ch, code))
        add_syntax_attribute (syntax, ch, syntax->orig[ch]);
     }
-  check_is_macro_escaped (syntax);
-  check_is_single_quotes (syntax);
-  check_is_single_comments (syntax);
 }

 /* Reset the syntax table to its default state.  */
@@ -403,10 +372,8 @@ m4_reset_syntax (m4_syntax_table *syntax)
   syntax->comm.str2 = xmemdup0 (DEF_ECOMM, 1);
   syntax->comm.len2 = 1;

-  add_syntax_attribute (syntax, to_uchar (syntax->quote.str2[0]),
-                       M4_SYNTAX_RQUOTE);
-  add_syntax_attribute (syntax, to_uchar (syntax->comm.str2[0]),
-                       M4_SYNTAX_ECOMM);
+  add_syntax_attribute (syntax, syntax->quote.str2[0], M4_SYNTAX_RQUOTE);
+  add_syntax_attribute (syntax, syntax->comm.str2[0], M4_SYNTAX_ECOMM);

   syntax->is_single_quotes = true;
   syntax->is_single_comments = true;
@@ -431,6 +398,7 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
     {
       return -1;
     }
+  syntax->suspect = false;
   switch (action)
     {
     case '+':
@@ -449,134 +417,159 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
     default:
       assert (false);
     }
-  set_quote_age (syntax, false, true);
-  m4__quote_uncache (syntax);
-  return code;
-}

-static bool
-check_is_single_quotes (m4_syntax_table *syntax)
-{
-  int ch;
-  int lquote = -1;
-  int rquote = -1;
-
-  if (! syntax->is_single_quotes)
-    return false;
-  assert (syntax->quote.len1 == 1 && syntax->quote.len2 == 1);
-
-  if (m4_has_syntax (syntax, *syntax->quote.str1, M4_SYNTAX_LQUOTE)
-      && m4_has_syntax (syntax, *syntax->quote.str2, M4_SYNTAX_RQUOTE))
-    return true;
-
-  /* The most recent action invalidated our current lquote/rquote.  If
-     we still have exactly one character performing those roles based
-     on the syntax table, then update lquote/rquote accordingly.
-     Otherwise, keep lquote/rquote, but we no longer have single
-     quotes.  */
-  for (ch = UCHAR_MAX + 1; --ch >= 0; )
+  /* Check for any cleanup needed.  */
+  if (syntax->suspect)
     {
-      if (m4_has_syntax (syntax, ch, M4_SYNTAX_LQUOTE))
+      int ch;
+      int lquote = -1;
+      int rquote = -1;
+      int bcomm = -1;
+      int ecomm = -1;
+      if (m4_has_syntax (syntax, syntax->quote.str1[0], M4_SYNTAX_LQUOTE))
        {
-         if (lquote == -1)
-           lquote = ch;
-         else
+         assert (syntax->quote.len1 == 1);
+         lquote = to_uchar (syntax->quote.str1[0]);
+       }
+      if (m4_has_syntax (syntax, syntax->quote.str2[0], M4_SYNTAX_RQUOTE))
+       {
+         assert (syntax->quote.len2 == 1);
+         rquote = to_uchar (syntax->quote.str2[0]);
+       }
+      if (m4_has_syntax (syntax, syntax->comm.str1[0], M4_SYNTAX_BCOMM))
+       {
+         assert (syntax->comm.len1 == 1);
+         bcomm = to_uchar (syntax->comm.str1[0]);
+       }
+      if (m4_has_syntax (syntax, syntax->comm.str2[0], M4_SYNTAX_ECOMM))
+       {
+         assert (syntax->comm.len2 == 1);
+         ecomm = to_uchar (syntax->comm.str2[0]);
+       }
+      syntax->is_macro_escaped = false;
+      /* Find candidates for each category.  */
+      for (ch = UCHAR_MAX + 1; --ch >= 0; )
+       {
+         if (m4_has_syntax (syntax, ch, M4_SYNTAX_LQUOTE))
+           {
+             if (lquote == -1)
+               lquote = ch;
+             else if (lquote != ch)
+               syntax->is_single_quotes = false;
+           }
+         if (m4_has_syntax (syntax, ch, M4_SYNTAX_RQUOTE))
+           {
+             if (rquote == -1)
+               rquote = ch;
+             else if (rquote != ch)
+               syntax->is_single_quotes = false;
+           }
+         if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM))
+           {
+             if (bcomm == -1)
+               bcomm = ch;
+             else if (bcomm != ch)
+               syntax->is_single_comments = false;
+           }
+         if (m4_has_syntax (syntax, ch, M4_SYNTAX_ECOMM))
            {
-             syntax->is_single_quotes = false;
-             break;
+             if (ecomm == -1)
+               ecomm = ch;
+             else if (ecomm != ch)
+               syntax->is_single_comments = false;
            }
+         if (m4_has_syntax (syntax, ch, M4_SYNTAX_ESCAPE))
+           syntax->is_macro_escaped = true;
        }
-      if (m4_has_syntax (syntax, ch, M4_SYNTAX_RQUOTE))
+      /* Disable multi-character delimiters if we discovered
+        delimiters.  */
+      if ((1 < syntax->quote.len1 || 1 < syntax->quote.len2)
+         && (!syntax->is_single_quotes || lquote != -1 || rquote != -1))
        {
-         if (rquote == -1)
-           rquote = ch;
-         else
+         if (syntax->quote.len1)
+           {
+             syntax->quote.len1 = lquote == to_uchar (syntax->quote.str1[0]);
+             syntax->quote.str1[syntax->quote.len1] = '\0';
+           }
+         if (syntax->quote.len2)
            {
-             syntax->is_single_quotes = false;
-             break;
+             syntax->quote.len2 = rquote == to_uchar (syntax->quote.str2[0]);
+             syntax->quote.str2[syntax->quote.len2] = '\0';
            }
        }
-    }
-  if (lquote == -1 || rquote == -1)
-    syntax->is_single_quotes = false;
-  else if (syntax->is_single_quotes)
-    {
-      *syntax->quote.str1 = lquote;
-      *syntax->quote.str2 = rquote;
-    }
-  return syntax->is_single_quotes;
-}
-
-static bool
-check_is_single_comments (m4_syntax_table *syntax)
-{
-  int ch;
-  int bcomm = -1;
-  int ecomm = -1;
-
-  if (! syntax->is_single_comments)
-    return false;
-  assert (syntax->comm.len1 == 1 && syntax->comm.len2 == 1);
-
-  if (m4_has_syntax (syntax, *syntax->comm.str1, M4_SYNTAX_BCOMM)
-      && m4_has_syntax (syntax, *syntax->comm.str2, M4_SYNTAX_ECOMM))
-    return true;
-
-  /* The most recent action invalidated our current bcomm/ecomm.  If
-     we still have exactly one character performing those roles based
-     on the syntax table, then update bcomm/ecomm accordingly.
-     Otherwise, keep bcomm/ecomm, but we no longer have single
-     comments.  */
-  for (ch = UCHAR_MAX + 1; --ch >= 0; )
-    {
-      if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM))
+      if ((1 < syntax->comm.len1 || 1 < syntax->comm.len2)
+         && (!syntax->is_single_comments || bcomm != -1 || ecomm != -1))
+       {
+         if (syntax->comm.len1)
+           {
+             syntax->comm.len1 = bcomm == to_uchar (syntax->comm.str1[0]);
+             syntax->comm.str1[syntax->comm.len1] = '\0';
+           }
+         if (syntax->comm.len2)
+           {
+             syntax->comm.len2 = ecomm == to_uchar (syntax->comm.str2[0]);
+             syntax->comm.str2[syntax->comm.len2] = '\0';
+           }
+       }
+      /* Update the strings.  */
+      if (lquote != -1)
        {
-         if (bcomm == -1)
-           bcomm = ch;
+         if (syntax->quote.len1)
+           assert (syntax->quote.len1 == 1);
          else
            {
-             syntax->is_single_comments = false;
-             break;
+             free (syntax->quote.str1);
+             syntax->quote.str1 = xcharalloc (2);
+             syntax->quote.str1[1] = '\0';
+             syntax->quote.len1 = 1;
            }
+         syntax->quote.str1[0] = lquote;
+         if (rquote == -1)
+           {
+             rquote = '\'';
+             add_syntax_attribute (syntax, rquote, M4_SYNTAX_RQUOTE);
+           }
+         if (!syntax->quote.len2)
+           {
+             free (syntax->quote.str2);
+             syntax->quote.str2 = xcharalloc (2);
+           }
+         syntax->quote.str2[0] = rquote;
+         syntax->quote.str2[1] = '\0';
+         syntax->quote.len2 = 1;
        }
-      if (m4_has_syntax (syntax, ch, M4_SYNTAX_ECOMM))
+      if (bcomm != -1)
        {
-         if (ecomm == -1)
-           ecomm = ch;
+         if (syntax->comm.len1)
+           assert (syntax->comm.len1 == 1);
          else
            {
-             syntax->is_single_comments = false;
-             break;
+             free (syntax->comm.str1);
+             syntax->comm.str1 = xcharalloc (2);
+             syntax->comm.str1[1] = '\0';
+             syntax->comm.len1 = 1;
            }
+         syntax->comm.str1[0] = bcomm;
+         if (ecomm == -1)
+           {
+             ecomm = '\n';
+             add_syntax_attribute (syntax, ecomm, M4_SYNTAX_ECOMM);
+           }
+         if (!syntax->comm.len2)
+           {
+             free (syntax->comm.str2);
+             syntax->comm.str2 = xcharalloc (2);
+           }
+         syntax->comm.str2[0] = ecomm;
+         syntax->comm.str2[1] = '\0';
+         syntax->comm.len2 = 1;
        }
     }
-  if (bcomm == -1 || ecomm == -1)
-    syntax->is_single_comments = false;
-  else if (syntax->is_single_comments)
-    {
-      *syntax->comm.str1 = bcomm;
-      *syntax->comm.str2 = ecomm;
-    }
-  return syntax->is_single_comments;
-}
-
-static bool
-check_is_macro_escaped (m4_syntax_table *syntax)
-{
-  int ch;
-
-  syntax->is_macro_escaped = false;
-  for (ch = UCHAR_MAX + 1; --ch >= 0; )
-    if (m4_has_syntax (syntax, ch, M4_SYNTAX_ESCAPE))
-      {
-       syntax->is_macro_escaped = true;
-       break;
-      }
-
-  return syntax->is_macro_escaped;
+  set_quote_age (syntax, false, true);
+  m4__quote_uncache (syntax);
+  return code;
 }

-
 
 /* Functions for setting quotes and comment delimiters.  Used by
    m4_changecom () and m4_changequote ().  Both functions override the
@@ -629,13 +622,11 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, size_t lq_len,
   /* changequote overrides syntax_table, but be careful when it is
      used to select a start-quote sequence that is effectively
      disabled.  */
-
-  syntax->is_single_quotes
-    = (syntax->quote.len1 == 1 && syntax->quote.len2 == 1
-       && !m4_has_syntax (syntax, *syntax->quote.str1,
-                         (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE
-                          | M4_SYNTAX_ALPHA | M4_SYNTAX_NUM)));
-
+  syntax->is_single_quotes = !m4_has_syntax (syntax, *syntax->quote.str1,
+                                            (M4_SYNTAX_IGNORE
+                                             | M4_SYNTAX_ESCAPE
+                                             | M4_SYNTAX_ALPHA
+                                             | M4_SYNTAX_NUM));
   for (ch = UCHAR_MAX + 1; --ch >= 0; )
     {
       if (m4_has_syntax (syntax, ch, M4_SYNTAX_LQUOTE))
@@ -646,15 +637,12 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, size_t lq_len,
        remove_syntax_attribute (syntax, ch, M4_SYNTAX_RQUOTE);
     }

-  if (syntax->is_single_quotes)
+  if (syntax->is_single_quotes
+      && syntax->quote.len1 == 1 && syntax->quote.len2 == 1)
     {
-      add_syntax_attribute (syntax, to_uchar (syntax->quote.str1[0]),
-                           M4_SYNTAX_LQUOTE);
-      add_syntax_attribute (syntax, to_uchar (syntax->quote.str2[0]),
-                           M4_SYNTAX_RQUOTE);
+      add_syntax_attribute (syntax, syntax->quote.str1[0], M4_SYNTAX_LQUOTE);
+      add_syntax_attribute (syntax, syntax->quote.str2[0], M4_SYNTAX_RQUOTE);
     }
-  if (syntax->is_macro_escaped)
-    check_is_macro_escaped (syntax);
   set_quote_age (syntax, false, false);
 }

@@ -703,14 +691,12 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, size_t bc_len,
   /* changecom overrides syntax_table, but be careful when it is used
      to select a start-comment sequence that is effectively
      disabled.  */
-
-  syntax->is_single_comments
-    = (syntax->comm.len1 == 1 && syntax->comm.len2 == 1
-       && !m4_has_syntax (syntax, *syntax->comm.str1,
-                         (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE
-                          | M4_SYNTAX_ALPHA | M4_SYNTAX_NUM
-                          | M4_SYNTAX_LQUOTE)));
-
+  syntax->is_single_comments = !m4_has_syntax (syntax, *syntax->comm.str1,
+                                              (M4_SYNTAX_IGNORE
+                                               | M4_SYNTAX_ESCAPE
+                                               | M4_SYNTAX_ALPHA
+                                               | M4_SYNTAX_NUM
+                                               | M4_SYNTAX_LQUOTE));
   for (ch = UCHAR_MAX + 1; --ch >= 0; )
     {
       if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM))
@@ -720,20 +706,17 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, size_t bc_len,
       if (m4_has_syntax (syntax, ch, M4_SYNTAX_ECOMM))
        remove_syntax_attribute (syntax, ch, M4_SYNTAX_ECOMM);
     }
-  if (syntax->is_single_comments)
+  if (syntax->is_single_comments
+      && syntax->comm.len1 == 1 && syntax->comm.len2 == 1)
     {
-      add_syntax_attribute (syntax, to_uchar (syntax->comm.str1[0]),
-                           M4_SYNTAX_BCOMM);
-      add_syntax_attribute (syntax, to_uchar (syntax->comm.str2[0]),
-                           M4_SYNTAX_ECOMM);
+      add_syntax_attribute (syntax, syntax->comm.str1[0], M4_SYNTAX_BCOMM);
+      add_syntax_attribute (syntax, syntax->comm.str2[0], M4_SYNTAX_ECOMM);
     }
-  if (syntax->is_macro_escaped)
-    check_is_macro_escaped (syntax);
   set_quote_age (syntax, false, false);
 }

 /* Call this when changing anything that might impact the quote age,
-   so that m4_quote_age and m4_safe_quotes will reflect the change.
+   so that m4__quote_age and m4__safe_quotes will reflect the change.
    If RESET, changesyntax was reset to its default stage; if CHANGE,
    arbitrary syntax has changed; otherwise, just quotes or comment
    delimiters have changed.  */
@@ -789,6 +772,7 @@ set_quote_age (m4_syntax_table *syntax, bool reset, bool change)
   else
     local_syntax_age = syntax->syntax_age;
   if (local_syntax_age < 0xffff && syntax->is_single_quotes
+      && syntax->quote.len1 == 1 && syntax->quote.len2 == 1
       && !m4_has_syntax (syntax, *syntax->quote.str1,
                         (M4_SYNTAX_ALPHA | M4_SYNTAX_NUM | M4_SYNTAX_OPEN
                          | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE
-- 
1.6.1.2
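
The basic/attribute split that the syntax.c comment in this patch
describes can be pictured with a minimal sketch.  Every name below
(sk_table, sk_add, sk_remove, sk_has, and the SK_* values) is
hypothetical and chosen only for illustration; none of them are the m4
identifiers or bit values.  The sketch also marks every change as
suspect, which is more conservative than the patch, where basic
categories only set the flag when M4_SYNTAX_SUSPECT is involved.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical values for this sketch only: basic categories are low
   bits (a byte holds exactly one of them), attributes are high bits
   that may be added on top of any basic category.  */
enum
{
  SK_OTHER  = 0x001,
  SK_SPACE  = 0x002,
  SK_LQUOTE = 0x004,
  SK_RQUOTE = 0x100,
  SK_ECOMM  = 0x200,
  SK_MASKS  = SK_RQUOTE | SK_ECOMM
};

struct sk_table
{
  unsigned short table[UCHAR_MAX + 1];
  bool suspect;                 /* a cleanup pass is needed later */
};

/* True if byte CH carries any of the bits in MASK.  */
static bool
sk_has (const struct sk_table *t, char ch, int mask)
{
  return (t->table[(unsigned char) ch] & mask) != 0;
}

/* Attribute codes are OR-ed in; a basic code replaces the previous
   basic code while preserving any attribute bits.  */
static int
sk_add (struct sk_table *t, char ch, int code)
{
  int c = (unsigned char) ch;
  if (code & SK_MASKS)
    t->table[c] |= code;
  else
    t->table[c] = (t->table[c] & SK_MASKS) | code;
  t->suspect = true;
  return t->table[c];
}

/* Only attributes can be removed; the basic category is untouched.  */
static int
sk_remove (struct sk_table *t, char ch, int code)
{
  int c = (unsigned char) ch;
  assert (code & SK_MASKS);
  t->table[c] &= ~code;
  t->suspect = true;
  return t->table[c];
}
```

With this layout, a newline can be SK_SPACE and SK_ECOMM at the same
time, and moving it to SK_OTHER leaves the SK_ECOMM attribute in place,
which is the '\n' case the comment calls out.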


>From 267f56e7699f2e506cc977fc4c96b4dea6626fd4 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Mon, 16 Feb 2009 07:02:03 -0700
Subject: [PATCH 3/3] Unify single and multi-character delimiter handling.

* m4/input.c (MATCH): Add a parameter.
(m4__next_token): Simplify logic and reduce redundancy.
(m4__next_token_is_open): Adjust caller.
* m4/syntax.c (m4_set_comment, m4_set_quotes): Handle delimiters
of differing lengths.
(m4_set_syntax): Recognize restoration of single delimiters.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog   |    8 +++
 m4/input.c  |  178 +++++++++++++++++-----------------------------------------
 m4/syntax.c |   53 ++++++++++--------
 3 files changed, 90 insertions(+), 149 deletions(-)
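
The idea behind the extra MATCH parameter in this patch, as I read it,
is that a delimiter is recognized either because the current byte
carries the syntax category or because the input matches the delimiter
string.  A minimal sketch of that rule over a plain buffer follows;
sk_match and the cats array are hypothetical stand-ins for
m4_has_syntax and match_input, and the sketch never consumes input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* CATS[c] holds the category bits of byte c (hypothetical table).
   INPUT has AVAIL readable bytes; S is the delimiter string of
   length LEN.  Return true if a delimiter starts at INPUT.  */
static bool
sk_match (const unsigned short *cats, int cat,
          const char *input, size_t avail,
          const char *s, size_t len)
{
  unsigned char ch;

  if (avail == 0)
    return false;
  ch = (unsigned char) input[0];

  /* Single-byte delimiter supplied through the syntax table.  */
  if (cats[ch] & cat)
    return true;

  /* A zero-length delimiter is disabled and never matches; otherwise
     compare the first byte before doing the full comparison.  */
  if (len == 0 || (unsigned char) s[0] != ch)
    return false;
  return len <= avail && memcmp (input, s, len) == 0;
}
```

Checking the first byte before the length mirrors the ordering the
MATCH comment describes for the common single-character case.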

diff --git a/ChangeLog b/ChangeLog
index 8f77619..726fdc8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,13 @@
 2009-02-16  Eric Blake  <address@hidden>

+       Unify single and multi-character delimiter handling.
+       * m4/input.c (MATCH): Add a parameter.
+       (m4__next_token): Simplify logic and reduce redundancy.
+       (m4__next_token_is_open): Adjust caller.
+       * m4/syntax.c (m4_set_comment, m4_set_quotes): Handle delimiters
+       of differing lengths.
+       (m4_set_syntax): Recognize restoration of single delimiters.
+
        Revamp changesyntax vs. changequote interactions.
        * m4/m4module.h (M4_SYNTAX_VALUE): Delete unused macro.
        (M4_SYNTAX_SUSPECT): New macro.
diff --git a/m4/input.c b/m4/input.c
index 6c761d0..dd3addc 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -1417,9 +1417,10 @@ match_input (m4 *context, const char *s, size_t len, bool consume)
   return result;
 }

-/* The macro MATCH() is used to match a string S of length LEN against
-   the input.  The first character is handled inline for speed, and
-   S[LEN] must be safe to dereference (it is faster to do character
+/* Check whether the current input matches a delimiter, which either
+   belongs to syntax category CAT or matches the string S of length
+   LEN.  The first character is handled inline for speed, and S[LEN]
+   must be safe to dereference (it is faster to do character
    comparison prior to length checks).  This improves efficiency for
    the common case of single character quotes and comment delimiters,
    while being safe for disabled delimiters as well as longer
@@ -1427,9 +1428,10 @@ match_input (m4 *context, const char *s, size_t len, bool consume)
    successful match will discard the matched string.  Otherwise, CH is
    the result of peek_char, and the input stream is effectively
    unchanged.  */
-#define MATCH(C, ch, s, len, consume)                                  \
-  (to_uchar ((s)[0]) == (ch)                                           \
-   && ((len) >> 1 ? match_input (C, s, len, consume) : (len)))
+#define MATCH(C, ch, cat, s, len, consume)                             \
+  (m4_has_syntax (m4_get_syntax_table (C), ch, cat)                    \
+   || (to_uchar ((s)[0]) == (ch)                                       \
+       && ((len) >> 1 ? match_input (C, s, len, consume) : (len))))

 /* While the current input character has the given SYNTAX, append it
    to OBS.  Take care not to pop input source unless the next source
@@ -1600,8 +1602,10 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        obstack_1grow (obs_safe, ch);
        consume_syntax (context, obs_safe, M4_SYNTAX_ALPHA | M4_SYNTAX_NUM);
       }
-    else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_LQUOTE))
-      {                                        /* QUOTED STRING, SINGLE QUOTES */
+    else if (MATCH (context, ch, M4_SYNTAX_LQUOTE,
+                   context->syntax->quote.str1,
+                   context->syntax->quote.len1, true))
+      {                                        /* QUOTED STRING */
        if (obs)
          obs_safe = obs;
        quote_level = 1;
@@ -1625,106 +1629,44 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
              init_builtin_token (context, obs, obs ? token : NULL);
            else if (ch == CHAR_QUOTE)
              append_quote_token (context, obs, token);
-           else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_RQUOTE))
-             {
-               if (--quote_level == 0)
-                 break;
-               obstack_1grow (obs_safe, ch);
-             }
-           else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_LQUOTE))
-             {
-               quote_level++;
-               obstack_1grow (obs_safe, ch);
-             }
-           else
-             obstack_1grow (obs_safe, ch);
-         }
-      }
-    else if (MATCH (context, ch, context->syntax->quote.str1,
-                   context->syntax->quote.len1, true))
-      {                                        /* QUOTED STRING, LONGER QUOTES */
-       if (obs)
-         obs_safe = obs;
-       quote_level = 1;
-       type = M4_TOKEN_STRING;
-       assert (!m4__quote_age (M4SYNTAX));
-       while (1)
-         {
-           ch = next_char (context, false, false, false);
-           if (ch == CHAR_EOF)
-             {
-               if (!caller)
-                 {
-                   assert (line);
-                   m4_set_current_file (context, file);
-                   m4_set_current_line (context, *line);
-                 }
-               m4_error (context, EXIT_FAILURE, 0, caller,
-                         _("end of file in string"));
-             }
-           if (ch == CHAR_BUILTIN)
-             init_builtin_token (context, obs, obs ? token : NULL);
-           else if (MATCH (context, ch, context->syntax->quote.str2,
+           else if (MATCH (context, ch, M4_SYNTAX_RQUOTE,
+                           context->syntax->quote.str2,
                            context->syntax->quote.len2, true))
              {
                if (--quote_level == 0)
                  break;
-               obstack_grow (obs_safe, context->syntax->quote.str2,
-                             context->syntax->quote.len2);
+               if (1 < context->syntax->quote.len2)
+                 obstack_grow (obs_safe, context->syntax->quote.str2,
+                               context->syntax->quote.len2);
+               else
+                 obstack_1grow (obs_safe, ch);
              }
-           else if (MATCH (context, ch, context->syntax->quote.str1,
+           else if (MATCH (context, ch, M4_SYNTAX_LQUOTE,
+                           context->syntax->quote.str1,
                            context->syntax->quote.len1, true))
              {
                quote_level++;
-               obstack_grow (obs_safe, context->syntax->quote.str1,
-                             context->syntax->quote.len1);
+               if (1 < context->syntax->quote.len1)
+                 obstack_grow (obs_safe, context->syntax->quote.str1,
+                               context->syntax->quote.len1);
+               else
+                 obstack_1grow (obs_safe, ch);
              }
            else
              obstack_1grow (obs_safe, ch);
          }
       }
-    else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_BCOMM))
-      {                                        /* COMMENT, SHORT DELIM */
-       if (obs && !m4_get_discard_comments_opt (context))
-         obs_safe = obs;
-       obstack_1grow (obs_safe, ch);
-       while (1)
-         {
-           ch = next_char (context, false, false, false);
-           if (ch == CHAR_EOF)
-             {
-               if (!caller)
-                 {
-                   assert (line);
-                   m4_set_current_file (context, file);
-                   m4_set_current_line (context, *line);
-                 }
-               m4_error (context, EXIT_FAILURE, 0, caller,
-                         _("end of file in comment"));
-             }
-           if (ch == CHAR_BUILTIN)
-             {
-               init_builtin_token (context, NULL, NULL);
-               continue;
-             }
-           if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ECOMM))
-             {
-               obstack_1grow (obs_safe, ch);
-               break;
-             }
-           assert (ch < CHAR_EOF);
-           obstack_1grow (obs_safe, ch);
-         }
-       type = (m4_get_discard_comments_opt (context)
-               ? M4_TOKEN_NONE : M4_TOKEN_COMMENT);
-      }
-    else if (MATCH (context, ch, context->syntax->comm.str1,
+    else if (MATCH (context, ch, M4_SYNTAX_BCOMM,
+                   context->syntax->comm.str1,
                    context->syntax->comm.len1, true))
-      {                                        /* COMMENT, LONGER DELIM */
+      {                                        /* COMMENT */
        if (obs && !m4_get_discard_comments_opt (context))
          obs_safe = obs;
-       obstack_grow (obs_safe, context->syntax->comm.str1,
-                     context->syntax->comm.len1);
+       if (1 < context->syntax->comm.len1)
+         obstack_grow (obs_safe, context->syntax->comm.str1,
+                       context->syntax->comm.len1);
+       else
+         obstack_1grow (obs_safe, ch);
        while (1)
          {
            ch = next_char (context, false, false, false);
@@ -1744,11 +1686,15 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
                init_builtin_token (context, NULL, NULL);
                continue;
              }
-           if (MATCH (context, ch, context->syntax->comm.str2,
+           if (MATCH (context, ch, M4_SYNTAX_ECOMM,
+                      context->syntax->comm.str2,
                       context->syntax->comm.len2, true))
              {
-               obstack_grow (obs_safe, context->syntax->comm.str2,
-                             context->syntax->comm.len2);
+               if (1 < context->syntax->comm.len2)
+                 obstack_grow (obs_safe, context->syntax->comm.str2,
+                               context->syntax->comm.len2);
+               else
+                 obstack_1grow (obs_safe, ch);
                break;
              }
            assert (ch < CHAR_EOF);
@@ -1777,11 +1723,10 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        obstack_1grow (&token_stack, ch);
        type = M4_TOKEN_CLOSE;
       }
-    else if (m4__safe_quotes (M4SYNTAX))
-      {                        /* EVERYTHING ELSE (SHORT QUOTES AND COMMENTS) */
+    else
+      {                                        /* EVERYTHING ELSE */
        assert (ch < CHAR_EOF);
        obstack_1grow (&token_stack, ch);
-
        if (m4_has_syntax (M4SYNTAX, ch,
                           (M4_SYNTAX_OTHER | M4_SYNTAX_NUM | M4_SYNTAX_DOLLAR
                            | M4_SYNTAX_LBRACE | M4_SYNTAX_RBRACE)))
@@ -1791,10 +1736,11 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
                obs_safe = obs;
                obstack_1grow (obs, ch);
              }
-           consume_syntax (context, obs_safe,
-                           (M4_SYNTAX_OTHER | M4_SYNTAX_NUM
-                            | M4_SYNTAX_DOLLAR | M4_SYNTAX_LBRACE
-                            | M4_SYNTAX_RBRACE));
+           if (m4__safe_quotes (M4SYNTAX))
+             consume_syntax (context, obs_safe,
+                             (M4_SYNTAX_OTHER | M4_SYNTAX_NUM
+                              | M4_SYNTAX_DOLLAR | M4_SYNTAX_LBRACE
+                              | M4_SYNTAX_RBRACE));
            type = M4_TOKEN_STRING;
          }
        else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_SPACE))
@@ -1802,34 +1748,14 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
            /* Coalescing newlines when interactive or when synclines
               are enabled is wrong.  */
            if (!m4_get_interactive_opt (context)
-               && !m4_get_syncoutput_opt (context))
+               && !m4_get_syncoutput_opt (context)
+               && m4__safe_quotes (M4SYNTAX))
              consume_syntax (context, &token_stack, M4_SYNTAX_SPACE);
            type = M4_TOKEN_SPACE;
          }
        else
          type = M4_TOKEN_SIMPLE;
       }
-    else               /* EVERYTHING ELSE (LONG QUOTES OR COMMENTS) */
-      {
-       assert (ch < CHAR_EOF);
-       obstack_1grow (&token_stack, ch);
-
-       if (m4_has_syntax (M4SYNTAX, ch,
-                          (M4_SYNTAX_OTHER | M4_SYNTAX_NUM | M4_SYNTAX_DOLLAR
-                           | M4_SYNTAX_LBRACE | M4_SYNTAX_RBRACE)))
-         {
-           if (obs)
-             {
-               obs_safe = obs;
-               obstack_1grow (obs, ch);
-             }
-           type = M4_TOKEN_STRING;
-         }
-       else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_SPACE))
-         type = M4_TOKEN_SPACE;
-       else
-         type = M4_TOKEN_SIMPLE;
-      }
   } while (type == M4_TOKEN_NONE);

   if (token->type == M4_SYMBOL_VOID)
@@ -1879,9 +1805,9 @@ m4__next_token_is_open (m4 *context)
       || m4_has_syntax (M4SYNTAX, ch, (M4_SYNTAX_BCOMM | M4_SYNTAX_ESCAPE
                                       | M4_SYNTAX_ALPHA | M4_SYNTAX_LQUOTE
                                       | M4_SYNTAX_ACTIVE))
-      || (MATCH (context, ch, context->syntax->comm.str1,
+      || (MATCH (context, ch, M4_SYNTAX_BCOMM, context->syntax->comm.str1,
                 context->syntax->comm.len1, false))
-      || (MATCH (context, ch, context->syntax->quote.str1,
+      || (MATCH (context, ch, M4_SYNTAX_LQUOTE, context->syntax->quote.str1,
                 context->syntax->quote.len1, false)))
     return false;
   return m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_OPEN);
diff --git a/m4/syntax.c b/m4/syntax.c
index 213d790..0949055 100644
--- a/m4/syntax.c
+++ b/m4/syntax.c
@@ -426,6 +426,8 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
       int rquote = -1;
       int bcomm = -1;
       int ecomm = -1;
+      bool single_quote_possible = true;
+      bool single_comm_possible = true;
       if (m4_has_syntax (syntax, syntax->quote.str1[0], M4_SYNTAX_LQUOTE))
        {
          assert (syntax->quote.len1 == 1);
@@ -455,34 +457,38 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
              if (lquote == -1)
                lquote = ch;
              else if (lquote != ch)
-               syntax->is_single_quotes = false;
+               single_quote_possible = false;
            }
          if (m4_has_syntax (syntax, ch, M4_SYNTAX_RQUOTE))
            {
              if (rquote == -1)
                rquote = ch;
              else if (rquote != ch)
-               syntax->is_single_quotes = false;
+               single_quote_possible = false;
            }
          if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM))
            {
              if (bcomm == -1)
                bcomm = ch;
              else if (bcomm != ch)
-               syntax->is_single_comments = false;
+               single_comm_possible = false;
            }
          if (m4_has_syntax (syntax, ch, M4_SYNTAX_ECOMM))
            {
              if (ecomm == -1)
                ecomm = ch;
              else if (ecomm != ch)
-               syntax->is_single_comments = false;
+               single_comm_possible = false;
            }
          if (m4_has_syntax (syntax, ch, M4_SYNTAX_ESCAPE))
            syntax->is_macro_escaped = true;
        }
       /* Disable multi-character delimiters if we discovered
         delimiters.  */
+      if (!single_quote_possible)
+       syntax->is_single_quotes = false;
+      if (!single_comm_possible)
+       syntax->is_single_comments = false;
       if ((1 < syntax->quote.len1 || 1 < syntax->quote.len2)
          && (!syntax->is_single_quotes || lquote != -1 || rquote != -1))
        {
@@ -514,6 +520,8 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
       /* Update the strings.  */
       if (lquote != -1)
        {
+         if (single_quote_possible)
+           syntax->is_single_quotes = true;
          if (syntax->quote.len1)
            assert (syntax->quote.len1 == 1);
          else
@@ -540,6 +548,8 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
        }
       if (bcomm != -1)
        {
+         if (single_comm_possible)
+           syntax->is_single_comments = true;
          if (syntax->comm.len1)
            assert (syntax->comm.len1 == 1);
          else
@@ -622,11 +632,7 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, size_t lq_len,
   /* changequote overrides syntax_table, but be careful when it is
      used to select a start-quote sequence that is effectively
      disabled.  */
-  syntax->is_single_quotes = !m4_has_syntax (syntax, *syntax->quote.str1,
-                                            (M4_SYNTAX_IGNORE
-                                             | M4_SYNTAX_ESCAPE
-                                             | M4_SYNTAX_ALPHA
-                                             | M4_SYNTAX_NUM));
+  syntax->is_single_quotes = true;
   for (ch = UCHAR_MAX + 1; --ch >= 0; )
     {
       if (m4_has_syntax (syntax, ch, M4_SYNTAX_LQUOTE))
@@ -637,11 +643,14 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, size_t lq_len,
        remove_syntax_attribute (syntax, ch, M4_SYNTAX_RQUOTE);
     }

-  if (syntax->is_single_quotes
-      && syntax->quote.len1 == 1 && syntax->quote.len2 == 1)
+  if (!m4_has_syntax (syntax, *syntax->quote.str1,
+                     (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE | M4_SYNTAX_ALPHA
+                      | M4_SYNTAX_NUM)))
     {
-      add_syntax_attribute (syntax, syntax->quote.str1[0], M4_SYNTAX_LQUOTE);
-      add_syntax_attribute (syntax, syntax->quote.str2[0], M4_SYNTAX_RQUOTE);
+      if (syntax->quote.len1 == 1)
+       add_syntax_attribute (syntax, syntax->quote.str1[0], M4_SYNTAX_LQUOTE);
+      if (syntax->quote.len2 == 1)
+       add_syntax_attribute (syntax, syntax->quote.str2[0], M4_SYNTAX_RQUOTE);
     }
   set_quote_age (syntax, false, false);
 }
@@ -691,12 +700,7 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, size_t bc_len,
   /* changecom overrides syntax_table, but be careful when it is used
      to select a start-comment sequence that is effectively
      disabled.  */
-  syntax->is_single_comments = !m4_has_syntax (syntax, *syntax->comm.str1,
-                                              (M4_SYNTAX_IGNORE
-                                               | M4_SYNTAX_ESCAPE
-                                               | M4_SYNTAX_ALPHA
-                                               | M4_SYNTAX_NUM
-                                               | M4_SYNTAX_LQUOTE));
+  syntax->is_single_comments = true;
   for (ch = UCHAR_MAX + 1; --ch >= 0; )
     {
       if (m4_has_syntax (syntax, ch, M4_SYNTAX_BCOMM))
@@ -706,11 +710,14 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, size_t bc_len,
       if (m4_has_syntax (syntax, ch, M4_SYNTAX_ECOMM))
        remove_syntax_attribute (syntax, ch, M4_SYNTAX_ECOMM);
     }
-  if (syntax->is_single_comments
-      && syntax->comm.len1 == 1 && syntax->comm.len2 == 1)
+  if (!m4_has_syntax (syntax, *syntax->comm.str1,
+                     (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE | M4_SYNTAX_ALPHA
+                      | M4_SYNTAX_NUM | M4_SYNTAX_LQUOTE)))
     {
-      add_syntax_attribute (syntax, syntax->comm.str1[0], M4_SYNTAX_BCOMM);
-      add_syntax_attribute (syntax, syntax->comm.str2[0], M4_SYNTAX_ECOMM);
+      if (syntax->comm.len1 == 1)
+       add_syntax_attribute (syntax, syntax->comm.str1[0], M4_SYNTAX_BCOMM);
+      if (syntax->comm.len2 == 1)
+       add_syntax_attribute (syntax, syntax->comm.str2[0], M4_SYNTAX_ECOMM);
     }
   set_quote_age (syntax, false, false);
 }
-- 
1.6.1.2