m4-patches

[5/18] argv_ref speedup: add notion of quote age


From: Eric Blake
Subject: [5/18] argv_ref speedup: add notion of quote age
Date: Fri, 07 Dec 2007 07:14:41 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The next in the series.  I hope I got everything correct on head; the
addition of changesyntax made porting this patch more interesting, because
there were more corner cases to think about.  In general, this patch adds
some timing overhead but has little impact on memory (on head, it
actually uses slightly less memory).

The idea behind this patch is as follows.  Since m4 allows redefining the
behavior of the input engine on the fly (changequote, changecom,
changeword, changesyntax), we must ensure a correct reparse in those corner
cases.  Our goal is to reparse text as few times as possible: once text
has been parsed, we know which quoting rules it follows, so we cache that
information alongside each string of parsed text.  Then, when reparsing, we
can check whether the quote age is unchanged; if so, we need not waste time
parsing that string again and can use the entire string as is.
Hardcoding quote_age to 0 produces the same results, but without the
benefit of skipping the reparse.

2007-12-07  Eric Blake  <address@hidden>

        Stage 5: add notion of quote age.
        * src/input.c: Comment cleanups.
        (current_quote_age): New global variable.
        (set_quote_age): New helper function.
        (input_init, set_word_regexp): Use it.
        (set_quotes, set_comment): Likewise, and detect no-op changes.
        (quote_age, safe_quotes): New functions.
        (next_token): Track quote age.
        * src/m4.h (struct token_data): Add quote_age member.
        (TOKEN_DATA_QUOTE_AGE, quote_age, safe_quotes): New prototypes.
        * src/macro.c (struct macro_arguments): Add quote_age member.
        (expand_token): Alter signature and track quote age.
        (expand_input, expand_argument): All callers changed.
        (collect_arguments, make_argv_ref): Track quote age.
        (arg_text, arg_len, arg_func): Detect type mismatch.
        * doc/m4.texinfo (Ifelse, Changequote): Add more tests.
        (Incompatibilities): Fix typo.
        * examples/wraplifo.m4: New file.
        * examples/Makefile.am (EXTRA_DIST): Distribute it.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHWVVR84KuGfSFAYARAvCzAKC6ujqG/CtZVXsxdcBk6GTHWd0JGwCdFVI4
Djsbn1z2v00OGE0jowYXf2k=
=Q3TT
-----END PGP SIGNATURE-----
From bcb92cf23ae09a5e3a0c8f5f8d2a991245918ede Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Thu, 6 Dec 2007 22:14:22 -0700
Subject: [PATCH] Stage 5: add notion of quote age.

* m4/m4module.h (m4_get_symbol_value_quote_age): New prototype.
(m4_set_symbol_value_text): Adjust prototype.
(m4_has_syntax): Factor out the unsigned char cast.
* m4/m4private.h (struct m4_syntax_table): Add syntax_age and
quote_age members.
(m4__quote_age, m4__safe_quotes): New accessor macros, no need for
functions at this point.
(struct m4_symbol_value, struct m4_macro_args): Add quote_age
member.
(m4_set_symbol_value_text): Adjust fast accessor.
(m4_get_symbol_value_quote_age): New fast accessor.
* m4/symtab.c (m4_set_symbol_value_text): Add parameter.
(m4_get_symbol_value_quote_age): New function.
(m4_symbol_value_copy): Adjust callers.
* m4/macro.c (expand_token): Add parameter, and track quote age.
(expand_argument, collect_arguments): Track quote age.
(m4_macro_expand_input, process_macro, m4_make_argv_ref)
(m4_macro_expand_input): Update callers.
(m4_arg_text, m4_arg_len, m4_arg_func): Abort on type mismatch.
* m4/input.c: Comment cleanups.
(struct m4_input_block): Reduce size.
(m4__next_token): Report quote age.
(m4_push_builtin, init_builtin_token): Update callers.
* m4/utility.c (skip_space): Adjust callers.
* m4/module.c (install_macro_table): Likewise.
* m4/syntax.c (m4_set_syntax): Initialize and update quote age.
(m4_set_quotes, m4_set_comment): Detect no-op changes, and update
quote age.
(set_quote_age): New helper function.
(check_is_single_quotes, check_is_single_comments): Adjust
callers.
* src/freeze.c (reload_frozen_state): Likewise.
* src/main.c (main): Likewise.
* modules/m4.c (define, pushdef): No need to set macro text.
* tests/builtins.at (changequote, defn): New tests.
* examples/wrapfifo.m4: New file.
* examples/wraplifo.m4: New file.
* Makefile.am (dist_pkgdata_DATA): Distribute new examples.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog            |   42 +++++++++++++
 examples/wrapfifo.m4 |   10 +++
 examples/wraplifo.m4 |   10 +++
 m4/input.c           |  157 +++++++++++++++++++++++++-------------------------
 m4/m4module.h        |   18 +++++-
 m4/m4private.h       |   53 +++++++++++++----
 m4/macro.c           |  124 ++++++++++++++++++++++++++--------------
 m4/module.c          |    3 +-
 m4/symtab.c          |   15 ++++-
 m4/syntax.c          |  122 +++++++++++++++++++++++++++++++-------
 m4/utility.c         |    2 +-
 modules/m4.c         |   12 +---
 src/freeze.c         |    2 +-
 src/main.c           |    2 +-
 tests/builtins.at    |   71 ++++++++++++++++++++++
 15 files changed, 466 insertions(+), 177 deletions(-)
 create mode 100644 examples/wrapfifo.m4
 create mode 100644 examples/wraplifo.m4

diff --git a/ChangeLog b/ChangeLog
index af39595..247eb37 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,45 @@
+2007-12-07  Eric Blake  <address@hidden>
+
+       Stage 5: add notion of quote age.
+       * m4/m4module.h (m4_get_symbol_value_quote_age): New prototype.
+       (m4_set_symbol_value_text): Adjust prototype.
+       (m4_has_syntax): Factor out the unsigned char cast.
+       * m4/m4private.h (struct m4_syntax_table): Add syntax_age and
+       quote_age members.
+       (m4__quote_age, m4__safe_quotes): New accessor macros, no need for
+       functions at this point.
+       (struct m4_symbol_value, struct m4_macro_args): Add quote_age
+       member.
+       (m4_set_symbol_value_text): Adjust fast accessor.
+       (m4_get_symbol_value_quote_age): New fast accessor.
+       * m4/symtab.c (m4_set_symbol_value_text): Add parameter.
+       (m4_get_symbol_value_quote_age): New function.
+       (m4_symbol_value_copy): Adjust callers.
+       * m4/macro.c (expand_token): Add parameter, and track quote age.
+       (expand_argument, collect_arguments): Track quote age.
+       (m4_macro_expand_input, process_macro, m4_make_argv_ref)
+       (m4_macro_expand_input): Update callers.
+       (m4_arg_text, m4_arg_len, m4_arg_func): Abort on type mismatch.
+       * m4/input.c: Comment cleanups.
+       (struct m4_input_block): Reduce size.
+       (m4__next_token): Report quote age.
+       (m4_push_builtin, init_builtin_token): Update callers.
+       * m4/utility.c (skip_space): Adjust callers.
+       * m4/module.c (install_macro_table): Likewise.
+       * m4/syntax.c (m4_set_syntax): Initialize and update quote age.
+       (m4_set_quotes, m4_set_comment): Detect no-op changes, and update
+       quote age.
+       (set_quote_age): New helper function.
+       (check_is_single_quotes, check_is_single_comments): Adjust
+       callers.
+       * src/freeze.c (reload_frozen_state): Likewise.
+       * src/main.c (main): Likewise.
+       * modules/m4.c (define, pushdef): No need to set macro text.
+       * tests/builtins.at (changequote, defn): New tests.
+       * examples/wrapfifo.m4: New file.
+       * examples/wraplifo.m4: New file.
+       * Makefile.am (dist_pkgdata_DATA): Distribute new examples.
+
 2007-12-04  Eric Blake  <address@hidden>
 
        Fix builds with OpenBSD make.
diff --git a/examples/wrapfifo.m4 b/examples/wrapfifo.m4
new file mode 100644
index 0000000..95ff87a
--- /dev/null
+++ b/examples/wrapfifo.m4
@@ -0,0 +1,10 @@
+dnl Redefine m4wrap to have FIFO semantics.
+define(`_m4wrap_level', `0')dnl
+define(`m4wrap',
+`ifdef(`m4wrap'_m4wrap_level,
+       `define(`m4wrap'_m4wrap_level,
+               defn(`m4wrap'_m4wrap_level)`$1')',
+       `builtin(`m4wrap', `define(`_m4wrap_level',
+                                  incr(_m4wrap_level))dnl
+m4wrap'_m4wrap_level)dnl
+define(`m4wrap'_m4wrap_level, `$1')')')dnl
diff --git a/examples/wraplifo.m4 b/examples/wraplifo.m4
new file mode 100644
index 0000000..bdbf3fb
--- /dev/null
+++ b/examples/wraplifo.m4
@@ -0,0 +1,10 @@
+dnl Redefine m4wrap to have LIFO semantics.
+define(`_m4wrap_level', `0')dnl
+define(`_m4wrap', defn(`m4wrap'))dnl
+define(`m4wrap',
+`ifdef(`m4wrap'_m4wrap_level,
+       `define(`m4wrap'_m4wrap_level,
+               `$1'defn(`m4wrap'_m4wrap_level))',
+       `_m4wrap(`define(`_m4wrap_level', incr(_m4wrap_level))dnl
+m4wrap'_m4wrap_level)dnl
+define(`m4wrap'_m4wrap_level, `$1')')')dnl
diff --git a/m4/input.c b/m4/input.c
index 08c5f64..e4228a9 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -28,13 +28,13 @@
 /*#define DEBUG_INPUT */
 
 /*
-   Unread input can be either files that should be read (eg. included
-   files), strings which should be rescanned (eg. macro expansion
-   text), or quoted builtin definitions (as returned by the builtin
-   "defn").  Unread input is organized in a stack, implemented with an
-   obstack.  Each input source is described by a "struct
-   m4_input_block".  The obstack is "input_stack".  The top of the
-   input stack is "isp".
+   Unread input can be either files that should be read (from the
+   command line or by include/sinclude), strings which should be
+   rescanned (normal macro expansion text), or quoted builtin
+   definitions (as returned by the builtin "defn").  Unread input is
+   organized in a stack, implemented with an obstack.  Each input
+   source is described by a "struct m4_input_block".  The obstack is
+   "input_stack".  The top of the input stack is "isp".
 
    Each input_block has an associated struct input_funcs, which is a
    vtable that defines polymorphic functions for peeking, reading,
@@ -72,12 +72,12 @@
    manage the coordination between the different push routines.
 
    Normally, input sources behave in LIFO order, resembling a stack.
-   But thanks to the defn macro, when collecting the expansion of a
-   macro, it is possible that we must intermix multiple input blocks
-   in FIFO order.  This also applies to the POSIX requirements of
-   m4wrap.  Therefore, when collecting an expansion, a meta-input
-   block is formed which will visit its children in FIFO order,
-   without losing data when the obstack is cleared in LIFO order.
+   But thanks to the defn and m4wrap macros, when collecting the
+   expansion of a macro, it is possible that we must intermix multiple
+   input blocks in FIFO order.  Therefore, when collecting an
+   expansion, a meta-input block is formed which will visit its
+   children in FIFO order, without losing data when the obstack is
+   cleared in LIFO order.
 
    The current file and line number are stored in the context, for use
    by the error handling functions in utility.c.  When collecting a
@@ -117,16 +117,20 @@ static    bool    consume_syntax          (m4 *, m4_obstack *, unsigned int);
 static int m4_print_token (const char *, m4__token_type, m4_symbol_value *);
 #endif
 
+/* Vtable of callbacks for each input method.  */
 struct input_funcs
 {
-  /* Peek at input, return CHAR_RETRY if none available.  */
+  /* Peek at input, return an unsigned char, CHAR_BUILTIN if it is a
+     builtin, or CHAR_RETRY if none available.  */
   int  (*peek_func)    (m4_input_block *);
 
-  /* Read input, return CHAR_RETRY if none available.  If the flag is
-     false, then do not alter the current file or line.  */
+  /* Read input, return an unsigned char, CHAR_BUILTIN if it is a
+     builtin, or CHAR_RETRY if none available.  If the flag is false,
+     then do not alter the current file or line.  */
   int  (*read_func)    (m4_input_block *, m4 *, bool);
 
-  /* Unread a single character, previously read by read_func.  */
+  /* Unread a single unsigned character or CHAR_BUILTIN, must be the
+     same character previously read by read_func.  */
   void (*unget_func)   (m4_input_block *, int);
 
   /* Optional function to perform cleanup at end of input.  */
@@ -137,46 +141,45 @@ struct input_funcs
   void (*print_func)   (m4_input_block *, m4 *, m4_obstack *);
 };
 
+/* A block of input to be scanned.  */
 struct m4_input_block
 {
-  m4_input_block *prev;                /* previous input_block on the input stack */
-  struct input_funcs *funcs;   /* functions on this input_block */
-  const char *file;            /* file where this input is from */
-  int line;                    /* line where this input is from */
+  m4_input_block *prev;                /* Previous input_block on the input stack.  */
+  struct input_funcs *funcs;   /* Virtual functions of this input_block.  */
+  const char *file;            /* File where this input is from.  */
+  int line;                    /* Line where this input is from.  */
 
   union
     {
       struct
        {
-         char *str;            /* string value */
-         size_t len;           /* remaining length */
+         char *str;            /* String value.  */
+         size_t len;           /* Remaining length.  */
        }
-      u_s;
+      u_s;     /* See string_funcs.  */
       struct
        {
-         FILE *fp;             /* input file handle */
-         bool end;             /* true iff peek returned EOF */
-         bool close;           /* true if file should be closed on EOF */
-         bool advance_line;    /* start_of_input_line from next_char () */
+         FILE *fp;                     /* Input file handle.  */
+         bool_bitfield end : 1;        /* True iff peek returned EOF.  */
+         bool_bitfield close : 1;      /* True to close file on pop.  */
+         bool_bitfield line_start : 1; /* Saved start_of_input_line state.  */
        }
-      u_f;
+      u_f;     /* See file_funcs.  */
       struct
        {
-         const m4_builtin *builtin;  /* pointer to builtin's function. */
-         m4_module *module;      /* originating module. */
-         int flags;              /* flags associated with the builtin. */
-         m4_hash *arg_signature; /* argument signature for builtin.  */
-         unsigned int min_args;  /* argv minima for the builtin. */
-         unsigned int max_args;  /* argv maxima for the builtin. */
-         bool read;              /* true iff block has been read. */
+         const m4_builtin *builtin;    /* Pointer to builtin's function.  */
+         m4_module *module;            /* Originating module.  */
+         bool_bitfield read : 1;       /* True iff block has been read.  */
+         int flags : 24;               /* Flags tied to the builtin. */
+         m4_hash *arg_signature;       /* Argument signature for builtin.  */
        }
-      u_b;
+      u_b;     /* See builtin_funcs.  */
       struct
        {
-         m4_input_block *current; /* pointer to current sub-block. */
-         m4_input_block *tail;    /* pointer to last sub-block. */
+         m4_input_block *current;      /* Pointer to current sub-block.  */
+         m4_input_block *tail;         /* Pointer to last sub-block.  */
        }
-      u_c;
+      u_c;     /* See composite_funcs.  */
     }
   u;
 };
@@ -199,7 +202,7 @@ static m4_obstack *wrapup_stack;
 static m4_obstack *current_input;
 
 /* Bottom of token_stack, for obstack_free.  */
-static char *token_bottom;
+static void *token_bottom;
 
 /* Pointer to top of current_input.  */
 static m4_input_block *isp;
@@ -295,7 +298,7 @@ file_clean (m4_input_block *me, m4 *context)
     }
   else if (me->u.u_f.close && fclose (me->u.u_f.fp) == EOF)
     m4_error (context, 0, errno, NULL, _("error reading file `%s'"), me->file);
-  start_of_input_line = me->u.u_f.advance_line;
+  start_of_input_line = me->u.u_f.line_start;
   m4_set_output_line (context, -1);
 }
 
@@ -332,8 +335,7 @@ m4_push_file (m4 *context, FILE *fp, const char *title, bool close_file)
   m4_debug_message (context, M4_DEBUG_TRACE_INPUT,
                    _("input read from %s"), title);
 
-  i = (m4_input_block *) obstack_alloc (current_input,
-                                       sizeof (m4_input_block));
+  i = (m4_input_block *) obstack_alloc (current_input, sizeof *i);
   i->funcs = &file_funcs;
   /* Save title on a separate obstack, so that wrapped text can refer
      to it even after the file is popped.  */
@@ -343,7 +345,7 @@ m4_push_file (m4 *context, FILE *fp, const char *title, bool close_file)
   i->u.u_f.fp = fp;
   i->u.u_f.end = false;
   i->u.u_f.close = close_file;
-  i->u.u_f.advance_line = start_of_input_line;
+  i->u.u_f.line_start = start_of_input_line;
 
   m4_set_output_line (context, -1);
 
@@ -421,8 +423,7 @@ m4_push_builtin (m4 *context, m4_symbol_value *token)
       next = NULL;
     }
 
-  i = (m4_input_block *) obstack_alloc (current_input,
-                                       sizeof (m4_input_block));
+  i = (m4_input_block *) obstack_alloc (current_input, sizeof *i);
   i->funcs = &builtin_funcs;
   i->file = m4_get_current_file (context);
   i->line = m4_get_current_line (context);
@@ -430,9 +431,11 @@ m4_push_builtin (m4 *context, m4_symbol_value *token)
   i->u.u_b.builtin     = m4_get_symbol_value_builtin (token);
   i->u.u_b.module      = VALUE_MODULE (token);
   i->u.u_b.arg_signature = VALUE_ARG_SIGNATURE (token);
-  i->u.u_b.min_args    = VALUE_MIN_ARGS (token);
-  i->u.u_b.max_args    = VALUE_MAX_ARGS (token);
   i->u.u_b.flags       = VALUE_FLAGS (token);
+  /* Check for bitfield truncation.  */
+  assert (i->u.u_b.flags == VALUE_FLAGS (token)
+         && i->u.u_b.builtin->min_args == VALUE_MIN_ARGS (token)
+         && i->u.u_b.builtin->max_args == VALUE_MAX_ARGS (token));
   i->u.u_b.read                = false;
 
   i->prev = isp;
@@ -501,8 +504,7 @@ m4_push_string_init (m4 *context)
   while (isp && pop_input (context, false));
 
   /* Reserve the next location on the obstack.  */
-  next = (m4_input_block *) obstack_alloc (current_input,
-                                          sizeof (m4_input_block));
+  next = (m4_input_block *) obstack_alloc (current_input, sizeof *next);
   next->funcs = &string_funcs;
   next->file = m4_get_current_file (context);
   next->line = m4_get_current_line (context);
@@ -661,8 +663,7 @@ m4_push_wrapup (m4 *context, const char *s)
 {
   m4_input_block *i;
 
-  i = (m4_input_block *) obstack_alloc (wrapup_stack,
-                                       sizeof (m4_input_block));
+  i = (m4_input_block *) obstack_alloc (wrapup_stack, sizeof *i);
   i->prev = wsp;
 
   i->funcs = &string_funcs;
@@ -708,8 +709,8 @@ pop_input (m4 *context, bool cleanup)
 }
 
 /* To switch input over to the wrapup stack, main () calls pop_wrapup.
-   Since wrapup text can install new wrapup text, pop_wrapup () returns
-   false when there is no wrapup text on the stack, and true otherwise.  */
+   Since wrapup text can install new wrapup text, pop_wrapup ()
+   returns true if there is more wrapped text to parse.  */
 bool
 m4_pop_wrapup (m4 *context)
 {
@@ -736,7 +737,7 @@ m4_pop_wrapup (m4 *context)
                    (unsigned long int) ++level);
 
   current_input = wrapup_stack;
-  wrapup_stack = (m4_obstack *) xmalloc (sizeof (m4_obstack));
+  wrapup_stack = (m4_obstack *) xmalloc (sizeof *wrapup_stack);
   obstack_init (wrapup_stack);
 
   isp = wsp;
@@ -760,8 +761,8 @@ init_builtin_token (m4 *context, m4_symbol_value *token)
   VALUE_MODULE (token)         = block->u.u_b.module;
   VALUE_FLAGS (token)          = block->u.u_b.flags;
   VALUE_ARG_SIGNATURE (token)  = block->u.u_b.arg_signature;
-  VALUE_MIN_ARGS (token)       = block->u.u_b.min_args;
-  VALUE_MAX_ARGS (token)       = block->u.u_b.max_args;
+  VALUE_MIN_ARGS (token)       = block->u.u_b.builtin->min_args;
+  VALUE_MAX_ARGS (token)       = block->u.u_b.builtin->max_args;
 }
 
 
@@ -963,7 +964,7 @@ consume_syntax (m4 *context, m4_obstack *obs, unsigned int syntax)
 }
 
 
-/* Inititialize input stacks, and quote/comment characters.  */
+/* Initialize input stacks.  */
 void
 m4_input_init (m4 *context)
 {
@@ -971,16 +972,15 @@ m4_input_init (m4 *context)
   m4_set_current_file (context, NULL);
   m4_set_current_line (context, 0);
 
-  current_input = (m4_obstack *) xmalloc (sizeof (m4_obstack));
+  current_input = (m4_obstack *) xmalloc (sizeof *current_input);
   obstack_init (current_input);
-  wrapup_stack = (m4_obstack *) xmalloc (sizeof (m4_obstack));
+  wrapup_stack = (m4_obstack *) xmalloc (sizeof *wrapup_stack);
   obstack_init (wrapup_stack);
 
   /* Allocate an object in the current chunk, so that obstack_free
      will always work even if the first token parsed spills to a new
      chunk.  */
   obstack_init (&token_stack);
-  obstack_alloc (&token_stack, 1);
   token_bottom = obstack_finish (&token_stack);
 
   isp = NULL;
@@ -990,6 +990,7 @@ m4_input_init (m4 *context)
   start_of_input_line = false;
 }
 
+/* Free memory used by the input engine.  */
 void
 m4_input_exit (void)
 {
@@ -1000,20 +1001,17 @@ m4_input_exit (void)
 }
 
 
-/* Parse and return a single token from the input stream.  A token can
-   be M4_TOKEN_EOF, if the input_stack is empty; it can be
-   M4_TOKEN_STRING for a quoted string; M4_TOKEN_WORD for something
-   that is a potential macro name; and M4_TOKEN_SIMPLE for any single
-   character that is not a part of any of the previous types.  If LINE
-   is not NULL, set *LINE to the line number where the token starts.
-   Report errors (unterminated comments or strings) on behalf of
-   CALLER, if non-NULL.
-
-   M4__next_token () returns the token type, and passes back a pointer to
-   the token data through TOKEN.  The token text is collected on the obstack
-   token_stack, which never contains more than one token text at a time.
-   The storage pointed to by the fields in TOKEN is therefore subject to
-   change the next time m4__next_token () is called.  */
+/* Parse and return a single token from the input stream, built in
+   TOKEN.  See m4__token_type for the valid return types, along with a
+   description of what TOKEN will contain.  If LINE is not NULL, set
+   *LINE to the line number where the token starts.  Report errors
+   (unterminated comments or strings) on behalf of CALLER, if
+   non-NULL.
+
+   The token text is collected on the obstack token_stack, which never
+   contains more than one token text at a time.  The storage pointed
+   to by the fields in TOKEN is therefore subject to change the next
+   time m4__next_token () is called.  */
 m4__token_type
 m4__next_token (m4 *context, m4_symbol_value *token, int *line,
                const char *caller)
@@ -1028,9 +1026,11 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
   assert (next == NULL);
   if (!line)
     line = &dummy;
+  memset (token, '\0', sizeof *token);
   do {
     obstack_free (&token_stack, token_bottom);
 
+
     /* Must consume an input character, but not until CHAR_BUILTIN is
        handled.  */
     ch = peek_char (context);
@@ -1229,9 +1229,8 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
   len = obstack_object_size (&token_stack);
   obstack_1grow (&token_stack, '\0');
 
-  memset (token, '\0', sizeof (m4_symbol_value));
-
-  m4_set_symbol_value_text (token, obstack_finish (&token_stack), len);
+  m4_set_symbol_value_text (token, obstack_finish (&token_stack), len,
+                           m4__quote_age (M4SYNTAX));
   VALUE_MAX_ARGS (token)       = -1;
 
 #ifdef DEBUG_INPUT
diff --git a/m4/m4module.h b/m4/m4module.h
index 8f3f590..f70d414 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -267,13 +267,18 @@ extern bool               m4_is_symbol_value_text   (m4_symbol_value *);
 extern bool            m4_is_symbol_value_func   (m4_symbol_value *);
 extern bool            m4_is_symbol_value_placeholder  (m4_symbol_value *);
 extern bool            m4_is_symbol_value_void   (m4_symbol_value *);
+
 extern const char *    m4_get_symbol_value_text  (m4_symbol_value *);
 extern size_t          m4_get_symbol_value_len   (m4_symbol_value *);
+extern unsigned int    m4_get_symbol_value_quote_age   (m4_symbol_value *);
+
 extern m4_builtin_func *m4_get_symbol_value_func  (m4_symbol_value *);
 extern const m4_builtin *m4_get_symbol_value_builtin   (m4_symbol_value *);
 extern const char *    m4_get_symbol_value_placeholder (m4_symbol_value *);
+
 extern void            m4_set_symbol_value_text  (m4_symbol_value *,
-                                                  const char *, size_t);
+                                                  const char *, size_t,
+                                                  unsigned int);
 extern void            m4_set_symbol_value_builtin     (m4_symbol_value *,
                                                         const m4_builtin *);
 extern void            m4_set_symbol_value_placeholder (m4_symbol_value *,
@@ -301,12 +306,12 @@ extern bool       m4_is_arg_text          (m4_macro_args *, unsigned int);
 extern bool    m4_is_arg_func          (m4_macro_args *, unsigned int);
 extern const char *m4_arg_text         (m4_macro_args *, unsigned int);
 extern bool    m4_arg_equal            (m4_macro_args *, unsigned int,
-                                         unsigned int);
+                                        unsigned int);
 extern bool    m4_arg_empty            (m4_macro_args *, unsigned int);
 extern size_t  m4_arg_len              (m4_macro_args *, unsigned int);
 extern m4_builtin_func *m4_arg_func    (m4_macro_args *, unsigned int);
 extern m4_macro_args *m4_make_argv_ref (m4_macro_args *, const char *, size_t,
-                                         bool, bool);
+                                        bool, bool);
 
 
 /* --- RUNTIME DEBUGGING --- */
@@ -410,7 +415,12 @@ enum {
 #define M4_SYNTAX_VALUE                (~(M4_SYNTAX_RQUOTE | M4_SYNTAX_ECOMM))
 
 #define m4_syntab(S, C)                ((S)->table[(C)])
-#define m4_has_syntax(S, C, T) ((m4_syntab ((S), (C)) & (T)) > 0)
+/* Determine if character C matches any of the bitwise-or'd syntax
+   categories T for the given syntax table S.  C can be either an
+   unsigned int (including special values such as CHAR_BUILTIN) or a
+   char which will be interpreted as an unsigned char.  */
+#define m4_has_syntax(S, C, T)                                         \
+  ((m4_syntab ((S), sizeof (C) == 1 ? to_uchar (C) : (C)) & (T)) > 0)
 
 extern void    m4_set_quotes   (m4_syntax_table*, const char*, const char*);
 extern void    m4_set_comment  (m4_syntax_table*, const char*, const char*);
diff --git a/m4/m4private.h b/m4/m4private.h
index 8e23e00..db1f513 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -220,6 +220,9 @@ struct m4_symbol_value
     {
       size_t           len;    /* Length of string.  */
       const char *     text;   /* String contents.  */
+      /* Quote age when this string was built, or zero to force a
+        rescan of the string.  Ignored for 0 len.  */
+      unsigned int     quote_age;
     } u_t;                     /* Valid when type is TEXT, PLACEHOLDER.  */
     const m4_builtin * builtin;/* Valid when type is FUNC.  */
     m4_symbol_chain *  chain;  /* Valid when type is COMP.  */
@@ -243,6 +246,10 @@ struct m4_macro_args
   bool_bitfield has_ref : 1;
   const char *argv0; /* The macro name being expanded.  */
   size_t argv0_len; /* Length of argv0.  */
+  /* The value of quote_age for all tokens, or 0 if quote_age changed
+     during parsing or any token is potentially unsafe and requires a
+     rescan.  */
+  unsigned int quote_age;
   size_t arraylen; /* True length of allocated elements in array.  */
   /* Used as a variable-length array, storing information about each
      argument.  */
@@ -282,6 +289,7 @@ struct m4_macro_args
                                        ((V)->type == M4_SYMBOL_PLACEHOLDER)
 #  define m4_get_symbol_value_text(V)  ((V)->u.u_t.text)
 #  define m4_get_symbol_value_len(V)   ((V)->u.u_t.len)
+#  define m4_get_symbol_value_quote_age(V)     ((V)->u.u_t.quote_age)
 #  define m4_get_symbol_value_func(V)  ((V)->u.builtin->func)
 #  define m4_get_symbol_value_builtin(V) ((V)->u.builtin)
 #  define m4_get_symbol_value_placeholder(V)                           \
@@ -289,8 +297,9 @@ struct m4_macro_args
 #  define m4_symbol_value_groks_macro(V) (BIT_TEST ((V)->flags,                \
                                                    VALUE_MACRO_ARGS_BIT))
 
-#  define m4_set_symbol_value_text(V, T, L)                            \
-  ((V)->type = M4_SYMBOL_TEXT, (V)->u.u_t.text = (T), (V)->u.u_t.len = (L))
+#  define m4_set_symbol_value_text(V, T, L, A)                         \
+  ((V)->type = M4_SYMBOL_TEXT, (V)->u.u_t.text = (T),                   \
+   (V)->u.u_t.len = (L), (V)->u.u_t.quote_age = (A))
 #  define m4_set_symbol_value_builtin(V, B)                            \
   ((V)->type = M4_SYMBOL_FUNC, (V)->u.builtin = (B))
 #  define m4_set_symbol_value_placeholder(V, T)                                \
@@ -335,14 +344,14 @@ extern void m4__symtab_remove_module_references (m4_symbol_table*,
 
 /* CHAR_RETRY must be last, because we size the syntax table to hold
    all other characters and sentinels. */
-#define CHAR_EOF       256     /* character return on EOF */
-#define CHAR_BUILTIN   257     /* character return for BUILTIN token */
-#define CHAR_RETRY     258     /* character return for end of input block */
+#define CHAR_EOF       256     /* Character return on EOF.  */
+#define CHAR_BUILTIN   257     /* Character return for BUILTIN token.  */
+#define CHAR_RETRY     258     /* Character return for end of input block.  */
 
-#define DEF_LQUOTE "`"
-#define DEF_RQUOTE "\'"
-#define DEF_BCOMM "#"
-#define DEF_ECOMM "\n"
+#define DEF_LQUOTE     "`"     /* Default left quote delimiter.  */
+#define DEF_RQUOTE     "\'"    /* Default right quote delimiter.  */
+#define DEF_BCOMM      "#"     /* Default begin comment delimiter.  */
+#define DEF_ECOMM      "\n"    /* Default end comment delimiter.  */
 
 typedef struct {
   char *string;                /* characters of the string */
@@ -362,14 +371,26 @@ struct m4_syntax_table {
 
   /* True iff strlen(lquote) == strlen(rquote) == 1 and lquote is not
      interfering with macro names.  */
-  bool is_single_quotes;
+  bool_bitfield is_single_quotes : 1;
 
   /* True iff strlen(bcomm) == strlen(ecomm) == 1 and bcomm is not
      interfering with macros or quotes.  */
-  bool is_single_comments;
+  bool_bitfield is_single_comments : 1;
 
   /* True iff some character has M4_SYNTAX_ESCAPE.  */
-  bool is_macro_escaped;
+  bool_bitfield is_macro_escaped : 1;
+
+  /* Track the number of changesyntax calls.  This saturates at
+     0xffff, so the idea is that most users won't be changing the
+     syntax that frequently; perhaps in the future we will cache
+     frequently used syntax schemes by index.  */
+  unsigned short syntax_age;
+
+  /* Track the current quote age, determined by all significant
+     changequote, changecom, and changesyntax calls, since any of
+     these can alter the rescan of a prior parameter in a quoted
+     context.  */
+  unsigned int quote_age;
 };
 
 /* Fast macro versions of syntax table accessor functions,
@@ -385,6 +406,14 @@ struct m4_syntax_table {
 #  define m4_is_syntax_macro_escaped(S)                ((S)->is_macro_escaped)
 #endif
 
+/* Return the current quote age.  */
+#define m4__quote_age(S)               ((S)->quote_age)
+
+/* Return true if the current quote age guarantees that parsing the
+   current token in the context of a quoted string of the same quote
+   age will give the same parse.  */
+#define m4__safe_quotes(S)             (((S)->quote_age & 0xffff) != 0)
+
 
 /* --- MACRO MANAGEMENT --- */
 
diff --git a/m4/macro.c b/m4/macro.c
index 25fc7e7..56d43ac 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -32,8 +32,8 @@
 static m4_macro_args *collect_arguments (m4 *, const char *, size_t,
                                         m4_symbol *, m4_obstack *);
 static void    expand_macro      (m4 *, const char *, size_t, m4_symbol *);
-static void    expand_token      (m4 *, m4_obstack *, m4__token_type,
-                                 m4_symbol_value *, int);
+static bool    expand_token      (m4 *, m4_obstack *, m4__token_type,
+                                 m4_symbol_value *, int, bool);
 static bool    expand_argument   (m4 *, m4_obstack *, m4_symbol_value *,
                                  const char *);
 static void    process_macro    (m4 *, m4_symbol_value *, m4_obstack *, int,
@@ -85,26 +85,35 @@ m4_macro_expand_input (m4 *context)
   obstack_init (&argc_stack);
   obstack_init (&argv_stack);
 
-  m4_set_symbol_value_text (&empty_symbol, "", 0);
+  m4_set_symbol_value_text (&empty_symbol, "", 0, 0);
+  VALUE_MAX_ARGS (&empty_symbol) = -1;
 
   while ((type = m4__next_token (context, &token, &line, NULL))
         != M4_TOKEN_EOF)
-    expand_token (context, (m4_obstack *) NULL, type, &token, line);
+    expand_token (context, (m4_obstack *) NULL, type, &token, line, true);
 
   obstack_free (&argc_stack, NULL);
   obstack_free (&argv_stack, NULL);
 }
 
 
-/* Expand one token, according to its type.  Potential macro names
-   (M4_TOKEN_WORD) are looked up in the symbol table, to see if they have a
-   macro definition.  If they have, they are expanded as macros, otherwise
-   the text are just copied to the output.  */
-static void
-expand_token (m4 *context, m4_obstack *obs,
-             m4__token_type type, m4_symbol_value *token, int line)
+/* Expand one token onto OBS, according to its type.  If OBS is NULL,
+   output the expansion to the current diversion.  TYPE determines the
+   contents of TOKEN.  Potential macro names (a TYPE of M4_TOKEN_WORD)
+   are looked up in the symbol table, to see if they have a macro
+   definition.  If they have, they are expanded as macros, otherwise
+   the text is just copied to the output.  LINE determines where
+   TOKEN began.  FIRST is true if there is no prior content in the
+   current macro argument.  Return true if the result is guaranteed to
+   give the same parse on rescan in a quoted context with the same
+   quote age.  Returning false is always safe, although it may lead to
+   slower performance.  */
+static bool
+expand_token (m4 *context, m4_obstack *obs, m4__token_type type,
+             m4_symbol_value *token, int line, bool first)
 {
   m4_symbol *symbol;
+  bool result;
   const char *text = (m4_is_symbol_value_text (token)
                      ? m4_get_symbol_value_text (token) : NULL);
 
@@ -112,16 +121,31 @@ expand_token (m4 *context, m4_obstack *obs,
     {                          /* TOKSW */
     case M4_TOKEN_EOF:
     case M4_TOKEN_MACDEF:
+      /* Always safe, since there is no text to rescan.  */
+      return true;
+
+    case M4_TOKEN_STRING:
+      /* Strings and comments are safe in isolation (since quote_age
+        detects any change in delimiters).  This is also returned for
+        sequences of benign characters, such as digits.  But if other
+        text is already present, multi-character delimiters could be
+        formed by concatenation, so use a conservative heuristic.  */
+      result = first || m4__safe_quotes (M4SYNTAX);
       break;
 
     case M4_TOKEN_OPEN:
     case M4_TOKEN_COMMA:
     case M4_TOKEN_CLOSE:
-    case M4_TOKEN_SIMPLE:
-    case M4_TOKEN_STRING:
     case M4_TOKEN_SPACE:
-      m4_shipout_text (context, obs, text, m4_get_symbol_value_len (token),
-                      line);
+      /* Conservative heuristic, thanks to multi-character delimiter
+        concatenation.  */
+      result = m4__safe_quotes (M4SYNTAX);
+      break;
+
+    case M4_TOKEN_SIMPLE:
+      /* No guarantees here.  */
+      assert (m4_get_symbol_value_len (token) == 1);
+      result = false;
       break;
 
     case M4_TOKEN_WORD:
@@ -130,7 +154,7 @@ expand_token (m4 *context, m4_obstack *obs,
        size_t len = m4_get_symbol_value_len (token);
        size_t len2 = len;
 
-       if (m4_has_syntax (M4SYNTAX, to_uchar (*textp), M4_SYNTAX_ESCAPE))
+       if (m4_has_syntax (M4SYNTAX, *textp, M4_SYNTAX_ESCAPE))
          {
            textp++;
            len2--;
@@ -142,16 +166,25 @@ expand_token (m4 *context, m4_obstack *obs,
            || (symbol->value->type == M4_SYMBOL_FUNC
                && BIT_TEST (SYMBOL_FLAGS (symbol), VALUE_BLIND_ARGS_BIT)
                && !m4__next_token_is_open (context)))
-         m4_shipout_text (context, obs, text, len, line);
-       else
-         expand_macro (context, textp, len2, symbol);
+         {
+           m4_shipout_text (context, obs, text, len, line);
+           /* The word just output is unquoted, but we can trust the
+              heuristics of m4__safe_quotes.  */
+           return m4__safe_quotes (M4SYNTAX);
+         }
+       expand_macro (context, textp, len2, symbol);
+       /* Expanding a macro may create new tokens to scan, and those
+          tokens may generate unsafe text, but we did not append any
+          text now.  */
+       return true;
       }
-      break;
 
     default:
       assert (!"INTERNAL ERROR: bad token type in expand_token ()");
       abort ();
     }
+  m4_shipout_text (context, obs, text, m4_get_symbol_value_len (token), line);
+  return result;
 }
 
 
@@ -173,6 +206,8 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
   const char *file = m4_get_current_file (context);
   int line = m4_get_current_line (context);
   size_t len;
+  unsigned int age = m4__quote_age (M4SYNTAX);
+  bool first = true;
 
   argp->type = M4_SYMBOL_VOID;
 
@@ -201,7 +236,7 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
              len = obstack_object_size (obs);
              obstack_1grow (obs, '\0');
              VALUE_MODULE (argp) = NULL;
-             m4_set_symbol_value_text (argp, obstack_finish (obs), len);
+             m4_set_symbol_value_text (argp, obstack_finish (obs), len, age);
              return type == M4_TOKEN_COMMA;
            }
          /* fallthru */
@@ -211,7 +246,8 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
            paren_level++;
          else if (type == M4_TOKEN_CLOSE)
            paren_level--;
-         expand_token (context, obs, type, &token, line);
+         if (!expand_token (context, obs, type, &token, line, first))
+           age = 0;
          break;
 
        case M4_TOKEN_EOF:
@@ -222,7 +258,8 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
        case M4_TOKEN_WORD:
        case M4_TOKEN_SPACE:
        case M4_TOKEN_STRING:
-         expand_token (context, obs, type, &token, line);
+         if (!expand_token (context, obs, type, &token, line, first))
+           age = 0;
          break;
 
        case M4_TOKEN_MACDEF:
@@ -233,10 +270,12 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
          break;
 
        default:
-         assert (!"INTERNAL ERROR: bad token type in expand_argument ()");
+         assert (!"expand_argument");
          abort ();
        }
 
+      if (argp->type != M4_SYMBOL_VOID || obstack_object_size (obs))
+       first = false;
       type = m4__next_token (context, &token, NULL, caller);
     }
 }
@@ -370,6 +409,7 @@ collect_arguments (m4 *context, const char *name, size_t len,
      table, so we don't have to copy it here.  */
   args.argv0 = (char *) obstack_copy0 (arguments, name, len);
   args.argv0_len = len;
+  args.quote_age = m4__quote_age (M4SYNTAX);
   args.arraylen = 0;
   obstack_grow (&argv_stack, &args, offsetof (m4_macro_args, array));
   name = args.argv0;
@@ -391,11 +431,20 @@ collect_arguments (m4 *context, const char *name, size_t len,
          obstack_ptr_grow (&argv_stack, tokenp);
          args.arraylen++;
          args.argc++;
+         /* Be conservative - any change in quoting while collecting
+            arguments, or any unsafe argument, will require a rescan
+            if $@ is reused.  */
+         if (m4_is_symbol_value_text (tokenp)
+             && m4_get_symbol_value_len (tokenp)
+             && m4_get_symbol_value_quote_age (tokenp) != args.quote_age)
+           args.quote_age = 0;
        }
       while (more_args);
     }
   argv = (m4_macro_args *) obstack_finish (&argv_stack);
   argv->argc = args.argc;
+  if (args.quote_age != m4__quote_age (M4SYNTAX))
+    argv->quote_age = 0;
   argv->arraylen = args.arraylen;
   return argv;
 }
@@ -426,7 +475,7 @@ m4_macro_call (m4 *context, m4_symbol_value *value, m4_obstack *expansion,
             m4_get_symbol_value_placeholder (value));
   else
     {
-      assert (!"INTERNAL ERROR: bad symbol type in m4_macro_call ()");
+      assert (!"m4_macro_call");
       abort ();
     }
 }
@@ -447,7 +496,7 @@ process_macro (m4 *context, m4_symbol_value *value, m4_obstack *obs,
     {
       char ch;
 
-      if (!m4_has_syntax (M4SYNTAX, to_uchar (*text), M4_SYNTAX_DOLLAR))
+      if (!m4_has_syntax (M4SYNTAX, *text, M4_SYNTAX_DOLLAR))
        {
          obstack_1grow (obs, *text);
          text++;
@@ -501,7 +550,7 @@ process_macro (m4 *context, m4_symbol_value *value, m4_obstack *obs,
              const char *key;
 
              for (endp = ++text;
-                  *endp && m4_has_syntax (M4SYNTAX, to_uchar (*endp),
+                  *endp && m4_has_syntax (M4SYNTAX, *endp,
                                           (M4_SYNTAX_OTHER | M4_SYNTAX_ALPHA
                                            | M4_SYNTAX_NUM));
                   ++endp)
@@ -768,7 +817,7 @@ m4_is_arg_func (m4_macro_args *argv, unsigned int index)
   return m4_is_symbol_value_func (m4_arg_symbol (argv, index));
 }
 
-/* Given ARGV, return the text at argument INDEX, or NULL if the
+/* Given ARGV, return the text at argument INDEX.  Abort if the
    argument is not text.  Index 0 is always text, and indices beyond
    argc return the empty string.  */
 const char *
@@ -781,8 +830,6 @@ m4_arg_text (m4_macro_args *argv, unsigned int index)
   if (argv->argc <= index)
     return "";
   value = m4_arg_symbol (argv, index);
-  if (!m4_is_symbol_value_text (value))
-    return NULL;
   return m4_get_symbol_value_text (value);
 }
 
@@ -815,7 +862,7 @@ m4_arg_empty (m4_macro_args *argv, unsigned int index)
          : !argv->argv0_len);
 }
 
-/* Given ARGV, return the length of argument INDEX, or SIZE_MAX if the
+/* Given ARGV, return the length of argument INDEX.  Abort if the
    argument is not text.  Indices beyond argc return 0.  */
 size_t
 m4_arg_len (m4_macro_args *argv, unsigned int index)
@@ -827,25 +874,15 @@ m4_arg_len (m4_macro_args *argv, unsigned int index)
   if (argv->argc <= index)
     return 0;
   value = m4_arg_symbol (argv, index);
-  if (!m4_is_symbol_value_text (value))
-    return SIZE_MAX;
   return m4_get_symbol_value_len (value);
 }
 
 /* Given ARGV, return the builtin function referenced by argument
-   INDEX, or NULL if it is not a builtin.  Index 0, and indices beyond
-   argc, return NULL.  */
+   INDEX.  Abort if it is not a single builtin.  */
 m4_builtin_func *
 m4_arg_func (m4_macro_args *argv, unsigned int index)
 {
-  m4_symbol_value *value;
-
-  if (index == 0 || argv->argc <= index)
-    return NULL;
-  value = m4_arg_symbol (argv, index);
-  if (!m4_is_symbol_value_func (value))
-    return NULL;
-  return m4_get_symbol_value_func (value);
+  return m4_get_symbol_value_func (m4_arg_symbol (argv, index));
 }
 
 /* Create a new argument object using the same obstack as ARGV; thus,
@@ -909,6 +946,7 @@ m4_make_argv_ref (m4_macro_args *argv, const char *argv0, size_t argv0_len,
   new_argv->inuse = false;
   new_argv->argv0 = argv0;
   new_argv->argv0_len = argv0_len;
+  new_argv->quote_age = argv->quote_age;
   return new_argv;
 }
 
diff --git a/m4/module.c b/m4/module.c
index afeece4..901a48f 100644
--- a/m4/module.c
+++ b/m4/module.c
@@ -195,7 +195,8 @@ install_macro_table (m4 *context, m4_module *module)
          m4_symbol_value *value = m4_symbol_value_create ();
          size_t len = strlen (mp->value);
 
-         m4_set_symbol_value_text (value, xmemdup (mp->value, len + 1), len);
+         m4_set_symbol_value_text (value, xmemdup (mp->value, len + 1),
+                                    len, 0);
          VALUE_MODULE (value) = module;
 
          m4_symbol_pushdef (M4SYMTAB, mp->name, value);
diff --git a/m4/symtab.c b/m4/symtab.c
index 2f83f7b..932a31f 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -414,9 +414,10 @@ m4_symbol_value_copy (m4_symbol_value *dest, m4_symbol_value *src)
   if (m4_is_symbol_value_text (src))
     {
       size_t len = m4_get_symbol_value_len (src);
+      unsigned int age = m4_get_symbol_value_quote_age (src);
       m4_set_symbol_value_text (dest,
                                xmemdup (m4_get_symbol_value_text (src),
-                                        len + 1), len);
+                                        len + 1), len, age);
     }
   else if (m4_is_symbol_value_placeholder (src))
     m4_set_symbol_value_placeholder (dest,
@@ -662,6 +663,14 @@ m4_get_symbol_value_len (m4_symbol_value *value)
   return value->u.u_t.len;
 }
 
+#undef m4_get_symbol_value_quote_age
+unsigned int
+m4_get_symbol_value_quote_age (m4_symbol_value *value)
+{
+  assert (value && value->type == M4_SYMBOL_TEXT);
+  return value->u.u_t.quote_age;
+}
+
 #undef m4_get_symbol_value_func
 m4_builtin_func *
 m4_get_symbol_value_func (m4_symbol_value *value)
@@ -688,7 +697,8 @@ m4_get_symbol_value_placeholder (m4_symbol_value *value)
 
 #undef m4_set_symbol_value_text
 void
-m4_set_symbol_value_text (m4_symbol_value *value, const char *text, size_t len)
+m4_set_symbol_value_text (m4_symbol_value *value, const char *text, size_t len,
+                          unsigned int quote_age)
 {
   assert (value && text);
   /* TODO - this assertion enforces NUL-terminated text with no
@@ -701,6 +711,7 @@ m4_set_symbol_value_text (m4_symbol_value *value, const char *text, size_t len)
   value->type = M4_SYMBOL_TEXT;
   value->u.u_t.text = text;
   value->u.u_t.len = len;
+  value->u.u_t.quote_age = quote_age;
 }
 
 #undef m4_set_symbol_value_builtin
diff --git a/m4/syntax.c b/m4/syntax.c
index 9f2e122..0bce3c0 100644
--- a/m4/syntax.c
+++ b/m4/syntax.c
@@ -102,11 +102,12 @@
 
    M4_SYNTAX_RQUOTE and M4_SYNTAX_ECOMM do not start tokens.  */
 
-static bool    check_is_single_quotes          (m4_syntax_table *);
-static bool    check_is_single_comments        (m4_syntax_table *);
-static bool    check_is_macro_escaped          (m4_syntax_table *);
-static int     add_syntax_attribute            (m4_syntax_table *, int, int);
-static int     remove_syntax_attribute         (m4_syntax_table *, int, int);
+static bool check_is_single_quotes     (m4_syntax_table *);
+static bool check_is_single_comments   (m4_syntax_table *);
+static bool check_is_macro_escaped     (m4_syntax_table *);
+static int add_syntax_attribute                (m4_syntax_table *, int, int);
+static int remove_syntax_attribute     (m4_syntax_table *, int, int);
+static void set_quote_age              (m4_syntax_table *, bool, bool);
 
 m4_syntax_table *
 m4_syntax_create (void)
@@ -392,6 +393,7 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
       syntax->is_single_quotes         = true;
       syntax->is_single_comments       = true;
       syntax->is_macro_escaped         = false;
+      set_quote_age (syntax, true, false);
       return 0;
     }
 
@@ -417,6 +419,7 @@ m4_set_syntax (m4_syntax_table *syntax, char key, char action,
     default:
       assert (false);
     }
+  set_quote_age (syntax, false, true);
   return code;
 }
 
@@ -431,10 +434,8 @@ check_is_single_quotes (m4_syntax_table *syntax)
     return false;
   assert (syntax->lquote.length == 1 && syntax->rquote.length == 1);
 
-  if (m4_has_syntax (syntax, to_uchar (*syntax->lquote.string),
-                    M4_SYNTAX_LQUOTE)
-      && m4_has_syntax (syntax, to_uchar (*syntax->rquote.string),
-                       M4_SYNTAX_RQUOTE))
+  if (m4_has_syntax (syntax, *syntax->lquote.string, M4_SYNTAX_LQUOTE)
+      && m4_has_syntax (syntax, *syntax->rquote.string, M4_SYNTAX_RQUOTE))
     return true;
 
   /* The most recent action invalidated our current lquote/rquote.  If
@@ -486,10 +487,8 @@ check_is_single_comments (m4_syntax_table *syntax)
     return false;
   assert (syntax->bcomm.length == 1 && syntax->ecomm.length == 1);
 
-  if (m4_has_syntax (syntax, to_uchar (*syntax->bcomm.string),
-                    M4_SYNTAX_BCOMM)
-      && m4_has_syntax (syntax, to_uchar (*syntax->ecomm.string),
-                       M4_SYNTAX_ECOMM))
+  if (m4_has_syntax (syntax, *syntax->bcomm.string, M4_SYNTAX_BCOMM)
+      && m4_has_syntax (syntax, *syntax->ecomm.string, M4_SYNTAX_ECOMM))
     return true;
 
   /* The most recent action invalidated our current bcomm/ecomm.  If
@@ -558,9 +557,6 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, const char *rq)
 
   assert (syntax);
 
-  free (syntax->lquote.string);
-  free (syntax->rquote.string);
-
   /* POSIX states that with 0 arguments, the default quotes are used.
      POSIX XCU ERN 112 states that behavior is implementation-defined
      if there was only one argument, or if there is an empty string in
@@ -576,6 +572,12 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, const char *rq)
   else if (!rq || (*lq && !*rq))
     rq = DEF_RQUOTE;
 
+  if (strcmp (syntax->lquote.string, lq) == 0
+      && strcmp (syntax->rquote.string, rq) == 0)
+    return;
+
+  free (syntax->lquote.string);
+  free (syntax->rquote.string);
   syntax->lquote.string = xstrdup (lq);
   syntax->lquote.length = strlen (syntax->lquote.string);
   syntax->rquote.string = xstrdup (rq);
@@ -587,7 +589,7 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, const char *rq)
 
   syntax->is_single_quotes
     = (syntax->lquote.length == 1 && syntax->rquote.length == 1
-       && !m4_has_syntax (syntax, to_uchar (*syntax->lquote.string),
+       && !m4_has_syntax (syntax, *syntax->lquote.string,
                          (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE
                           | M4_SYNTAX_ALPHA | M4_SYNTAX_NUM)));
 
@@ -608,9 +610,9 @@ m4_set_quotes (m4_syntax_table *syntax, const char *lq, const char *rq)
       add_syntax_attribute (syntax, to_uchar (syntax->rquote.string[0]),
                            M4_SYNTAX_RQUOTE);
     }
-
   if (syntax->is_macro_escaped)
     check_is_macro_escaped (syntax);
+  set_quote_age (syntax, false, false);
 }
 
 void
@@ -620,9 +622,6 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, const char *ec)
 
   assert (syntax);
 
-  free (syntax->bcomm.string);
-  free (syntax->ecomm.string);
-
   /* POSIX requires no arguments to disable comments, and that one
      argument use newline as the close-comment.  POSIX XCU ERN 131
      states that empty arguments invoke implementation-defined
@@ -635,6 +634,12 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, const char *ec)
   else if (!ec || (*bc && !*ec))
     ec = DEF_ECOMM;
 
+  if (strcmp (syntax->bcomm.string, bc) == 0
+      && strcmp (syntax->ecomm.string, ec) == 0)
+    return;
+
+  free (syntax->bcomm.string);
+  free (syntax->ecomm.string);
   syntax->bcomm.string = xstrdup (bc);
   syntax->bcomm.length = strlen (syntax->bcomm.string);
   syntax->ecomm.string = xstrdup (ec);
@@ -646,7 +651,7 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, const char *ec)
 
   syntax->is_single_comments
     = (syntax->bcomm.length == 1 && syntax->ecomm.length == 1
-       && !m4_has_syntax (syntax, to_uchar (*syntax->bcomm.string),
+       && !m4_has_syntax (syntax, *syntax->bcomm.string,
                          (M4_SYNTAX_IGNORE | M4_SYNTAX_ESCAPE
                           | M4_SYNTAX_ALPHA | M4_SYNTAX_NUM
                           | M4_SYNTAX_LQUOTE)));
@@ -667,11 +672,82 @@ m4_set_comment (m4_syntax_table *syntax, const char *bc, const char *ec)
       add_syntax_attribute (syntax, to_uchar (syntax->ecomm.string[0]),
                            M4_SYNTAX_ECOMM);
     }
-
   if (syntax->is_macro_escaped)
     check_is_macro_escaped (syntax);
+  set_quote_age (syntax, false, false);
 }
 
+/* Call this when changing anything that might impact the quote age,
+   so that m4__quote_age and m4__safe_quotes will reflect the change.
+   If RESET, changesyntax was reset to its default state; if CHANGE,
+   arbitrary syntax has changed; otherwise, just quotes or comment
+   delimiters have changed.  */
+static void
+set_quote_age (m4_syntax_table *syntax, bool reset, bool change)
+{
+  /* Multi-character quotes are inherently unsafe, since concatenation
+     of individual characters can result in a quote delimiter,
+     consider:
+
+     define(echo,``$1'')define(a,A)changequote(<[,]>)echo(<[]]><[>a]>)
+     => A]> (not ]>a)
+
+   Also, unquoted close delimiters are unsafe, consider:
+
+     define(echo,``$1'')define(a,A)echo(`a''`a')
+     => aA' (not a'a)
+
+   Duplicated start and end quote delimiters, as well as comment
+   delimiters that overlap with quote delimiters or active characters,
+   also present a problem, consider:
+
+     define(echo,$*)echo(a,a,a`'define(a,A)changecom(`,',`,'))
+     => A,a,A (not A,A,A)
+
+   The impact of arbitrary changesyntax is difficult to characterize.
+   So if things are in their default state, we use 0 for the upper 16
+   bits of quote_age; otherwise we increment syntax_age for each
+   changesyntax, but saturate it at 0xffff rather than wrapping
+   around.  Perhaps a cache of other frequently used states is
+   warranted if changesyntax becomes more popular.
+
+   Rather than check every token for an unquoted delimiter, we merely
+   set quote_age to 0 when things are unsafe, and non-zero
+   when safe (namely, the syntax_age in the upper 16 bits, coupled
+   with the 16-bit value composed of the single-character start and
+   end quote delimiters).  There may be other situations which are
+   safe even when this algorithm sets the quote_age to zero, but at
+   least a quote_age of zero always produces correct results (although
+   it may take more time in doing so).  */
+
+  unsigned short local_syntax_age;
+  if (reset)
+    local_syntax_age = 0;
+  else if (change && syntax->syntax_age < 0xffff)
+    local_syntax_age = ++syntax->syntax_age;
+  else
+    local_syntax_age = syntax->syntax_age;
+  if (local_syntax_age < 0xffff && syntax->is_single_quotes
+      && !m4_has_syntax (syntax, *syntax->lquote.string,
+                        (M4_SYNTAX_ALPHA | M4_SYNTAX_NUM | M4_SYNTAX_OPEN
+                         | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE
+                         | M4_SYNTAX_SPACE))
+      && !m4_has_syntax (syntax, *syntax->rquote.string,
+                        (M4_SYNTAX_ALPHA | M4_SYNTAX_NUM | M4_SYNTAX_OPEN
+                         | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE
+                         | M4_SYNTAX_SPACE))
+      && *syntax->lquote.string != *syntax->rquote.string
+      && *syntax->bcomm.string != *syntax->lquote.string
+      && !m4_has_syntax (syntax, *syntax->bcomm.string,
+                        M4_SYNTAX_OPEN | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE))
+    {
+      syntax->quote_age = (((unsigned int) local_syntax_age << 16)
+                          | ((*syntax->lquote.string & 0xff) << 8)
+                          | (*syntax->rquote.string & 0xff));
+    }
+  else
+    syntax->quote_age = 0;
+}
 
 
 /* Define these functions at the end, so that calls in the file use the
diff --git a/m4/utility.c b/m4/utility.c
index 72205a8..b367717 100644
--- a/m4/utility.c
+++ b/m4/utility.c
@@ -62,7 +62,7 @@ m4_bad_argc (m4 *context, int argc, const char *caller, unsigned int min,
 static const char *
 skip_space (m4 *context, const char *arg)
 {
-  while (m4_has_syntax (M4SYNTAX, to_uchar (*arg), M4_SYNTAX_SPACE))
+  while (m4_has_syntax (M4SYNTAX, *arg, M4_SYNTAX_SPACE))
     arg++;
   return arg;
 }
diff --git a/modules/m4.c b/modules/m4.c
index f9d65ed..ca5cf45 100644
--- a/modules/m4.c
+++ b/modules/m4.c
@@ -165,11 +165,7 @@ M4BUILTIN_HANDLER (define)
     {
       m4_symbol_value *value = m4_symbol_value_create ();
 
-      if (argc == 2)
-       m4_set_symbol_value_text (value, xstrdup (""), 0);
-      else
-       m4_symbol_value_copy (value, m4_arg_symbol (argv, 2));
-
+      m4_symbol_value_copy (value, m4_arg_symbol (argv, 2));
       m4_symbol_define (M4SYMTAB, M4ARG (1), value);
     }
   else
@@ -197,11 +193,7 @@ M4BUILTIN_HANDLER (pushdef)
     {
       m4_symbol_value *value = m4_symbol_value_create ();
 
-      if (argc == 2)
-       m4_set_symbol_value_text (value, xstrdup (""), 0);
-      else
-       m4_symbol_value_copy (value, m4_arg_symbol (argv, 2));
-
+      m4_symbol_value_copy (value, m4_arg_symbol (argv, 2));
       m4_symbol_pushdef (M4SYMTAB, M4ARG (1), value);
     }
   else
diff --git a/src/freeze.c b/src/freeze.c
index 8430dda..cdadf60 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -757,7 +757,7 @@ ill-formed frozen file, version 2 directive `%c' encountered"), 'T');
 
            m4_set_symbol_value_text (token, xmemdup (string[1],
                                                      number[1] + 1),
-                                     number[1]);
+                                     number[1], 0);
            VALUE_MODULE (token) = module;
            VALUE_MAX_ARGS (token) = -1;
 
diff --git a/src/main.c b/src/main.c
index ef9cb7b..7c35e64 100644
--- a/src/main.c
+++ b/src/main.c
@@ -676,7 +676,7 @@ main (int argc, char *const *argv, char *const *envp)
              }
            m4_set_symbol_value_text (value, xstrdup (macro_value
                                                      ? macro_value : ""),
-                                     len);
+                                     len, 0);
 
            if (defn->code == 'D')
              m4_symbol_define (M4SYMTAB, macro_name, value);
diff --git a/tests/builtins.at b/tests/builtins.at
index eeaf0d3..91586b8 100644
--- a/tests/builtins.at
+++ b/tests/builtins.at
@@ -88,6 +88,26 @@ AT_CHECK_M4([in], [0], [expout])
 AT_CLEANUP
 
 
+## ----------- ##
+## changequote ##
+## ----------- ##
+
+AT_SETUP([changequote])
+
+AT_DATA([in.m4],
+[[define(`aaaaaaaaaaaaaaaaaaaa', `A')define(`q', `"$@"')
+changequote(`"', `"')
+q(q("aaaaaaaaaaaaaaaaaaaa", "a"))
+]])
+
+AT_CHECK_M4([in.m4], [0], [[
+
+A,a
+]])
+
+AT_CLEANUP
+
+
 ## ----- ##
 ## debug ##
 ## ----- ##
@@ -159,6 +179,57 @@ AT_CLEANUP
 
 
 
+## ---- ##
+## defn ##
+## ---- ##
+
+AT_SETUP([defn])
+
+dnl This test is a reminder that defn needs to be fixed to handle
+dnl concatenation of builtin tokens with text, and user macros need
+dnl to handle builtin tokens without flattening.
+AT_XFAIL_IF([:])
+
+AT_DATA([[in.m4]],
+[[define(`e', `$@')define(`q', ``$@'')define(`u', `$*')
+define(`cmp', `ifelse($1, $2, `yes', `no')')define(`d', defn(`defn'))
+cmp(`defn(`defn')', `defn(`d')')
+cmp(`defn(`defn')', ``<defn>'')
+cmp(`q(defn(`defn'))', `q(defn(`d'))')
+cmp(`q(defn(`defn'))', `q(`<defn>')')
+cmp(`q(defn(`defn'))', ``'')
+cmp(`q(`1', `2', defn(`defn'))', `q(`1', `2', defn(`d'))')
+cmp(`q(`1', `2', defn(`defn'))', `q(`1', `2', `<defn>')')
+cmp(`q(`1', `2', defn(`defn'))', ```1',`2',<defn>'')
+cmp(`q(`1', `2', defn(`defn'))', ```1',`2',`''')
+define(`cat', `$1`'ifelse(`$@%:@', `1', `', `$0(shift($@))')')
+cat(`define(`foo',', defn(`divnum'), `)foo')
+cat(e(`define(`bar',', defn(`divnum'), `)bar'))
+m4wrap(`u('q(`cat(`define(`baz','', defn(`divnum'), ``)baz')')`)
+')
+]])
+
+AT_CHECK_M4([in.m4], [0], [[
+
+yes
+no
+yes
+no
+no
+yes
+no
+no
+no
+
+0
+0
+
+0
+]])
+
+AT_CLEANUP
+
+
 ## ------ ##
 ## divert ##
 ## ------ ##
-- 
1.5.3.5

From 83e9a157eb78afe47c7955a6fa99e0de79a8b40a Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Wed, 24 Oct 2007 08:36:26 -0600
Subject: [PATCH] Stage 5: add notion of quote age.

* src/input.c: Comment cleanups.
(current_quote_age): New global variable.
(set_quote_age): New helper function.
(input_init, set_word_regexp): Use it.
(set_quotes, set_comment): Likewise, and detect no-op changes.
(quote_age, safe_quotes): New functions.
(next_token): Track quote age.
* src/m4.h (struct token_data): Add quote_age member.
(TOKEN_DATA_QUOTE_AGE, quote_age, safe_quotes): New prototypes.
* src/macro.c (struct macro_arguments): Add quote_age member.
(expand_token): Alter signature and track quote age.
(expand_input, expand_argument): All callers changed.
(collect_arguments, make_argv_ref): Track quote age.
(arg_text, arg_len, arg_func): Detect type mismatch.
* doc/m4.texinfo (Ifelse, Changequote): Add more tests.
(Incompatibilities): Fix typo.
* examples/wraplifo.m4: New file.
* examples/Makefile.am (EXTRA_DIST): Distribute it.

(cherry picked from commit 8b5b3b7a74f452fed795c063965966934a68755d)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog            |   22 ++++
 doc/m4.texinfo       |   55 ++++++++++-
 examples/Makefile.am |    3 +-
 examples/wraplifo.m4 |   10 ++
 src/input.c          |  261 ++++++++++++++++++++++++++++++++++++--------------
 src/m4.h             |   10 ++
 src/macro.c          |  140 +++++++++++++++++----------
 7 files changed, 377 insertions(+), 124 deletions(-)
 create mode 100644 examples/wraplifo.m4

diff --git a/ChangeLog b/ChangeLog
index 1672120..79a133a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,25 @@
+2007-12-07  Eric Blake  <address@hidden>
+
+       Stage 5: add notion of quote age.
+       * src/input.c: Comment cleanups.
+       (current_quote_age): New global variable.
+       (set_quote_age): New helper function.
+       (input_init, set_word_regexp): Use it.
+       (set_quotes, set_comment): Likewise, and detect no-op changes.
+       (quote_age, safe_quotes): New functions.
+       (next_token): Track quote age.
+       * src/m4.h (struct token_data): Add quote_age member.
+       (TOKEN_DATA_QUOTE_AGE, quote_age, safe_quotes): New prototypes.
+       * src/macro.c (struct macro_arguments): Add quote_age member.
+       (expand_token): Alter signature and track quote age.
+       (expand_input, expand_argument): All callers changed.
+       (collect_arguments, make_argv_ref): Track quote age.
+       (arg_text, arg_len, arg_func): Detect type mismatch.
+       * doc/m4.texinfo (Ifelse, Changequote): Add more tests.
+       (Incompatibilities): Fix typo.
+       * examples/wraplifo.m4: New file.
+       * examples/Makefile.am (EXTRA_DIST): Distribute it.
+
 2007-12-04  Eric Blake  <address@hidden>
 
        Fix builds with OpenBSD make.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 3da16fc..803dbf0 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -2635,6 +2635,47 @@ ifelse(`foo', `bar', `3', `gnu', `gnats', `6', `7', `8')
 @result{}7
 @end example
 
+@ignore
+@comment Stress tests, not worth documenting.
+@comment It would be nice to pass builtin tokens through ifelse, m4wrap,
+@comment user macros; hence the fixmes.
+@example
+define(`e', `$@@')define(`q', ``$@@'')define(`u', `$*')
+@result{}
+define(`cmp', `ifelse($1, $2, `yes', `no')')define(`d', defn(`defn'))
+@result{}
+cmp(`defn(`defn')', `defn(`d')')
+@result{}yes
+cmp(`defn(`defn')', ``<defn>'')
+@result{}no
+cmp(`q(defn(`defn'))', `q(defn(`d'))')
+@result{}yes
+cmp(`q(defn(`defn'))', `q(`<defn>')')
+@result{}no
+cmp(`q(defn(`defn'))', ``'')
+@result{}no
+cmp(`q(`1', `2', defn(`defn'))', `q(`1', `2', defn(`d'))')
+@result{}yes
+cmp(`q(`1', `2', defn(`defn'))', `q(`1', `2', `<defn>')')
+@result{}no
+cmp(`q(`1', `2', defn(`defn'))', ```1',`2',<defn>'')
+@result{}no
+cmp(`q(`1', `2', defn(`defn'))', ```1',`2',`''')-fixme
address@hidden
+define(`cat', `$1`'ifelse(`$#', `1', `', `$0(shift($@@))')')
address@hidden
+cat(`define(`foo',', defn(`divnum'), `)foo')-fixme
address@hidden
+cat(e(`define(`bar',', defn(`divnum'), `)bar'))-fixme
address@hidden
+m4wrap(`u('q(`cat(`define(`baz','', defn(`divnum'), ``)baz')')`)-fixme
+')
address@hidden
+^D
address@hidden
+@end example
+@end ignore
+
 Naturally, the normal case will be slightly more advanced than these
 examples.  A common use of @code{ifelse} is in macros implementing loops
 of various kinds.
@@ -3714,6 +3755,18 @@ changequote(`"', `"')
 @result{}hiHIhi
 @end example
 
+@ignore
+@comment And another stress test, not worth documenting in the manual.
+@example
+define(`aaaaaaaaaaaaaaaaaaaa', `A')define(`q', `"$@@"')
+@result{}
+changequote(`"', `"')
+@result{}
+q(q("aaaaaaaaaaaaaaaaaaaa", "a"))
address@hidden,a
+@end example
+@end ignore
+
 It is an error if the end of file occurs within a quoted string.
 
 @comment status: 1
@@ -6490,7 +6543,7 @@ of @samp{-} on the command line.
 @acronym{POSIX} requires @code{m4wrap} (@pxref{M4wrap}) to act in FIFO
 (first-in, first-out) order, but @acronym{GNU} @code{m4} currently uses
 LIFO order.  Furthermore, @acronym{POSIX} states that only the first
-argument to @code{m4wrap} is saved for later evaluation, bug
+argument to @code{m4wrap} is saved for later evaluation, but
 @acronym{GNU} @code{m4} saves and processes all arguments, with output
 separated by spaces.
 
diff --git a/examples/Makefile.am b/examples/Makefile.am
index b1ef68a..c1dc522 100644
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -58,4 +58,5 @@ translit.m4 \
 undivert.incl \
 undivert.m4 \
 wrap.m4 \
-wrapfifo.m4
+wrapfifo.m4 \
+wraplifo.m4
diff --git a/examples/wraplifo.m4 b/examples/wraplifo.m4
new file mode 100644
index 0000000..bdbf3fb
--- /dev/null
+++ b/examples/wraplifo.m4
@@ -0,0 +1,10 @@
+dnl Redefine m4wrap to have LIFO semantics.
+define(`_m4wrap_level', `0')dnl
+define(`_m4wrap', defn(`m4wrap'))dnl
+define(`m4wrap',
+`ifdef(`m4wrap'_m4wrap_level,
+       `define(`m4wrap'_m4wrap_level,
+               `$1'defn(`m4wrap'_m4wrap_level))',
+       `_m4wrap(`define(`_m4wrap_level', incr(_m4wrap_level))dnl
+m4wrap'_m4wrap_level)dnl
+define(`m4wrap'_m4wrap_level, `$1')')')dnl
diff --git a/src/input.c b/src/input.c
index 0aa6036..551b43d 100644
--- a/src/input.c
+++ b/src/input.c
@@ -23,12 +23,13 @@
 
 #include "m4.h"
 
-/* Unread input can be either files, that should be read (eg. included
-   files), strings, which should be rescanned (eg. macro expansion text),
-   or quoted macro definitions (as returned by the builtin "defn").
-   Unread input are organised in a stack, implemented with an obstack.
-   Each input source is described by a "struct input_block".  The obstack
-   is "current_input".  The top of the input stack is "isp".
+/* Unread input can be either files to be read (command line,
+   "include", "sinclude"), strings which should be rescanned (macro
+   expansion text), or quoted macro definitions (as returned by the
+   builtin "defn").  Unread input is organized in a stack, implemented
+   with an obstack.  Each input source is described by a "struct
+   input_block".  The obstack is "current_input".  The top of the
+   input stack is "isp".
 
    The macro "m4wrap" places the text to be saved on another input
    stack, on the obstack "wrapup_stack", whose top is "wsp".  When EOF
@@ -42,12 +43,13 @@
 
    Pushing new input on the input stack is done by push_file (),
    push_string (), push_wrapup () (for wrapup text), and push_macro ()
-   (for macro definitions).  Because macro expansion needs direct access
-   to the current input obstack (for optimisation), push_string () are
-   split in two functions, push_string_init (), which returns a pointer
-   to the current input stack, and push_string_finish (), which return a
-   pointer to the final text.  The input_block *next is used to manage
-   the coordination between the different push routines.
+   (for macro definitions).  Because macro expansion needs direct
+   access to the current input obstack (for optimization), push_string
+   () is split in two functions, push_string_init (), which returns a
+   pointer to the current input stack, and push_string_finish (),
+   which returns a pointer to the final text.  The input_block *next
+   is used to manage the coordination between the different push
+   routines.
 
    The current file and line number are stored in two global
    variables, for use by the error handling functions in m4.c.  Macro
@@ -62,6 +64,7 @@
 # include "regex.h"
 #endif /* ENABLE_CHANGEWORD */
 
+/* Type of an input block.  */
 enum input_type
 {
   INPUT_STRING,                /* String resulting from macro expansion.  */
@@ -71,28 +74,29 @@ enum input_type
 
 typedef enum input_type input_type;
 
+/* A block of input to be scanned.  */
 struct input_block
 {
-  struct input_block *prev;    /* previous input_block on the input stack */
-  input_type type;             /* see enum values */
-  const char *file;            /* file where this input is from */
-  int line;                    /* line where this input is from */
+  struct input_block *prev;    /* Previous input_block on the input stack.  */
+  input_type type;             /* See enum values.  */
+  const char *file;            /* File where this input is from.  */
+  int line;                    /* Line where this input is from.  */
   union
     {
       struct
        {
-         char *string;         /* remaining string value */
+         char *string;         /* Remaining string value.  */
        }
        u_s;    /* INPUT_STRING */
       struct
        {
-         FILE *fp;                  /* input file handle */
-         bool_bitfield end : 1;     /* true if peek has seen EOF */
-         bool_bitfield close : 1;   /* true if we should close file on pop */
-         bool_bitfield advance : 1; /* track previous start_of_input_line */
+         FILE *fp;                  /* Input file handle.  */
+         bool_bitfield end : 1;     /* True if peek has seen EOF.  */
+         bool_bitfield close : 1;   /* True to close file on pop.  */
+         bool_bitfield advance : 1; /* Track previous start_of_input_line.  */
        }
        u_f;    /* INPUT_FILE */
-      builtin_func *func;      /* pointer to macro's function */
+      builtin_func *func;      /* Pointer to macro's function.  */
     }
   u;
 };
@@ -136,8 +140,8 @@ static bool start_of_input_line;
 /* Flag for next_char () to recognize change in input block.  */
 static bool input_change;
 
-#define CHAR_EOF       256     /* character return on EOF */
-#define CHAR_MACRO     257     /* character return for MACRO token */
+#define CHAR_EOF       256     /* Character return on EOF.  */
+#define CHAR_MACRO     257     /* Character return for MACRO token.  */
 
 /* Quote chars.  */
 STRING rquote;
@@ -151,16 +155,30 @@ STRING ecomm;
 
 # define DEFAULT_WORD_REGEXP "[_a-zA-Z][_a-zA-Z0-9]*"
 
+/* Table of characters that can start a word.  */
 static char *word_start;
+
+/* Current regular expression for detecting words.  */
 static struct re_pattern_buffer word_regexp;
-static int default_word_regexp;
+
+/* True if changeword is not active.  */
+static bool default_word_regexp;
+
+/* Reused memory for detecting matches in word detection.  */
 static struct re_registers regs;
 
 #else /* !ENABLE_CHANGEWORD */
-# define default_word_regexp 1
+# define default_word_regexp true
 #endif /* !ENABLE_CHANGEWORD */
 
+/* Track the current quote age, determined by all significant
+   changequote, changecom, and changeword calls, since any one of
+   these can alter the rescan of a prior parameter in a quoted
+   context.  */
+static unsigned int current_quote_age;
+
 static bool pop_input (bool);
+static void set_quote_age (void);
 
 #ifdef DEBUG_INPUT
 static const char *token_type_string (token_type);
@@ -172,7 +190,8 @@ static const char *token_type_string (token_type);
 | current file name and line number.  If next is non-NULL, this push |
 | invalidates a call to push_string_init (), whose storage is        |
 | consequently released.  If CLOSE, then close FP after EOF is       |
-| detected.                                                          |
+| detected.  TITLE is used as the location for text parsed from the  |
+| file (not necessarily the file name).                              |
 `-------------------------------------------------------------------*/
 
 void
@@ -206,11 +225,11 @@ push_file (FILE *fp, const char *title, bool close)
   isp = i;
 }
 
-/*---------------------------------------------------------------.
-| push_macro () pushes a builtin macro's definition on the input |
-| stack.  If next is non-NULL, this push invalidates a call to   |
-| push_string_init (), whose storage is consequently released.   |
-`---------------------------------------------------------------*/
+/*-----------------------------------------------------------------.
+| push_macro () pushes the builtin macro FUNC on the input stack.  |
+| If next is non-NULL, this push invalidates a call to             |
+| push_string_init (), whose storage is consequently released.     |
+`-----------------------------------------------------------------*/
 
 void
 push_macro (builtin_func *func)
@@ -235,10 +254,10 @@ push_macro (builtin_func *func)
   isp = i;
 }
 
-/*------------------------------------------------------------------.
-| First half of push_string ().  The pointer next points to the new |
-| input_block.                                                     |
-`------------------------------------------------------------------*/
+/*--------------------------------------------------------------.
+| First half of push_string ().  The return value points to the |
+| obstack where expansion text should be placed.                |
+`--------------------------------------------------------------*/
 
 struct obstack *
 push_string_init (void)
@@ -257,14 +276,15 @@ push_string_init (void)
   return current_input;
 }
 
-/*------------------------------------------------------------------------.
-| Last half of push_string ().  If next is now NULL, a call to push_file  |
-| () has invalidated the previous call to push_string_init (), so we just |
-| give up.  If the new object is void, we do not push it.  The function   |
-| push_string_finish () returns a pointer to the finished object.  This   |
-| pointer is only for temporary use, since reading the next token might   |
-| release the memory used for the object.                                |
-`------------------------------------------------------------------------*/
+/*-------------------------------------------------------------------.
+| Last half of push_string ().  If next is now NULL, a call to       |
+| push_file () or push_macro () has invalidated the previous call to |
+| push_string_init (), so we just give up.  If the new object is     |
+| void, we do not push it.  The function push_string_finish ()       |
+| returns a pointer to the finished object.  This pointer is only    |
+| for temporary use, since reading the next token might release the  |
+| memory used for the object.                                        |
+`-------------------------------------------------------------------*/
 
 const char *
 push_string_finish (void)
@@ -413,7 +433,7 @@ pop_wrapup (void)
 
 /*-------------------------------------------------------------------.
 | When a MACRO token is seen, next_token () uses init_macro_token () |
-| to retrieve the value of the function pointer.                     |
+| to retrieve the value of the function pointer and store it in TD.  |
 `-------------------------------------------------------------------*/
 
 static void
@@ -425,12 +445,14 @@ init_macro_token (token_data *td)
 }
 
 
-/*------------------------------------------------------------------------.
-| Low level input is done a character at a time.  The function peek_input |
-| () is used to look at the next character in the input stream.  At any   |
-| given time, it reads from the input_block on the top of the current    |
-| input stack.                                                           |
-`------------------------------------------------------------------------*/
+/*-----------------------------------------------------------------.
+| Low level input is done a character at a time.  The function     |
+| peek_input () is used to look at the next character in the input |
+| stream.  At any given time, it reads from the input_block on the |
+| top of the current input stack.  The return value is an unsigned |
+| char, or CHAR_EOF if there is no more input, or CHAR_MACRO if a  |
+| builtin token occurs next.                                       |
+`-----------------------------------------------------------------*/
 
 static int
 peek_input (void)
@@ -556,7 +578,8 @@ next_char_1 (void)
 
 /*-------------------------------------------------------------------.
 | skip_line () simply discards all immediately following characters, |
-| up to the first newline.  It is only used from m4_dnl ().          |
+| up to the first newline.  It is only used from m4_dnl ().  Report  |
+| warnings on behalf of NAME.                                        |
 `-------------------------------------------------------------------*/
 
 void
@@ -585,7 +608,7 @@ skip_line (const char *name)
 
 /*------------------------------------------------------------------.
 | This function is for matching a string against a prefix of the    |
-| input stream.  If the string matches the input and consume is     |
+| input stream.  If the string S matches the input and CONSUME is   |
 | true, the input is discarded; otherwise any characters read are   |
 | pushed back again.  The function is used only when multicharacter |
 | quotes or comment delimiters are used.                            |
@@ -637,7 +660,7 @@ match_input (const char *s, bool consume)
 | will not hurt efficiency too much when single character quotes and  |
 | comment delimiters are used.  If CONSUME, then CH is the result of  |
 | next_char, and a successful match will discard the matched string.  |
-| Otherwise, CH is the result of peek_char, and the input stream is   |
+| Otherwise, CH is the result of peek_input, and the input stream is  |
 | effectively unchanged.                                              |
 `--------------------------------------------------------------------*/
 
@@ -648,7 +671,7 @@ match_input (const char *s, bool consume)
 
 
 /*----------------------------------------------------------.
-| Inititialise input stacks, and quote/comment characters.  |
+| Initialize input stacks and quote/comment characters.     |
 `----------------------------------------------------------*/
 
 void
@@ -689,21 +712,20 @@ input_init (void)
 #ifdef ENABLE_CHANGEWORD
   set_word_regexp (NULL, user_word_regexp);
 #endif /* ENABLE_CHANGEWORD */
+
+  set_quote_age ();
 }
 
 
-/*------------------------------------------------------------------.
-| Functions for setting quotes and comment delimiters.  Used by            |
-| m4_changecom () and m4_changequote ().  Pass NULL if the argument |
-| was not present, to distinguish from an explicit empty string.    |
-`------------------------------------------------------------------*/
+/*--------------------------------------------------------------------.
+| Set the quote delimiters to LQ and RQ.  Used by m4_changequote ().  |
+| Pass NULL if the argument was not present, to distinguish from an   |
+| explicit empty string.                                              |
+`--------------------------------------------------------------------*/
 
 void
 set_quotes (const char *lq, const char *rq)
 {
-  free (lquote.string);
-  free (rquote.string);
-
   /* POSIX states that with 0 arguments, the default quotes are used.
      POSIX XCU ERN 112 states that behavior is implementation-defined
      if there was only one argument, or if there is an empty string in
@@ -719,18 +741,27 @@ set_quotes (const char *lq, const char *rq)
   else if (!rq || (*lq && !*rq))
     rq = DEF_RQUOTE;
 
+  if (strcmp (lquote.string, lq) == 0 && strcmp (rquote.string, rq) == 0)
+    return;
+
+  free (lquote.string);
+  free (rquote.string);
   lquote.string = xstrdup (lq);
   lquote.length = strlen (lquote.string);
   rquote.string = xstrdup (rq);
   rquote.length = strlen (rquote.string);
+  set_quote_age ();
 }
 
+/*--------------------------------------------------------------------.
+| Set the comment delimiters to BC and EC.  Used by m4_changecom ().  |
+| Pass NULL if the argument was not present, to distinguish from an   |
+| explicit empty string.                                              |
+`--------------------------------------------------------------------*/
+
 void
 set_comment (const char *bc, const char *ec)
 {
-  free (bcomm.string);
-  free (ecomm.string);
-
   /* POSIX requires no arguments to disable comments.  It requires
      empty arguments to be used as-is, but this is counter to
      traditional behavior, because a non-null begin and null end makes
@@ -743,14 +774,26 @@ set_comment (const char *bc, const char *ec)
   else if (!ec || (*bc && !*ec))
     ec = DEF_ECOMM;
 
+  if (strcmp (bcomm.string, bc) == 0 && strcmp (ecomm.string, ec) == 0)
+    return;
+
+  free (bcomm.string);
+  free (ecomm.string);
   bcomm.string = xstrdup (bc);
   bcomm.length = strlen (bcomm.string);
   ecomm.string = xstrdup (ec);
   ecomm.length = strlen (ecomm.string);
+  set_quote_age ();
 }
 
 #ifdef ENABLE_CHANGEWORD
 
+/*-------------------------------------------------------------------.
+| Set the regular expression for recognizing words to REGEXP, and    |
+| report errors on behalf of CALLER.  If REGEXP is NULL, revert back |
+| to the default parsing rules.                                      |
+`-------------------------------------------------------------------*/
+
 void
 set_word_regexp (const char *caller, const char *regexp)
 {
@@ -762,6 +805,7 @@ set_word_regexp (const char *caller, const char *regexp)
   if (!*regexp || !strcmp (regexp, DEFAULT_WORD_REGEXP))
     {
       default_word_regexp = true;
+      set_quote_age ();
       return;
     }
 
@@ -772,7 +816,6 @@ set_word_regexp (const char *caller, const char *regexp)
 
   if (msg != NULL)
     {
-      /* FIXME - report on behalf of macro caller.  */
       m4_warn (0, caller, _("bad regular expression `%s': %s"), regexp, msg);
       return;
     }
@@ -785,6 +828,7 @@ set_word_regexp (const char *caller, const char *regexp)
   re_set_registers (&word_regexp, &regs, regs.num_regs, regs.start, regs.end);
 
   default_word_regexp = false;
+  set_quote_age ();
 
   if (word_start == NULL)
     word_start = (char *) xmalloc (256);
@@ -799,6 +843,82 @@ set_word_regexp (const char *caller, const char *regexp)
 }
 
 #endif /* ENABLE_CHANGEWORD */
+
+/* Call this when changing anything that might impact the quote age,
+   so that quote_age and safe_quotes will reflect the change.  */
+static void
+set_quote_age (void)
+{
+  /* Multi-character quotes are inherently unsafe, since concatenation
+     of individual characters can result in a quote delimiter;
+     consider:
+
+     define(echo,``$1'')define(a,A)changequote(<[,]>)echo(<[]]><[>a]>)
+     => A]> (not ]>a)
+
+     Also, unquoted close delimiters are unsafe; consider:
+
+     define(echo,``$1'')define(a,A)echo(`a''`a')
+     => aA' (not a'a)
+
+     Comment delimiters that overlap with quote delimiters or active
+     characters also present a problem; consider:
+
+     define(echo,$*)echo(a,a,a`'define(a,A)changecom(`,',`,'))
+     => A,a,A (not A,A,A)
+
+     The impact of changeword is not analyzed at all, since changeword
+     will disappear for M4 2.0; any active changeword regexp is simply
+     treated as unsafe.
+
+     So rather than check every token for an unquoted delimiter, we
+     merely set current_quote_age to 0 when things are unsafe, and to
+     a non-zero value when safe (namely, the 16-bit value composed of
+     the single-character start and end quote delimiters).  There may
+     be other situations which are safe even when this algorithm sets
+     the quote_age to zero, but at least a quote_age of zero always
+     produces correct results (although it may take more time in
+     doing so).  */
+
+  /* Heuristic set of characters that might impact rescan if they
+     appear in a quote delimiter.  */
+#define Letters "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
+  static const char unsafe[] = Letters "_0123456789(,) \t\n\r\f\v";
+#undef Letters
+
+  if (lquote.length == 1 && rquote.length == 1
+      && strpbrk (lquote.string, unsafe) == NULL
+      && strpbrk (rquote.string, unsafe) == NULL
+      && default_word_regexp && *lquote.string != *rquote.string
+      && *bcomm.string != '(' && *bcomm.string != ','
+      && *bcomm.string != ')' && *bcomm.string != *lquote.string)
+    current_quote_age = (((*lquote.string & 0xff) << 8)
+                        | (*rquote.string & 0xff));
+  else
+    current_quote_age = 0;
+}
+
+/* Return the current quote age.  Any significant changequote,
+   changecom, or changeword call can alter this value; the idea is
+   that if quoting hasn't changed, then we can skip parsing a single
+   argument, quoted or unquoted, within the context of a quoted
+   string, as well as skip parsing a series of quoted arguments
+   within the context of argument collection.  */
+unsigned int
+quote_age (void)
+{
+  /* This accessor is a function, so that the implementation can
+     change if needed.  See set_quote_age for the current
+     implementation.  */
+  return current_quote_age;
+}
+
+/* Return true if the current quote delimiters guarantee that
+   reparsing the current token in the context of a quoted string will
+   be safe.  This could always return false and behavior would still
+   be correct, just slower.  */
+bool
+safe_quotes (void)
+{
+  return current_quote_age != 0;
+}
 
 
 /*--------------------------------------------------------------------.
@@ -835,7 +955,7 @@ next_token (token_data *td, int *line, const char *caller)
   if (!line)
     line = &dummy;
 
- /* Can't consume character until after CHAR_MACRO is handled.  */
+  /* Can't consume character until after CHAR_MACRO is handled.  */
   ch = peek_input ();
   if (ch == CHAR_EOF)
     {
@@ -868,7 +988,7 @@ next_token (token_data *td, int *line, const char *caller)
       if (ch != CHAR_EOF)
        obstack_grow (&token_stack, ecomm.string, ecomm.length);
       else
-       /* current_file changed to "" if we see CHAR_EOF, use the
+       /* Current_file changed to "" if we see CHAR_EOF, use the
           previous value we stored earlier.  */
        m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller,
                          _("end of file in comment"));
@@ -951,7 +1071,7 @@ next_token (token_data *td, int *line, const char *caller)
        {
          ch = next_char ();
          if (ch == CHAR_EOF)
-           /* current_file changed to "" if we see CHAR_EOF, use
+           /* Current_file changed to "" if we see CHAR_EOF, use
               the previous value we stored earlier.  */
            m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller,
                              _("end of file in string"));
@@ -977,6 +1097,7 @@ next_token (token_data *td, int *line, const char *caller)
   TOKEN_DATA_LEN (td) = obstack_object_size (&token_stack);
   obstack_1grow (&token_stack, '\0');
   TOKEN_DATA_TEXT (td) = (char *) obstack_finish (&token_stack);
+  TOKEN_DATA_QUOTE_AGE (td) = current_quote_age;
 #ifdef ENABLE_CHANGEWORD
   if (orig_text == NULL)
     TOKEN_DATA_ORIG_TEXT (td) = TOKEN_DATA_TEXT (td);
diff --git a/src/m4.h b/src/m4.h
index ac81998..d7b6e08 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -299,6 +299,13 @@ struct token_data
             support NUL.  */
          size_t len;
          char *text;
+         /* The value of quote_age when this token was scanned.  If
+            this token is later encountered in the context of
+            scanning a quoted string, and quote_age has not changed,
+            then rescanning this string is provably unnecessary.  If
+            zero, then this string potentially contains content that
+            might change the parse on rescan.  Ignored if len is 0.  */
+         unsigned int quote_age;
 #ifdef ENABLE_CHANGEWORD
          char *original_text;
 #endif
@@ -316,6 +323,7 @@ struct token_data
 #define TOKEN_DATA_TYPE(Td)            ((Td)->type)
 #define TOKEN_DATA_LEN(Td)             ((Td)->u.u_t.len)
 #define TOKEN_DATA_TEXT(Td)            ((Td)->u.u_t.text)
+#define TOKEN_DATA_QUOTE_AGE(Td)       ((Td)->u.u_t.quote_age)
 #ifdef ENABLE_CHANGEWORD
 # define TOKEN_DATA_ORIG_TEXT(Td)      ((Td)->u.u_t.original_text)
 #endif
@@ -355,6 +363,8 @@ void set_comment (const char *, const char *);
 #ifdef ENABLE_CHANGEWORD
 void set_word_regexp (const char *, const char *);
 #endif
+unsigned int quote_age (void);
+bool safe_quotes (void);
 
 /* File: output.c --- output functions.  */
 extern int current_diversion;
diff --git a/src/macro.c b/src/macro.c
index e257485..ec43bc1 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -41,6 +41,10 @@ struct macro_arguments
   bool_bitfield has_ref : 1;
   const char *argv0; /* The macro name being expanded.  */
   size_t argv0_len; /* Length of argv0.  */
+  /* The value of quote_age used when parsing all arguments in this
+     object, or 0 if quote_age changed during parsing or if any of the
+     arguments might contain content that can affect rescan.  */
+  unsigned int quote_age;
   size_t arraylen; /* True length of allocated elements in array.  */
   /* Used as a variable-length array, storing information about each
      argument.  */
@@ -48,7 +52,8 @@ struct macro_arguments
 };
 
 static void expand_macro (symbol *);
-static void expand_token (struct obstack *, token_type, token_data *, int);
+static bool expand_token (struct obstack *, token_type, token_data *, int,
+                         bool);
 
 /* Current recursion level in expand_macro ().  */
 int expansion_level = 0;
@@ -95,37 +100,64 @@ expand_input (void)
 #endif
 
   while ((t = next_token (&td, &line, NULL)) != TOKEN_EOF)
-    expand_token ((struct obstack *) NULL, t, &td, line);
+    expand_token ((struct obstack *) NULL, t, &td, line, true);
 
   obstack_free (&argc_stack, NULL);
   obstack_free (&argv_stack, NULL);
 }
 
 
-/*------------------------------------------------------------------------.
-| Expand one token, according to its type.  Potential macro names        |
-| (TOKEN_WORD) are looked up in the symbol table, to see if they have a   |
-| macro definition.  If they have, they are expanded as macros, otherwise |
-| the text are just copied to the output.                                |
-`------------------------------------------------------------------------*/
+/*-------------------------------------------------------------------.
+| Expand one token TD onto the stack OBS, according to its type T,   |
+| which began parsing on the specified LINE.  If OBS is NULL, output |
+| the data.  If FIRST, there is no previous text in the current      |
+| argument.  Potential macro names (TOKEN_WORD) are looked up in the |
+| symbol table, to see if they have a macro definition.  If they     |
+| do, they are expanded as macros; otherwise the text is just        |
+| copied to the output.  Return true if the result is guaranteed to  |
+| give the same parse on rescan in a quoted context, provided        |
+| quoting doesn't change.  Returning false is always safe, although  |
+| it may lead to slower performance.                                 |
+`-------------------------------------------------------------------*/
 
-static void
-expand_token (struct obstack *obs, token_type t, token_data *td, int line)
+static bool
+expand_token (struct obstack *obs, token_type t, token_data *td, int line,
+             bool first)
 {
   symbol *sym;
+  bool result;
+  int ch;
 
   switch (t)
     {                          /* TOKSW */
     case TOKEN_EOF:
     case TOKEN_MACDEF:
+      /* Always safe, since there is no text to rescan.  */
+      return true;
+
+    case TOKEN_STRING:
+      /* Quoted strings and comments are safe in isolation (since
+        quote_age () detects any change in delimiters).  But if other
+        text is already present, multi-character delimiters could be
+        an issue, so use a conservative heuristic.  */
+      result = first || safe_quotes ();
       break;
 
     case TOKEN_OPEN:
     case TOKEN_COMMA:
     case TOKEN_CLOSE:
+      /* Conservative heuristic, due to the possibility of
+        multi-character delimiter concatenation.  */
+      result = safe_quotes ();
+      break;
+
     case TOKEN_SIMPLE:
-    case TOKEN_STRING:
-      shipout_text (obs, TOKEN_DATA_TEXT (td), TOKEN_DATA_LEN (td), line);
+      /* Conservative heuristic; if these characters are whitespace or
+        numeric, then behavior of safe_quotes is applicable.
+        Otherwise, assume these characters have a high likelihood of
+        use in quote delimiters.  */
+      ch = to_uchar (*TOKEN_DATA_TEXT (td));
+      result = (isspace (ch) || isdigit (ch)) && safe_quotes ();
       break;
 
     case TOKEN_WORD:
@@ -141,15 +173,22 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line)
 #else
          shipout_text (obs, TOKEN_DATA_TEXT (td), TOKEN_DATA_LEN (td), line);
 #endif /* !ENABLE_CHANGEWORD */
+         /* The word just appended is unquoted, but the heuristics of
+            safe_quotes () are applicable.  */
+         return safe_quotes ();
        }
-      else
-       expand_macro (sym);
-      break;
+      expand_macro (sym);
+      /* Expanding a macro creates new tokens to scan, and those new
+        tokens may append unsafe text later; but we did not append
+        any text now.  */
+      return true;
 
     default:
       assert (!"expand_token");
       abort ();
     }
+  shipout_text (obs, TOKEN_DATA_TEXT (td), TOKEN_DATA_LEN (td), line);
+  return result;
 }
 
 
@@ -184,6 +223,8 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
   int paren_level;
   const char *file = current_file;
   int line = current_line;
+  unsigned int age = quote_age ();
+  bool first = true;
 
   TOKEN_DATA_TYPE (argp) = TOKEN_VOID;
 
@@ -211,10 +252,11 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
                    return t == TOKEN_COMMA;
                  warn_builtin_concat (caller, TOKEN_DATA_FUNC (argp));
                }
+             TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
              TOKEN_DATA_LEN (argp) = obstack_object_size (obs);
              obstack_1grow (obs, '\0');
-             TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
              TOKEN_DATA_TEXT (argp) = (char *) obstack_finish (obs);
+             TOKEN_DATA_QUOTE_AGE (argp) = age;
              return t == TOKEN_COMMA;
            }
          /* fallthru */
@@ -224,11 +266,12 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
            paren_level++;
          else if (t == TOKEN_CLOSE)
            paren_level--;
-         expand_token (obs, t, &td, line);
+         if (!expand_token (obs, t, &td, line, first))
+           age = 0;
          break;
 
        case TOKEN_EOF:
-         /* current_file changed to "" if we see TOKEN_EOF, use the
+         /* Current_file changed to "" if we see TOKEN_EOF, use the
             previous value we stored earlier.  */
          m4_error_at_line (EXIT_FAILURE, 0, file, line, caller,
                            _("end of file in argument list"));
@@ -236,7 +279,8 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
 
        case TOKEN_WORD:
        case TOKEN_STRING:
-         expand_token (obs, t, &td, line);
+         if (!expand_token (obs, t, &td, line, first))
+           age = 0;
          break;
 
        case TOKEN_MACDEF:
@@ -260,6 +304,8 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
          abort ();
        }
 
+      if (TOKEN_DATA_TYPE (argp) != TOKEN_VOID || obstack_object_size (obs))
+       first = false;
       t = next_token (&td, NULL, caller);
     }
 }
@@ -285,6 +331,7 @@ collect_arguments (symbol *sym, struct obstack *arguments)
   args.has_ref = false;
   args.argv0 = SYMBOL_NAME (sym);
   args.argv0_len = strlen (args.argv0);
+  args.quote_age = quote_age ();
   args.arraylen = 0;
   obstack_grow (&argv_stack, &args, offsetof (macro_arguments, array));
 
@@ -303,24 +350,31 @@ collect_arguments (symbol *sym, struct obstack *arguments)
          obstack_ptr_grow (&argv_stack, tdp);
          args.arraylen++;
          args.argc++;
+         /* Be conservative - any change in quoting while collecting
+            arguments, or any argument that consists of unsafe text,
+            will require a rescan if $@ is reused.  */
+         if (TOKEN_DATA_TYPE (tdp) == TOKEN_TEXT
+             && TOKEN_DATA_LEN (tdp) > 0
+             && TOKEN_DATA_QUOTE_AGE (tdp) != args.quote_age)
+           args.quote_age = 0;
        }
       while (more_args);
     }
   argv = (macro_arguments *) obstack_finish (&argv_stack);
   argv->argc = args.argc;
+  if (args.quote_age != quote_age ())
+    argv->quote_age = 0;
   argv->arraylen = args.arraylen;
   return argv;
 }
 
 
-/*------------------------------------------------------------------------.
-| The actual call of a macro is handled by call_macro ().  call_macro ()  |
-| is passed a symbol SYM, whose type is used to call either a builtin    |
-| function, or the user macro expansion function expand_user_macro ()    |
-| (lives in builtin.c).  There are ARGC arguments to the call, stored in  |
-| the ARGV table.  The expansion is left on the obstack EXPANSION.  Macro |
-| tracing is also handled here.                                           |
-`------------------------------------------------------------------------*/
+/*-----------------------------------------------------------------.
+| Call the macro SYM, which is either a builtin function or a user |
+| macro (via the expansion function expand_user_macro () in        |
+| builtin.c).  There are ARGC arguments to the call, stored in the |
+| ARGV table.  The expansion is left on the obstack EXPANSION.     |
+`-----------------------------------------------------------------*/
 
 void
 call_macro (symbol *sym, int argc, macro_arguments *argv,
@@ -434,6 +488,7 @@ expand_macro (symbol *sym)
     obstack_free (&argv_stack, argv);
 }
 
+
 /* Given ARGV, return the token_data that contains argument INDEX;
    INDEX must be > 0, < argv->argc.  */
 static token_data *
@@ -470,7 +525,6 @@ arg_token (macro_arguments *argv, unsigned int index)
   return token;
 }
 
-
 /* Given ARGV, return how many arguments it refers to.  */
 unsigned int
 arg_argc (macro_arguments *argv)
@@ -494,7 +548,7 @@ arg_type (macro_arguments *argv, unsigned int index)
   return type;
 }
 
-/* Given ARGV, return the text at argument INDEX, or NULL if the
+/* Given ARGV, return the text at argument INDEX.  Abort if the
    argument is not text.  Index 0 is always text, and indices beyond
    argc return the empty string.  */
 const char *
@@ -511,8 +565,6 @@ arg_text (macro_arguments *argv, unsigned int index)
     {
     case TOKEN_TEXT:
       return TOKEN_DATA_TEXT (token);
-    case TOKEN_FUNC:
-      return NULL;
     case TOKEN_COMP:
       /* TODO - how to concatenate multiple arguments?  For now, we expect
         only one element in the chain, and arg_token dereferences it.  */
@@ -555,7 +607,7 @@ arg_empty (macro_arguments *argv, unsigned int index)
   return arg_token (argv, index) == &empty_token;
 }
 
-/* Given ARGV, return the length of argument INDEX, or SIZE_MAX if the
+/* Given ARGV, return the length of argument INDEX.  Abort if the
    argument is not text.  Indices beyond argc return 0.  */
 size_t
 arg_len (macro_arguments *argv, unsigned int index)
@@ -572,8 +624,6 @@ arg_len (macro_arguments *argv, unsigned int index)
     case TOKEN_TEXT:
       assert ((token == &empty_token) == (TOKEN_DATA_LEN (token) == 0));
       return TOKEN_DATA_LEN (token);
-    case TOKEN_FUNC:
-      return SIZE_MAX;
     case TOKEN_COMP:
       /* TODO - how to concatenate multiple arguments?  For now, we expect
         only one element in the chain, and arg_token dereferences it.  */
@@ -585,30 +635,15 @@ arg_len (macro_arguments *argv, unsigned int index)
 }
 
 /* Given ARGV, return the builtin function referenced by argument
-   INDEX, or NULL if it is not a builtin.  Index 0, and indices beyond
-   argc, return NULL.  */
+   INDEX.  Abort if it is not a builtin in isolation.  */
 builtin_func *
 arg_func (macro_arguments *argv, unsigned int index)
 {
   token_data *token;
 
-  if (index == 0 || index >= argv->argc)
-    return NULL;
   token = arg_token (argv, index);
-  switch (TOKEN_DATA_TYPE (token))
-    {
-    case TOKEN_FUNC:
-      return TOKEN_DATA_FUNC (token);
-    case TOKEN_TEXT:
-      return NULL;
-    case TOKEN_COMP:
-      /* TODO - how to concatenate multiple arguments?  For now, we expect
-        only one element in the chain.  */
-    default:
-      break;
-    }
-  assert(!"arg_func");
-  abort ();
+  assert (TOKEN_DATA_TYPE (token) == TOKEN_FUNC);
+  return TOKEN_DATA_FUNC (token);
 }
 
 /* Create a new argument object using the same obstack as ARGV; thus,
@@ -671,5 +706,6 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
   new_argv->inuse = false;
   new_argv->argv0 = argv0;
   new_argv->argv0_len = argv0_len;
+  new_argv->quote_age = argv->quote_age;
   return new_argv;
 }
-- 
1.5.3.5
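The conservative rule that collect_arguments applies to $@ can be modeled in isolation. What follows is a minimal, simplified sketch, not code from m4: sketch_token, combine_quote_age and needs_rescan are hypothetical names, and the model assumes (as the patch does) that a quote age of 0 means "unsafe, always rescan". It starts from the age in effect when collection begins. Any non-empty text argument parsed under a different age forces 0, and so does any quoting change still in effect when collection ends. Builtin (non-text) and empty arguments do not affect the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for m4's token_data: only the
   fields needed to show how quote ages combine.  */
typedef struct
{
  bool is_text;            /* false models a builtin (TOKEN_FUNC) */
  size_t len;              /* length of the text when is_text */
  unsigned int quote_age;  /* 0 means "unsafe, must rescan" */
} sketch_token;

/* Combine per-argument ages into one age for the whole argument list.
   AGE_AT_START and AGE_AT_END model quote_age () before and after the
   arguments are collected.  */
static unsigned int
combine_quote_age (unsigned int age_at_start, unsigned int age_at_end,
                   const sketch_token *args, size_t argc)
{
  unsigned int age = age_at_start;
  size_t i;

  assert (args != NULL || argc == 0);
  for (i = 0; i < argc; i++)
    /* Only non-empty text can carry quoting-sensitive characters.  */
    if (args[i].is_text && args[i].len > 0 && args[i].quote_age != age)
      age = 0;
  /* Quoting changed while collecting: be conservative.  */
  if (age != age_at_end)
    age = 0;
  return age;
}

/* Reusing $@ verbatim is safe only when the cached age is non-zero and
   still matches the quoting rules currently in effect.  */
static bool
needs_rescan (unsigned int cached_age, unsigned int current_age)
{
  return cached_age == 0 || cached_age != current_age;
}
```

In this model, an empty argument parsed under other quotes leaves the combined age intact, but a single non-empty one drops it to 0. From then on, needs_rescan reports true regardless of later quoting changes.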

