m4-patches

[11/18] argv_ref speedup: support composite arguments


From: Eric Blake
Subject: [11/18] argv_ref speedup: support composite arguments
Date: Tue, 22 Jan 2008 13:59:57 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Next in the series.  Until now, every byte of rescanned input has been
copied, so that the argument collection engine could deal with contiguous
text.  With this patch, the argument collection engine has been taught how
to create composite tokens, where links in the token chain can come from
back-references in the input engine.  Basically, the input engine has a
new placeholder (CHAR_QUOTE), similar to the placeholder for builtins,
which represents a series of rescanned bytes that came from the same
quoting rules.  Meanwhile, all of the argv accessor methods flatten text
from a composite token on an as-needed basis, rather than wasting effort
on flattening it up front when the argument is not used.  As a result,
memory usage drops (dramatically on boxed recursion, but even real-life
autoconf and unboxed recursion test cases see some benefit).  More
importantly, with less copying, m4 operates much faster when rescanning
back-references.  This patch still flattens composite arguments into
contiguous text in push_arg (i.e., no references to a reference yet), and
still handles argument lists one argument at a time, so the speedup comes
entirely from a better constant factor and not from any reduction in
algorithmic complexity.
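
As a rough illustration of flattening on demand, here is a minimal
sketch.  It is not the code in this patch: struct link, struct token,
token_len and token_text are hypothetical, simplified stand-ins for the
token chain and the argv accessors, and the cached copy is an assumption
made to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real m4 structures.  */
struct link
{
  struct link *next;    /* Next link of the chain, or NULL.  */
  const char *str;      /* Text owned elsewhere (a back-reference).  */
  size_t len;           /* Length of str.  */
};

struct token
{
  struct link *chain;   /* First link of the composite.  */
  char *flat;           /* Cached contiguous copy, or NULL.  */
  size_t flatten_calls; /* How many times the text was copied.  */
};

/* Return the total length of all links without copying anything.  */
static size_t
token_len (const struct token *tok)
{
  size_t len = 0;
  const struct link *l;
  for (l = tok->chain; l; l = l->next)
    len += l->len;
  return len;
}

/* Return contiguous NUL-terminated text, copying only on first use.  */
static const char *
token_text (struct token *tok)
{
  if (!tok->flat)
    {
      size_t len = token_len (tok);
      char *p = malloc (len + 1);
      const struct link *l;
      assert (p);
      tok->flat = p;
      for (l = tok->chain; l; l = l->next)
        {
          memcpy (p, l->str, l->len);
          p += l->len;
        }
      *p = '\0';
      tok->flatten_calls++;
    }
  return tok->flat;
}
```

A caller that only needs the length never triggers a copy, and a caller
that asks for the text twice pays for the copy once.  In this sketch the
caller owns tok->flat and frees it when done.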

2008-01-22  Eric Blake  <address@hidden>

        Stage 11: full circle for single argument references.
        * src/m4.h (struct token_chain): Add quote_age member.
        (struct token_data): Add end member to chain alternate.
        (make_text_link): New prototype.
        * src/input.c (CHAR_QUOTE): New macro.
        (word_start): Pre-allocate.
        (set_word_regexp): Simplify.
        (make_text_link): Export, and handle new fields.
        (next_char, next_char_1): Add parameter.
        (append_quote_token): New function.
        (match_input, next_token): Adjust callers to handle quoted input
        blocks.
        * src/macro.c (struct macro_arguments): Add wrapper member.
        (expand_argument): Accept composite blocks from input engine.
        (expand_macro): Reduce refcounts of composite arguments.
        (collect_arguments, arg_token, arg_mark, make_argv_ref): Update to
        use new fields.
        (arg_type, arg_text, arg_equal, arg_len): Treat composite
        arguments as text.
        (push_arg, push_args): Handle composites.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHlllN84KuGfSFAYARAn/zAJ4g1+FB+zY+1Wh/N3zyI6RxQBjrKgCffgKm
te0swNG/6ja6EH1Y5kxSdoo=
=VdmM
-----END PGP SIGNATURE-----
From 5307d448bacdf7f588a95f7bc44c520ce80827a6 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Mon, 21 Jan 2008 12:04:45 -0700
Subject: [PATCH] Stage 11: full circle for single argument references.

Pass quoted strings through to argument collection in a single
action, so that an argument can be reused throughout macro
recursion if it remains unchanged.
Memory impact: noticeable improvement, due to more reuse in
argument collection stacks.
Speed impact: noticeable improvement, due to less copying.
* m4/m4module.h (m4_arg_text): Add parameter.
(M4ARG): Adjust.
* m4/m4private.h (CHAR_QUOTE): New input engine sentinel.
(m4__make_text_link): New prototype.
(struct m4_symbol_chain): Add quote_age member.
(struct m4_symbol_value): Add end member to chained symbol.
(struct m4_macro_args): Add wrapper member.
* m4/symtab.c (m4_symbol_value_print): Print composite tokens.
(m4_symbol_value_copy, m4_symbol_value_delete): Recognize
composite tokens.
* m4/input.c (make_text_link): Rename...
(m4__make_text_link): ...to this, and export.
(m4_push_string_finish): Adjust caller.
(make_text_link, m4__push_symbol): Update new field.
(file_read, builtin_read, string_read, composite_read, next_char):
Add parameter.
(m4_skip_line, match_input, consume_syntax): Adjust callers.
(append_quote_token): New function.
(m4__next_token): Pass quoted strings onto argument collection.
(m4_print_token) [DEBUG_INPUT]: Update.
* m4/macro.c (expand_argument): Collect composite arguments.
(collect_arguments): Update new field.
(expand_macro): Reduce ref-count of back-references after use.
(arg_mark, m4_arg_symbol, m4_make_argv_ref): Adjust to new member
names.
(m4_is_arg_text): Also recognize composite symbols as text.
(m4_arg_text, m4_arg_len): Merge composite symbols as needed.
(m4_arg_equal): Compare composite symbols.
(m4_push_arg, m4_push_args): Handle composite symbols.
(m4_arg_symbol): Relax assertion.
(process_macro): Use single-argument references.
* m4/output.c (m4_shipout_string_trunc): Update comment.
* tests/macros.at (Rescanning macros): Augment test.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog       |   43 ++++++++++
 m4/input.c      |  236 +++++++++++++++++++++++++++++++++++-------------------
 m4/m4module.h   |    9 +-
 m4/m4private.h  |   17 +++-
 m4/macro.c      |  239 +++++++++++++++++++++++++++++++++++++++++++++++--------
 m4/output.c     |    3 +-
 m4/symtab.c     |  150 ++++++++++++++++++++++++++---------
 tests/macros.at |   20 +++++-
 8 files changed, 557 insertions(+), 160 deletions(-)
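
The commit message above describes passing a quoted string through to
argument collection in a single action.  The following is a minimal
sketch of the idea, under stated assumptions: sketch_link, sketch_read
and the SKETCH_CHAR_* values are hypothetical names, not the identifiers
or values used in m4/input.c, and the explicit check for a nonzero
quote age is a simplification of how the caller computes allow_quote.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinels, chosen outside the unsigned char range.  */
enum { SKETCH_CHAR_RETRY = 256, SKETCH_CHAR_QUOTE = 257 };

/* Hypothetical, simplified link of rescanned text.  */
struct sketch_link
{
  const char *str;        /* Remaining text of this link.  */
  size_t len;             /* Remaining length of str.  */
  unsigned int quote_age; /* Quoting rules in effect, or 0 if unsafe.  */
};

/* Read from LINK.  If ALLOW_QUOTE and the link was produced under the
   quoting rules identified by CURRENT_AGE, return the sentinel so the
   caller can take the whole link by reference.  Otherwise hand out
   one byte, which invalidates the link for later reuse.  */
static int
sketch_read (struct sketch_link *link, unsigned int current_age,
             int allow_quote)
{
  if (allow_quote && link->quote_age != 0
      && link->quote_age == current_age)
    return SKETCH_CHAR_QUOTE;
  if (link->len == 0)
    return SKETCH_CHAR_RETRY;
  /* Partial consumption invalidates quote age.  */
  link->quote_age = 0;
  link->len--;
  return (unsigned char) *link->str++;
}
```

The sentinel leaves the link untouched, so the caller decides how to
consume it.  Once a single byte has been read the normal way, the link
no longer qualifies and the rest is copied byte by byte.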

diff --git a/ChangeLog b/ChangeLog
index cc00596..782b475 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,46 @@
+2008-01-21  Eric Blake  <address@hidden>
+
+       Stage 11: full circle for single argument references.
+       Pass quoted strings through to argument collection in a single
+       action, so that an argument can be reused throughout macro
+       recursion if it remains unchanged.
+       Memory impact: noticeable improvement, due to more reuse in
+       argument collection stacks.
+       Speed impact: noticeable improvement, due to less copying.
+       * m4/m4module.h (m4_arg_text): Add parameter.
+       (M4ARG): Adjust.
+       * m4/m4private.h (CHAR_QUOTE): New input engine sentinel.
+       (m4__make_text_link): New prototype.
+       (struct m4_symbol_chain): Add quote_age member.
+       (struct m4_symbol_value): Add end member to chained symbol.
+       (struct m4_macro_args): Add wrapper member.
+       * m4/symtab.c (m4_symbol_value_print): Print composite tokens.
+       (m4_symbol_value_copy, m4_symbol_value_delete): Recognize
+       composite tokens.
+       * m4/input.c (make_text_link): Rename...
+       (m4__make_text_link): ...to this, and export.
+       (m4_push_string_finish): Adjust caller.
+       (make_text_link, m4__push_symbol): Update new field.
+       (file_read, builtin_read, string_read, composite_read, next_char):
+       Add parameter.
+       (m4_skip_line, match_input, consume_syntax): Adjust callers.
+       (append_quote_token): New function.
+       (m4__next_token): Pass quoted strings onto argument collection.
+       (m4_print_token) [DEBUG_INPUT]: Update.
+       * m4/macro.c (expand_argument): Collect composite arguments.
+       (collect_arguments): Update new field.
+       (expand_macro): Reduce ref-count of back-references after use.
+       (arg_mark, m4_arg_symbol, m4_make_argv_ref): Adjust to new member
+       names.
+       (m4_is_arg_text): Also recognize composite symbols as text.
+       (m4_arg_text, m4_arg_len): Merge composite symbols as needed.
+       (m4_arg_equal): Compare composite symbols.
+       (m4_push_arg, m4_push_args): Handle composite symbols.
+       (m4_arg_symbol): Relax assertion.
+       (process_macro): Use single-argument references.
+       * m4/output.c (m4_shipout_string_trunc): Update comment.
+       * tests/macros.at (Rescanning macros): Augment test.
+
 2008-01-16  Eric Blake  <address@hidden>
 
        Stage 10: avoid extra copying of strings and comments.
diff --git a/m4/input.c b/m4/input.c
index 6dcaac0..0dcb0ae 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -93,29 +93,28 @@
    between input blocks must update the context accordingly.  */
 
 static int     file_peek               (m4_input_block *);
-static int     file_read               (m4_input_block *, m4 *, bool);
+static int     file_read               (m4_input_block *, m4 *, bool, bool);
 static void    file_unget              (m4_input_block *, int);
 static bool    file_clean              (m4_input_block *, m4 *, bool);
 static void    file_print              (m4_input_block *, m4 *, m4_obstack *);
 static int     builtin_peek            (m4_input_block *);
-static int     builtin_read            (m4_input_block *, m4 *, bool);
+static int     builtin_read            (m4_input_block *, m4 *, bool, bool);
 static void    builtin_unget           (m4_input_block *, int);
 static void    builtin_print           (m4_input_block *, m4 *, m4_obstack *);
 static int     string_peek             (m4_input_block *);
-static int     string_read             (m4_input_block *, m4 *, bool);
+static int     string_read             (m4_input_block *, m4 *, bool, bool);
 static void    string_unget            (m4_input_block *, int);
 static void    string_print            (m4_input_block *, m4 *, m4_obstack *);
 static int     composite_peek          (m4_input_block *);
-static int     composite_read          (m4_input_block *, m4 *, bool);
+static int     composite_read          (m4_input_block *, m4 *, bool, bool);
 static void    composite_unget         (m4_input_block *, int);
 static bool    composite_clean         (m4_input_block *, m4 *, bool);
 static void    composite_print         (m4_input_block *, m4 *, m4_obstack *);
 
-static void    make_text_link          (m4_obstack *, m4_symbol_chain **,
-                                        m4_symbol_chain **);
 static void    init_builtin_token      (m4 *, m4_symbol_value *);
+static void    append_quote_token      (m4_obstack *, m4_symbol_value *);
 static bool    match_input             (m4 *, const char *, bool);
-static int     next_char               (m4 *, bool);
+static int     next_char               (m4 *, bool, bool);
 static int     peek_char               (m4 *);
 static bool    pop_input               (m4 *, bool);
 static void    unget_input             (int);
@@ -133,9 +132,10 @@ struct input_funcs
   int  (*peek_func)    (m4_input_block *);
 
   /* Read input, return an unsigned char, CHAR_BUILTIN if it is a
-     builtin, or CHAR_RETRY if none available.  If SAFE, then do not
-     alter the current file or line.  */
-  int  (*read_func)    (m4_input_block *, m4 *, bool safe);
+     builtin, or CHAR_RETRY if none available.  If ALLOW_QUOTE, then
+     CHAR_QUOTE may be returned.  If SAFE, then do not alter the
+     current file or line.  */
+  int  (*read_func)    (m4_input_block *, m4 *, bool allow_quote, bool safe);
 
   /* Unread a single unsigned character or CHAR_BUILTIN, must be the
      same character previously read by read_func.  */
@@ -269,7 +269,8 @@ file_peek (m4_input_block *me)
 }
 
 static int
-file_read (m4_input_block *me, m4 *context, bool safe M4_GNUC_UNUSED)
+file_read (m4_input_block *me, m4 *context, bool allow_quote M4_GNUC_UNUSED,
+          bool safe M4_GNUC_UNUSED)
 {
   int ch;
 
@@ -397,7 +398,7 @@ builtin_peek (m4_input_block *me)
 
 static int
 builtin_read (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
-             bool safe M4_GNUC_UNUSED)
+             bool allow_quote M4_GNUC_UNUSED, bool safe M4_GNUC_UNUSED)
 {
   if (me->u.u_b.read)
     return CHAR_RETRY;
@@ -479,7 +480,7 @@ string_peek (m4_input_block *me)
 
 static int
 string_read (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
-            bool safe M4_GNUC_UNUSED)
+            bool allow_quote M4_GNUC_UNUSED, bool safe M4_GNUC_UNUSED)
 {
   if (!me->u.u_s.len)
     return CHAR_RETRY;
@@ -560,7 +561,7 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level)
       next->funcs = &composite_funcs;
       next->u.u_c.chain = next->u.u_c.end = NULL;
     }
-  make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
+  m4__make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
   chain = (m4_symbol_chain *) obstack_alloc (current_input, sizeof *chain);
   if (next->u.u_c.end)
     next->u.u_c.end->next = chain;
@@ -568,6 +569,7 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level)
     next->u.u_c.chain = chain;
   next->u.u_c.end = chain;
   chain->next = NULL;
+  chain->quote_age = m4_get_symbol_value_quote_age (value);
   chain->str = m4_get_symbol_value_text (value);
   chain->len = m4_get_symbol_value_len (value);
   chain->level = level;
@@ -611,7 +613,8 @@ m4_push_string_finish (void)
          next->u.u_s.len = len;
        }
       else
-       make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
+       m4__make_text_link (current_input, &next->u.u_c.chain,
+                           &next->u.u_c.end);
       next->prev = isp;
       ret = isp = next;
       input_change = true;
@@ -649,15 +652,19 @@ composite_peek (m4_input_block *me)
 }
 
 static int
-composite_read (m4_input_block *me, m4 *context, bool safe)
+composite_read (m4_input_block *me, m4 *context, bool allow_quote, bool safe)
 {
   m4_symbol_chain *chain = me->u.u_c.chain;
   while (chain)
     {
+      if (allow_quote && chain->quote_age == m4__quote_age (M4SYNTAX))
+       return CHAR_QUOTE;
       if (chain->str)
        {
          if (chain->len)
            {
+             /* Partial consumption invalidates quote age.  */
+             chain->quote_age = 0;
              chain->len--;
              return to_uchar (*chain->str++);
            }
@@ -668,8 +675,6 @@ composite_read (m4_input_block *me, m4 *context, bool safe)
          assert (!"implemented yet");
          abort ();
        }
-      if (safe)
-       return CHAR_RETRY;
       if (chain->level < SIZE_MAX)
        m4__adjust_refcount (context, chain->level, false);
       me->u.u_c.chain = chain = chain->next;
@@ -744,9 +749,9 @@ composite_print (m4_input_block *me, m4 *context, m4_obstack *obs)
 /* Given an obstack OBS, capture any unfinished text as a link in the
    chain that starts at *START and ends at *END.  START may be NULL if
    *END is non-NULL.  */
-static void
-make_text_link (m4_obstack *obs, m4_symbol_chain **start,
-               m4_symbol_chain **end)
+void
+m4__make_text_link (m4_obstack *obs, m4_symbol_chain **start,
+                   m4_symbol_chain **end)
 {
   m4_symbol_chain *chain;
   size_t len = obstack_object_size (obs);
@@ -762,6 +767,7 @@ make_text_link (m4_obstack *obs, m4_symbol_chain **start,
        *start = chain;
       *end = chain;
       chain->next = NULL;
+      chain->quote_age = 0;
       chain->str = str;
       chain->len = len;
       chain->level = SIZE_MAX;
@@ -905,13 +911,43 @@ init_builtin_token (m4 *context, m4_symbol_value *token)
   VALUE_MAX_ARGS (token)       = block->u.u_b.builtin->max_args;
 }
 
+/* When a QUOTE token is seen, convert VALUE to a composite (if it is
+   not one already), consisting of any unfinished text on OBS, as well
+   as the quoted token from the top of the input stack.  Use OBS for
+   any additional allocations needed to store the token chain.  */
+static void
+append_quote_token (m4_obstack *obs, m4_symbol_value *value)
+{
+  m4_symbol_chain *src_chain = isp->u.u_c.chain;
+  m4_symbol_chain *chain;
+  assert (isp->funcs == &composite_funcs && obs);
+
+  if (value->type == M4_SYMBOL_VOID)
+    {
+      value->type = M4_SYMBOL_COMP;
+      value->u.u_c.chain = value->u.u_c.end = NULL;
+    }
+  assert (value->type == M4_SYMBOL_COMP);
+  m4__make_text_link (obs, &value->u.u_c.chain, &value->u.u_c.end);
+  chain = (m4_symbol_chain *) obstack_copy (obs, src_chain, sizeof *chain);
+  if (value->u.u_c.end)
+    value->u.u_c.end->next = chain;
+  else
+    value->u.u_c.chain = chain;
+  value->u.u_c.end = chain;
+  value->u.u_c.end->next = NULL;
+  isp->u.u_c.chain = src_chain->next;
+}
+
 
 /* Low level input is done a character at a time.  The function
    next_char () is used to read and advance the input to the next
-   character.  If RETRY, then avoid returning CHAR_RETRY by popping
-   input.  */
+   character.  If ALLOW_QUOTE, and the current input matches the
+   current quote age, return CHAR_QUOTE and leave consumption of data
+   for append_quote_token.  If RETRY, then avoid returning CHAR_RETRY
+   by popping input.  */
 static int
-next_char (m4 *context, bool retry)
+next_char (m4 *context, bool allow_quote, bool retry)
 {
   int ch;
 
@@ -931,7 +967,8 @@ next_char (m4 *context, bool retry)
        }
 
       assert (isp->funcs->read_func);
-      while ((ch = isp->funcs->read_func (isp, context, !retry)) != CHAR_RETRY
+      while (((ch = isp->funcs->read_func (isp, context, allow_quote, !retry))
+             != CHAR_RETRY)
             || !retry)
        {
          /* if (!IS_IGNORE (ch)) */
@@ -960,7 +997,9 @@ peek_char (m4 *context)
       assert (block->funcs->peek_func);
       if ((ch = block->funcs->peek_func (block)) != CHAR_RETRY)
        {
-         return /* (IS_IGNORE (ch)) ? next_char (context, true) : */ ch;
+/*       if (IS_IGNORE (ch)) */
+/*         return next_char (context, false, true); */
+         return ch;
        }
 
       block = block->prev;
@@ -969,7 +1008,7 @@ peek_char (m4 *context)
 
 /* The function unget_input () puts back a character on the input
    stack, using an existing input_block if possible.  This is not safe
-   to call except immediately after next_char(context, false).  */
+   to call except immediately after next_char(context, allow, false).  */
 static void
 unget_input (int ch)
 {
@@ -987,7 +1026,7 @@ m4_skip_line (m4 *context, const char *name)
   const char *file = m4_get_current_file (context);
   int line = m4_get_current_line (context);
 
-  while ((ch = next_char (context, true)) != CHAR_EOF && ch != '\n')
+  while ((ch = next_char (context, false, true)) != CHAR_EOF && ch != '\n')
     ;
   if (ch == CHAR_EOF)
     /* current_file changed; use the previous value we cached.  */
@@ -1032,14 +1071,14 @@ match_input (m4 *context, const char *s, bool consume)
   if (s[1] == '\0')
     {
       if (consume)
-       next_char (context, true);
+       next_char (context, false, true);
       return true;                     /* short match */
     }
 
-  next_char (context, true);
+  next_char (context, false, true);
   for (n = 1, t = s++; (ch = peek_char (context)) == to_uchar (*s++); )
     {
-      next_char (context, true);
+      next_char (context, false, true);
       n++;
       if (*s == '\0')          /* long match */
        {
@@ -1071,29 +1110,35 @@ match_input (m4 *context, const char *s, bool consume)
 
 /* While the current input character has the given SYNTAX, append it
    to OBS.  Take care not to pop input source unless the next source
-   would continue the chain.  Return true unless the chain ended with
+   would continue the chain.  Return true if the chain ended with
    CHAR_EOF.  */
 static bool
 consume_syntax (m4 *context, m4_obstack *obs, unsigned int syntax)
 {
   int ch;
+  bool allow_quote = m4__safe_quotes (M4SYNTAX);
   assert (syntax);
   while (1)
     {
       /* It is safe to call next_char without first checking
         peek_char, except at input source boundaries, which we detect
-        by CHAR_RETRY.  We exploit the fact that CHAR_EOF and
-        CHAR_MACRO do not satisfy any syntax categories.  */
-      while ((ch = next_char (context, false)) != CHAR_RETRY
+        by CHAR_RETRY.  We exploit the fact that CHAR_EOF,
+        CHAR_BUILTIN, and CHAR_QUOTE do not satisfy any syntax
+        categories.  */
+      while ((ch = next_char (context, allow_quote, false)) != CHAR_RETRY
             && m4_has_syntax (M4SYNTAX, ch, syntax))
-       obstack_1grow (obs, ch);
-      if (ch == CHAR_RETRY)
+       {
+         assert (ch < CHAR_EOF);
+         obstack_1grow (obs, ch);
+       }
+      if (ch == CHAR_RETRY || ch == CHAR_QUOTE)
        {
          ch = peek_char (context);
          if (m4_has_syntax (M4SYNTAX, ch, syntax))
            {
+             assert (ch < CHAR_EOF);
              obstack_1grow (obs, ch);
-             next_char (context, true);
+             next_char (context, false, true);
              continue;
            }
          return ch == CHAR_EOF;
@@ -1141,13 +1186,13 @@ m4_input_exit (void)
 }
 
 
-/* Parse and return a single token from the input stream, built in
-   TOKEN.  See m4__token_type for the valid return types, along with a
-   description of what TOKEN will contain.  If LINE is not NULL, set
-   *LINE to the line number where the token starts.  If OBS, expand
-   safe tokens (strings and comments) directly into OBS rather than in
-   a temporary staging area.  Report errors (unterminated comments or
-   strings) on behalf of CALLER, if non-NULL.
+/* Parse and return a single token from the input stream, constructed
+   into TOKEN.  See m4__token_type for the valid return types, along
+   with a description of what TOKEN will contain.  If LINE is not
+   NULL, set *LINE to the line number where the token starts.  If OBS,
+   expand safe tokens (strings and comments) directly into OBS rather
+   than in a temporary staging area.  Report errors (unterminated
+   comments or strings) on behalf of CALLER, if non-NULL.
 
    If OBS is NULL or the token expansion is unknown, the token text is
    collected on the obstack token_stack, which never contains more
@@ -1177,7 +1222,6 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
   do {
     obstack_free (&token_stack, token_bottom);
 
-
     /* Must consume an input character, but not until CHAR_BUILTIN is
        handled.  */
     ch = peek_char (context);
@@ -1186,28 +1230,29 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
 #ifdef DEBUG_INPUT
        xfprintf (stderr, "next_token -> EOF\n");
 #endif
-       next_char (context, true);
+       next_char (context, false, true);
        return M4_TOKEN_EOF;
       }
 
     if (ch == CHAR_BUILTIN)            /* BUILTIN TOKEN */
       {
        init_builtin_token (context, token);
-       next_char (context, true);
+       next_char (context, false, true);
 #ifdef DEBUG_INPUT
        m4_print_token ("next_token", M4_TOKEN_MACDEF, token);
 #endif
        return M4_TOKEN_MACDEF;
       }
 
-    next_char (context, true); /* Consume character we already peeked at.  */
+    /* Consume character we already peeked at.  */
+    next_char (context, false, true);
     file = m4_get_current_file (context);
     *line = m4_get_current_line (context);
 
     if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ESCAPE))
       {                                        /* ESCAPED WORD */
        obstack_1grow (&token_stack, ch);
-       if ((ch = next_char (context, true)) != CHAR_EOF)
+       if ((ch = next_char (context, false, true)) < CHAR_EOF)
          {
            obstack_1grow (&token_stack, ch);
            if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ALPHA))
@@ -1234,12 +1279,13 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        quote_level = 1;
        while (1)
          {
-           ch = next_char (context, true);
+           ch = next_char (context, obs && m4__quote_age (M4SYNTAX), true);
            if (ch == CHAR_EOF)
              m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
                                _("end of file in string"));
-
-           if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_RQUOTE))
+           if (ch == CHAR_QUOTE)
+             append_quote_token (obs, token);
+           else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_RQUOTE))
              {
                if (--quote_level == 0)
                  break;
@@ -1261,9 +1307,10 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        if (obs)
          obs_safe = obs;
        quote_level = 1;
+       assert (!m4__quote_age (M4SYNTAX));
        while (1)
          {
-           ch = next_char (context, true);
+           ch = next_char (context, false, true);
            if (ch == CHAR_EOF)
              m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
                                _("end of file in string"));
@@ -1290,11 +1337,14 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        if (obs && !m4_get_discard_comments_opt (context))
          obs_safe = obs;
        obstack_1grow (obs_safe, ch);
-       while ((ch = next_char (context, true)) != CHAR_EOF
+       while ((ch = next_char (context, false, true)) < CHAR_EOF
               && !m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_ECOMM))
          obstack_1grow (obs_safe, ch);
        if (ch != CHAR_EOF)
-         obstack_1grow (obs_safe, ch);
+         {
+           assert (ch < CHAR_EOF);
+           obstack_1grow (obs_safe, ch);
+         }
        else
          m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
                            _("end of file in comment"));
@@ -1308,12 +1358,15 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
          obs_safe = obs;
        obstack_grow (obs_safe, context->syntax->bcomm.string,
                      context->syntax->bcomm.length);
-       while ((ch = next_char (context, true)) != CHAR_EOF
+       while ((ch = next_char (context, false, true)) < CHAR_EOF
               && !MATCH (context, ch, context->syntax->ecomm.string, true))
          obstack_1grow (obs_safe, ch);
        if (ch != CHAR_EOF)
-         obstack_grow (obs_safe, context->syntax->ecomm.string,
-                       context->syntax->ecomm.length);
+         {
+           assert (ch < CHAR_EOF);
+           obstack_grow (obs_safe, context->syntax->ecomm.string,
+                         context->syntax->ecomm.length);
+         }
        else
          m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
                            _("end of file in comment"));
@@ -1343,6 +1396,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
     else if (m4_is_syntax_single_quotes (M4SYNTAX)
             && m4_is_syntax_single_comments (M4SYNTAX))
       {                        /* EVERYTHING ELSE (SHORT QUOTES AND COMMENTS) */
+       assert (ch < CHAR_EOF);
        obstack_1grow (&token_stack, ch);
 
        if (m4_has_syntax (M4SYNTAX, ch,
@@ -1374,6 +1428,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
       }
     else               /* EVERYTHING ELSE (LONG QUOTES OR COMMENTS) */
       {
+       assert (ch < CHAR_EOF);
        obstack_1grow (&token_stack, ch);
 
        if (m4_has_syntax (M4SYNTAX, ch,
@@ -1394,16 +1449,21 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
       }
   } while (type == M4_TOKEN_NONE);
 
-  if (obs_safe != obs)
+  if (token->type == M4_SYMBOL_VOID)
     {
-      len = obstack_object_size (&token_stack);
-      obstack_1grow (&token_stack, '\0');
+      if (obs_safe != obs)
+       {
+         len = obstack_object_size (&token_stack);
+         obstack_1grow (&token_stack, '\0');
 
-      m4_set_symbol_value_text (token, obstack_finish (&token_stack), len,
-                               m4__quote_age (M4SYNTAX));
+         m4_set_symbol_value_text (token, obstack_finish (&token_stack), len,
+                                   m4__quote_age (M4SYNTAX));
+       }
+      else
+       assert (type == M4_TOKEN_STRING);
     }
   else
-    assert (type == M4_TOKEN_STRING);
+    assert (token->type == M4_SYMBOL_COMP && type == M4_TOKEN_STRING);
   VALUE_MAX_ARGS (token) = -1;
 
 #ifdef DEBUG_INPUT
@@ -1440,46 +1500,58 @@ m4__next_token_is_open (m4 *context)
 int
 m4_print_token (const char *s, m4__token_type type, m4_symbol_value *token)
 {
-  xfprintf (stderr, "%s: ", s ? s : "m4input");
+  m4_obstack obs;
+  size_t len;
+
+  obstack_init (&obs);
+  if (!s)
+    s = "m4input";
+  obstack_grow (&obs, s, strlen (s));
+  obstack_1grow (&obs, ':');
+  obstack_1grow (&obs, ' ');
   switch (type)
     {                          /* TOKSW */
     case M4_TOKEN_EOF:
-      xfprintf (stderr, "eof\n");
+      obstack_grow (&obs, "eof", strlen ("eof"));
+      token = NULL;
       break;
     case M4_TOKEN_NONE:
-      xfprintf (stderr, "none\n");
+      obstack_grow (&obs, "none", strlen ("none"));
+      token = NULL;
       break;
     case M4_TOKEN_STRING:
-      xfprintf (stderr, "string\t\"%s\"\n", m4_get_symbol_value_text (token));
+      obstack_grow (&obs, "string\t", strlen ("string\t"));
       break;
     case M4_TOKEN_SPACE:
-      xfprintf (stderr, "space\t\"%s\"\n", m4_get_symbol_value_text (token));
+      obstack_grow (&obs, "space\t", strlen ("space\t"));
       break;
     case M4_TOKEN_WORD:
-      xfprintf (stderr, "word\t\"%s\"\n", m4_get_symbol_value_text (token));
+      obstack_grow (&obs, "word\t", strlen ("word\t"));
       break;
     case M4_TOKEN_OPEN:
-      xfprintf (stderr, "open\t\"%s\"\n", m4_get_symbol_value_text (token));
+      obstack_grow (&obs, "open\t", strlen ("open\t"));
       break;
     case M4_TOKEN_COMMA:
-      xfprintf (stderr, "comma\t\"%s\"\n", m4_get_symbol_value_text (token));
+      obstack_grow (&obs, "comma\t", strlen ("comma\t"));
       break;
     case M4_TOKEN_CLOSE:
-      xfprintf (stderr, "close\t\"%s\"\n", m4_get_symbol_value_text (token));
+      obstack_grow (&obs, "close\t", strlen ("close\t"));
       break;
     case M4_TOKEN_SIMPLE:
-      xfprintf (stderr, "simple\t\"%s\"\n", m4_get_symbol_value_text (token));
+      obstack_grow (&obs, "simple\t", strlen ("simple\t"));
       break;
     case M4_TOKEN_MACDEF:
-      {
-       const m4_builtin *bp;
-       bp = m4_builtin_find_by_func (NULL, m4_get_symbol_value_func (token));
-       assert (bp);
-       xfprintf (stderr, "builtin\t<%s>{%s}\n", bp->name,
-                 m4_get_module_name (VALUE_MODULE (token)));
-      }
+      obstack_grow (&obs, "builtin\t", strlen ("builtin\t"));
       break;
+    default:
+      abort ();
     }
+  if (token)
+    m4_symbol_value_print (token, &obs, true, "\"", "\"", SIZE_MAX, NULL);
+  obstack_1grow (&obs, '\n');
+  len = obstack_object_size (&obs);
+  fwrite (obstack_finish (&obs), 1, len, stderr);
+  obstack_free (&obs, NULL);
   return 0;
 }
 #endif /* DEBUG_INPUT */
diff --git a/m4/m4module.h b/m4/m4module.h
index 03025af..330a90e 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -1,7 +1,7 @@
 /* GNU m4 -- A simple macro processor
 
    Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 1999, 2000, 2003,
-   2004, 2005, 2006, 2007 Free Software Foundation, Inc.
+   2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
 
    This file is part of GNU M4.
 
@@ -102,8 +102,9 @@ struct m4_macro
        m4_module_import (context, STR (M), STR (S), obs)
 
 /* Grab the text contents of argument I, or abort if the argument is
-   not text.  Assumes that `m4_macro_args *argv' is in scope.  */
-#define M4ARG(i) m4_arg_text (argv, i)
+   not text.  Assumes that `m4 *context' and `m4_macro_args *argv' are
+   in scope.  */
+#define M4ARG(i) m4_arg_text (context, argv, i)
 
 extern bool    m4_bad_argc        (m4 *, int, const char *,
                                    unsigned int, unsigned int, bool);
@@ -304,7 +305,7 @@ extern unsigned int m4_arg_argc             (m4_macro_args *);
 extern m4_symbol_value *m4_arg_symbol  (m4_macro_args *, unsigned int);
 extern bool    m4_is_arg_text          (m4_macro_args *, unsigned int);
 extern bool    m4_is_arg_func          (m4_macro_args *, unsigned int);
-extern const char *m4_arg_text         (m4_macro_args *, unsigned int);
+extern const char *m4_arg_text         (m4 *, m4_macro_args *, unsigned int);
 extern bool    m4_arg_equal            (m4_macro_args *, unsigned int,
                                         unsigned int);
 extern bool    m4_arg_empty            (m4_macro_args *, unsigned int);
diff --git a/m4/m4private.h b/m4/m4private.h
index 630a9b7..6a08455 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -35,7 +35,7 @@ typedef enum {
   M4_SYMBOL_TEXT,              /* Plain text, u.u_t is valid.  */
   M4_SYMBOL_FUNC,              /* Builtin function, u.func is valid.  */
   M4_SYMBOL_PLACEHOLDER,       /* Placeholder for unknown builtin from -R.  */
-  M4_SYMBOL_COMP               /* Composite symbol, u.chain is valid.  */
+  M4_SYMBOL_COMP               /* Composite symbol, u.u_c.c is valid.  */
 } m4__symbol_type;
 
 #define BIT_TEST(flags, bit)   (((flags) & (bit)) == (bit))
@@ -197,6 +197,7 @@ struct m4_symbol
 struct m4_symbol_chain
 {
   m4_symbol_chain *next;/* Pointer to next link of chain.  */
+  unsigned int quote_age; /* Quote_age of this link of chain, or 0.  */
   const char *str;     /* NUL-terminated string if text, or NULL.  */
   size_t len;          /* Length of str, or 0.  */
   size_t level;                /* Expansion level of content, or SIZE_MAX.  */
@@ -230,7 +231,11 @@ struct m4_symbol_value
       unsigned int     quote_age;
     } u_t;                     /* Valid when type is TEXT, PLACEHOLDER.  */
     const m4_builtin * builtin;/* Valid when type is FUNC.  */
-    m4_symbol_chain *  chain;  /* Valid when type is COMP.  */
+    struct
+    {
+      m4_symbol_chain *        chain;  /* First link of the chain.  */
+      m4_symbol_chain *        end;    /* Last link of the chain.  */
+    } u_c;                     /* Valid when type is COMP.  */
   } u;
 };
 
@@ -248,6 +253,9 @@ struct m4_macro_args
   bool_bitfield inuse : 1;
   /* False if all arguments are just text or func, true if this argv
      refers to another one.  */
+  bool_bitfield wrapper : 1;
+  /* False if all arguments belong to this argv, true if some of them
+     include references to another.  */
   bool_bitfield has_ref : 1;
   const char *argv0; /* The macro name being expanded.  */
   size_t argv0_len; /* Length of argv0.  */
@@ -365,7 +373,8 @@ extern void m4__symtab_remove_module_references (m4_symbol_table*,
    all other characters and sentinels. */
 #define CHAR_EOF       256     /* Character return on EOF.  */
 #define CHAR_BUILTIN   257     /* Character return for BUILTIN token.  */
-#define CHAR_RETRY     258     /* Character return for end of input block.  */
+#define CHAR_QUOTE     258     /* Character return for quoted string.  */
+#define CHAR_RETRY     259     /* Character return for end of input block.  */
 
 #define DEF_LQUOTE     "`"     /* Default left quote delimiter.  */
 #define DEF_RQUOTE     "\'"    /* Default right quote delimiter.  */
@@ -451,6 +460,8 @@ typedef enum {
   M4_TOKEN_MACDEF      /* Macro's definition (see "defn"), M4_SYMBOL_FUNC.  */
 } m4__token_type;
 
+extern void            m4__make_text_link (m4_obstack *, m4_symbol_chain **,
+                                           m4_symbol_chain **);
 extern bool            m4__push_symbol (m4 *, m4_symbol_value *, size_t);
 extern m4__token_type  m4__next_token (m4 *, m4_symbol_value *, int *,
                                        m4_obstack *, const char *);
diff --git a/m4/macro.c b/m4/macro.c
index 9963409..683dd26 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -334,9 +334,15 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
              len = obstack_object_size (obs);
              if (argp->type == M4_SYMBOL_FUNC && !len)
                return type == M4_TOKEN_COMMA;
-             obstack_1grow (obs, '\0');
-             VALUE_MODULE (argp) = NULL;
-             m4_set_symbol_value_text (argp, obstack_finish (obs), len, age);
+             if (argp->type != M4_SYMBOL_COMP)
+               {
+                 obstack_1grow (obs, '\0');
+                 VALUE_MODULE (argp) = NULL;
+                 m4_set_symbol_value_text (argp, obstack_finish (obs), len,
+                                           age);
+               }
+             else
+               m4__make_text_link (obs, NULL, &argp->u.u_c.end);
              return type == M4_TOKEN_COMMA;
            }
          /* fallthru */
@@ -360,6 +366,20 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
        case M4_TOKEN_STRING:
          if (!expand_token (context, obs, type, &token, line, first))
            age = 0;
+         if (token.type == M4_SYMBOL_COMP)
+           {
+             if (argp->type != M4_SYMBOL_COMP)
+               {
+                 argp->type = M4_SYMBOL_COMP;
+                 argp->u.u_c.chain = token.u.u_c.chain;
+               }
+             else
+               {
+                 assert (argp->u.u_c.end);
+                 argp->u.u_c.end->next = token.u.u_c.chain;
+               }
+             argp->u.u_c.end = token.u.u_c.end;
+           }
          break;
 
        case M4_TOKEN_MACDEF:
@@ -502,8 +522,23 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
   if (BIT_TEST (VALUE_FLAGS (value), VALUE_DELETED_BIT))
     m4_symbol_value_delete (value);
 
-  /* If argv contains references, those refcounts can be reduced now.  */
-  /* TODO - support references in argv.  */
+  /* If argv contains references, those refcounts must be reduced now.  */
+  if (argv->has_ref)
+    {
+      m4_symbol_chain *chain;
+      size_t i;
+      for (i = 0; i < argv->arraylen; i++)
+       if (argv->array[i]->type == M4_SYMBOL_COMP)
+         {
+           chain = argv->array[i]->u.u_c.chain;
+           while (chain)
+             {
+               if (chain->level < SIZE_MAX)
+                 m4__adjust_refcount (context, chain->level, false);
+               chain = chain->next;
+             }
+         }
+    }
 
   /* We no longer need argv, so reduce the refcount.  Additionally, if
      no other references to argv were created, we can free our portion
@@ -550,6 +585,7 @@ collect_arguments (m4 *context, const char *name, size_t len,
 
   args.argc = 1;
   args.inuse = false;
+  args.wrapper = false;
   args.has_ref = false;
   /* Must copy here, since we are consuming tokens, and since symbol
      table can be changed during argument collection.  */
@@ -587,11 +623,14 @@ collect_arguments (m4 *context, const char *name, size_t len,
              && m4_get_symbol_value_len (tokenp)
              && m4_get_symbol_value_quote_age (tokenp) != args.quote_age)
            args.quote_age = 0;
+         else if (tokenp->type == M4_SYMBOL_COMP)
+           args.has_ref = true;
        }
       while (more_args);
     }
   argv = (m4_macro_args *) obstack_finish (argv_stack);
   argv->argc = args.argc;
+  argv->has_ref = args.has_ref;
   if (args.quote_age != m4__quote_age (M4SYNTAX))
     argv->quote_age = 0;
   argv->arraylen = args.arraylen;
@@ -674,8 +713,7 @@ process_macro (m4 *context, m4_symbol_value *value, m4_obstack *obs,
              text = endp;
            }
          if (i < argc)
-           m4_shipout_string (context, obs, M4ARG (i), m4_arg_len (argv, i),
-                              false);
+           m4_push_arg (context, obs, argv, i);
          break;
 
        case '#':               /* number of arguments */
@@ -947,14 +985,14 @@ static void
 arg_mark (m4_macro_args *argv)
 {
   argv->inuse = true;
-  if (argv->has_ref)
+  if (argv->wrapper)
     {
       /* TODO for now we support only a single-length $@ chain.  */
       assert (argv->arraylen == 1
              && argv->array[0]->type == M4_SYMBOL_COMP
-             && !argv->array[0]->u.chain->next
-             && !argv->array[0]->u.chain->str);
-      argv->array[0]->u.chain->argv->inuse = true;
+             && !argv->array[0]->u.u_c.chain->next
+             && !argv->array[0]->u.u_c.chain->str);
+      argv->array[0]->u.u_c.chain->argv->inuse = true;
     }
 }
 
@@ -970,7 +1008,7 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
   if (argv->argc <= index)
     return &empty_symbol;
 
-  if (!argv->has_ref)
+  if (!argv->wrapper)
     return argv->array[index - 1];
   /* Must cycle through all array slots until we find index, since
      wrappers can contain multiple arguments.  */
@@ -979,7 +1017,7 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
       value = argv->array[i];
       if (value->type == M4_SYMBOL_COMP)
        {
-         m4_symbol_chain *chain = value->u.chain;
+         m4_symbol_chain *chain = value->u.u_c.chain;
          /* TODO - for now we support only a single $@ chain.  */
          assert (!chain->next && !chain->str);
          if (index < chain->argv->argc - (chain->index - 1))
@@ -994,7 +1032,6 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
       else if (--index == 0)
        break;
     }
-  assert (value->type != M4_SYMBOL_COMP);
   return value;
 }
 
@@ -1003,9 +1040,14 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
 bool
 m4_is_arg_text (m4_macro_args *argv, unsigned int index)
 {
+  m4_symbol_value *value;
   if (index == 0 || argv->argc <= index)
     return true;
-  return m4_is_symbol_value_text (m4_arg_symbol (argv, index));
+  value = m4_arg_symbol (argv, index);
+  /* Composite tokens are currently sequences of text only.  */
+  if (m4_is_symbol_value_text (value) || value->type == M4_SYMBOL_COMP)
+    return true;
+  return false;
 }
 
 /* Given ARGV, return true if argument INDEX is a builtin function.
@@ -1020,37 +1062,125 @@ m4_is_arg_func (m4_macro_args *argv, unsigned int index)
 
 /* Given ARGV, return the text at argument INDEX.  Abort if the
    argument is not text.  Index 0 is always text, and indices beyond
-   argc return the empty string.  */
+   argc return the empty string.  The result is always NUL-terminated,
+   even if it includes embedded NUL characters.  */
 const char *
-m4_arg_text (m4_macro_args *argv, unsigned int index)
+m4_arg_text (m4 *context, m4_macro_args *argv, unsigned int index)
 {
   m4_symbol_value *value;
+  m4_symbol_chain *chain;
+  m4_obstack *obs;
 
   if (index == 0)
     return argv->argv0;
   if (argv->argc <= index)
     return "";
   value = m4_arg_symbol (argv, index);
-  return m4_get_symbol_value_text (value);
+  if (m4_is_symbol_value_text (value))
+    return m4_get_symbol_value_text (value);
+  /* TODO - concatenate argv refs and functions?  For now, we assume
+     all chain elements are text.  */
+  assert (value->type == M4_SYMBOL_COMP);
+  chain = value->u.u_c.chain;
+  obs = m4_arg_scratch (context);
+  while (chain)
+    {
+      assert (chain->str);
+      obstack_grow (obs, chain->str, chain->len);
+      chain = chain->next;
+    }
+  obstack_1grow (obs, '\0');
+  return (char *) obstack_finish (obs);
 }
 
 /* Given ARGV, compare text arguments INDEXA and INDEXB for equality.
    Both indices must be non-zero.  Return true if the arguments
    contain the same contents; often more efficient than
-   !strcmp (m4_arg_text (argv, indexa), m4_arg_text (argv, indexb)).  */
+   !strcmp (m4_arg_text (context, argv, indexa),
+           m4_arg_text (context, argv, indexb)).  */
 bool
 m4_arg_equal (m4_macro_args *argv, unsigned int indexa, unsigned int indexb)
 {
   m4_symbol_value *sa = m4_arg_symbol (argv, indexa);
   m4_symbol_value *sb = m4_arg_symbol (argv, indexb);
+  m4_symbol_chain tmpa;
+  m4_symbol_chain tmpb;
+  m4_symbol_chain *ca = &tmpa;
+  m4_symbol_chain *cb = &tmpb;
 
+  /* Quick tests.  */
   if (sa == &empty_symbol || sb == &empty_symbol)
     return sa == sb;
+  if (m4_is_symbol_value_text (sa) && m4_is_symbol_value_text (sb))
+    return (m4_get_symbol_value_len (sa) == m4_get_symbol_value_len (sb)
+           && memcmp (m4_get_symbol_value_text (sa),
+                      m4_get_symbol_value_text (sb),
+                      m4_get_symbol_value_len (sa)) == 0);
+
+  /* Convert both arguments to chains, if not one already.  */
   /* TODO - allow builtin tokens in the comparison?  */
-  assert (m4_is_symbol_value_text (sa) && m4_is_symbol_value_text (sb));
-  return (m4_get_symbol_value_len (sa) == m4_get_symbol_value_len (sb)
-         && strcmp (m4_get_symbol_value_text (sa),
-                    m4_get_symbol_value_text (sb)) == 0);
+  if (m4_is_symbol_value_text (sa))
+    {
+      tmpa.next = NULL;
+      tmpa.str = m4_get_symbol_value_text (sa);
+      tmpa.len = m4_get_symbol_value_len (sa);
+    }
+  else
+    {
+      assert (sa->type == M4_SYMBOL_COMP);
+      ca = sa->u.u_c.chain;
+    }
+  if (m4_is_symbol_value_text (sb))
+    {
+      tmpb.next = NULL;
+      tmpb.str = m4_get_symbol_value_text (sb);
+      tmpb.len = m4_get_symbol_value_len (sb);
+    }
+  else
+    {
+      assert (sb->type == M4_SYMBOL_COMP);
+      cb = sb->u.u_c.chain;
+    }
+
+  /* Compare each link of the chain.  */
+  while (ca && cb)
+    {
+      /* TODO support comparison against $@ refs.  */
+      assert (ca->str && cb->str);
+      if (ca->len == cb->len)
+       {
+         if (memcmp (ca->str, cb->str, ca->len) != 0)
+           return false;
+         ca = ca->next;
+         cb = cb->next;
+       }
+      else if (ca->len < cb->len)
+       {
+         if (memcmp (ca->str, cb->str, ca->len) != 0)
+           return false;
+         tmpb.next = cb->next;
+         tmpb.str = cb->str + ca->len;
+         tmpb.len = cb->len - ca->len;
+         ca = ca->next;
+         cb = &tmpb;
+       }
+      else
+       {
+         assert (cb->len < ca->len);
+         if (memcmp (ca->str, cb->str, cb->len) != 0)
+           return false;
+         tmpa.next = ca->next;
+         tmpa.str = ca->str + cb->len;
+         tmpa.len = ca->len - cb->len;
+         ca = &tmpa;
+         cb = cb->next;
+       }
+    }
+
+  /* If we get this far, the two arguments are equal only if both
+     chains are exhausted.  */
+  assert (ca != cb || !ca);
+  return ca == cb;
 }
 
 /* Given ARGV, return true if argument INDEX is the empty string.
@@ -1069,13 +1199,28 @@ size_t
 m4_arg_len (m4_macro_args *argv, unsigned int index)
 {
   m4_symbol_value *value;
+  m4_symbol_chain *chain;
+  size_t len;
 
   if (index == 0)
     return argv->argv0_len;
   if (argv->argc <= index)
     return 0;
   value = m4_arg_symbol (argv, index);
-  return m4_get_symbol_value_len (value);
+  if (m4_is_symbol_value_text (value))
+    return m4_get_symbol_value_len (value);
+  /* TODO - for now, we assume all chain links are text.  */
+  assert (value->type == M4_SYMBOL_COMP);
+  chain = value->u.u_c.chain;
+  len = 0;
+  while (chain)
+    {
+      assert (chain->str);
+      len += chain->len;
+      chain = chain->next;
+    }
+  assert (len);
+  return len;
 }
 
 /* Given ARGV, return the builtin function referenced by argument
@@ -1105,11 +1250,11 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const char *argv0,
 
   /* When making a reference through a reference, point to the
      original if possible.  */
-  if (argv->has_ref)
+  if (argv->wrapper)
     {
       /* TODO for now we support only a single-length $@ chain.  */
       assert (argv->arraylen == 1 && argv->array[0]->type == M4_SYMBOL_COMP);
-      chain = argv->array[0]->u.chain;
+      chain = argv->array[0]->u.u_c.chain;
       assert (!chain->next && !chain->str);
       argv = chain->argv;
       index += chain->index - 1;
@@ -1130,10 +1275,12 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const char *argv0,
       chain = (m4_symbol_chain *) obstack_alloc (obs, sizeof *chain);
       new_argv->arraylen = 1;
       new_argv->array[0] = value;
+      new_argv->wrapper = true;
       new_argv->has_ref = true;
       value->type = M4_SYMBOL_COMP;
-      value->u.chain = chain;
+      value->u.u_c.chain = value->u.u_c.end = chain;
       chain->next = NULL;
+      chain->quote_age = argv->quote_age;
       chain->str = NULL;
       chain->len = 0;
       chain->level = context->expansion_level - 1;
@@ -1170,9 +1317,23 @@ m4_push_arg (m4 *context, m4_obstack *obs, m4_macro_args *argv,
        return;
     }
   /* TODO handle builtin tokens?  */
-  assert (value->type == M4_SYMBOL_TEXT);
-  if (m4__push_symbol (context, value, context->expansion_level - 1))
-    arg_mark (argv);
+  if (value->type == M4_SYMBOL_TEXT)
+    {
+      if (m4__push_symbol (context, value, context->expansion_level - 1))
+       arg_mark (argv);
+    }
+  else if (value->type == M4_SYMBOL_COMP)
+    {
+      /* TODO - really handle composites; for now, just flatten the
+        composite and push its text.  */
+      m4_symbol_chain *chain = value->u.u_c.chain;
+      while (chain)
+       {
+         assert (chain->str);
+         obstack_grow (obs, chain->str, chain->len);
+         chain = chain->next;
+       }
+    }
 }
 
 /* Push series of comma-separated arguments from ARGV, which should
@@ -1184,6 +1345,7 @@ m4_push_args (m4 *context, m4_obstack *obs, m4_macro_args *argv, bool skip,
              bool quote)
 {
   m4_symbol_value *value;
+  m4_symbol_chain *chain;
   unsigned int i = skip ? 2 : 1;
   const char *sep = ",";
   size_t sep_len = 1;
@@ -1226,8 +1388,21 @@ m4_push_args (m4 *context, m4_obstack *obs, m4_macro_args *argv, bool skip,
       else
        use_sep = true;
       /* TODO handle builtin tokens?  */
-      assert (value->type == M4_SYMBOL_TEXT);
-      inuse |= m4__push_symbol (context, value, context->expansion_level - 1);
+      if (value->type == M4_SYMBOL_TEXT)
+       inuse |= m4__push_symbol (context, value,
+                                 context->expansion_level - 1);
+      else
+       {
+         /* TODO handle composite text.  */
+         assert (value->type == M4_SYMBOL_COMP);
+         chain = value->u.u_c.chain;
+         while (chain)
+           {
+             assert (chain->str);
+             obstack_grow (obs, chain->str, chain->len);
+             chain = chain->next;
+           }
+       }
     }
   if (quote)
     obstack_grow (obs, rquote, strlen (rquote));
diff --git a/m4/output.c b/m4/output.c
index f745efe..dc2194f 100644
--- a/m4/output.c
+++ b/m4/output.c
@@ -602,7 +602,8 @@ m4_shipout_string (m4 *context, m4_obstack *obs, const char *s, size_t len,
    current quote characters around S.  If LEN is SIZE_MAX, use the
    string length of S instead.  If MAX_LEN, reduce *MAX_LEN by LEN.
    If LEN is larger than *MAX_LEN, then truncate output and return
-   true; otherwise return false.  */
+   true; otherwise return false.  CONTEXT may be NULL if QUOTED is
+   false.  */
 bool
 m4_shipout_string_trunc (m4 *context, m4_obstack *obs, const char *s,
                         size_t len, bool quoted, size_t *max_len)
diff --git a/m4/symtab.c b/m4/symtab.c
index 30a61ed..3ff6f0d 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2001, 2005, 2006, 2007
-   Free Software Foundation, Inc.
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2001, 2005, 2006,
+   2007, 2008 Free Software Foundation, Inc.
 
    This file is part of GNU M4.
 
@@ -326,10 +326,21 @@ m4_symbol_value_delete (m4_symbol_value *value)
          m4_hash_apply (VALUE_ARG_SIGNATURE (value), arg_destroy_CB, NULL);
          m4_hash_delete (VALUE_ARG_SIGNATURE (value));
        }
-      if (m4_is_symbol_value_text (value))
-       free ((char *) m4_get_symbol_value_text (value));
-      else if (m4_is_symbol_value_placeholder (value))
-       free ((char *) m4_get_symbol_value_placeholder (value));
+      switch (value->type)
+       {
+       case M4_SYMBOL_TEXT:
+         free ((char *) m4_get_symbol_value_text (value));
+         break;
+       case M4_SYMBOL_PLACEHOLDER:
+         free ((char *) m4_get_symbol_value_placeholder (value));
+         break;
+       case M4_SYMBOL_VOID:
+       case M4_SYMBOL_FUNC:
+         break;
+       default:
+         assert (!"m4_symbol_value_delete");
+         abort ();
+       }
       free (value);
     }
 }
@@ -392,10 +403,21 @@ m4_symbol_value_copy (m4_symbol_value *dest, m4_symbol_value *src)
   assert (dest);
   assert (src);
 
-  if (m4_is_symbol_value_text (dest))
-    free ((char *) m4_get_symbol_value_text (dest));
-  else if (m4_is_symbol_value_placeholder (dest))
-    free ((char *) m4_get_symbol_value_placeholder (dest));
+  switch (dest->type)
+    {
+    case M4_SYMBOL_TEXT:
+      free ((char *) m4_get_symbol_value_text (dest));
+      break;
+    case M4_SYMBOL_PLACEHOLDER:
+      free ((char *) m4_get_symbol_value_placeholder (dest));
+      break;
+    case M4_SYMBOL_VOID:
+    case M4_SYMBOL_FUNC:
+      break;
+    default:
+      assert (!"m4_symbol_value_copy");
+      abort ();
+    }
 
   if (VALUE_ARG_SIGNATURE (dest))
     {
@@ -411,19 +433,54 @@ m4_symbol_value_copy (m4_symbol_value *dest, m4_symbol_value *src)
 
   /* Caller is supposed to free text token strings, so we have to
      copy the string not just its address in that case.  */
-  if (m4_is_symbol_value_text (src))
+  switch (src->type)
     {
-      size_t len = m4_get_symbol_value_len (src);
-      unsigned int age = m4_get_symbol_value_quote_age (src);
-      m4_set_symbol_value_text (dest,
-                               xmemdup (m4_get_symbol_value_text (src),
-                                        len + 1), len, age);
+    case M4_SYMBOL_TEXT:
+      {
+       size_t len = m4_get_symbol_value_len (src);
+       unsigned int age = m4_get_symbol_value_quote_age (src);
+       m4_set_symbol_value_text (dest,
+                                 xmemdup (m4_get_symbol_value_text (src),
+                                          len + 1), len, age);
+      }
+      break;
+    case M4_SYMBOL_FUNC:
+      /* Nothing further to do.  */
+      break;
+    case M4_SYMBOL_PLACEHOLDER:
+      m4_set_symbol_value_placeholder (dest,
+                                      xstrdup (m4_get_symbol_value_placeholder
+                                               (src)));
+      break;
+    case M4_SYMBOL_COMP:
+      {
+       m4_symbol_chain *chain = src->u.u_c.chain;
+       size_t len = 0;
+       char *str;
+       char *p;
+       while (chain)
+         {
+           /* TODO for now, only text links are supported.  */
+           assert (chain->str);
+           len += chain->len;
+           chain = chain->next;
+         }
+       p = str = xcharalloc (len + 1);
+       chain = src->u.u_c.chain;
+       while (chain)
+         {
+           memcpy (p, chain->str, chain->len);
+           p += chain->len;
+           chain = chain->next;
+         }
+       *p = '\0';
+       m4_set_symbol_value_text (dest, str, len, 0);
+      }
+      break;
+    default:
+      assert (!"m4_symbol_value_copy");
+      abort ();
     }
-  else if (m4_is_symbol_value_placeholder (src))
-    m4_set_symbol_value_placeholder (dest,
-                                    xstrdup (m4_get_symbol_value_placeholder
-                                             (src)));
-
   if (VALUE_ARG_SIGNATURE (src))
     VALUE_ARG_SIGNATURE (dest) = m4_hash_dup (VALUE_ARG_SIGNATURE (src),
                                              arg_copy_CB);
@@ -488,8 +545,9 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs, bool quote,
   size_t len;
   bool truncated = false;
 
-  if (m4_is_symbol_value_text (value))
+  switch (value->type)
     {
+    case M4_SYMBOL_TEXT:
       text = m4_get_symbol_value_text (value);
       len = m4_get_symbol_value_len (value);
       if (maxlen < len)
@@ -497,27 +555,45 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs, bool quote,
          len = maxlen;
          truncated = true;
        }
-    }
-  else if (m4_is_symbol_value_func (value))
-    {
-      const m4_builtin *bp = m4_get_symbol_value_builtin (value);
-      text = bp->name;
-      len = strlen (text);
-      lquote = "<";
-      rquote = ">";
-      quote = true;
-    }
-  else if (m4_is_symbol_value_placeholder (value))
-    {
+      break;
+    case M4_SYMBOL_FUNC:
+      {
+       const m4_builtin *bp = m4_get_symbol_value_builtin (value);
+       text = bp->name;
+       len = strlen (text);
+       lquote = "<";
+       rquote = ">";
+       quote = true;
+      }
+      break;
+    case M4_SYMBOL_PLACEHOLDER:
       text = m4_get_symbol_value_placeholder (value);
       /* FIXME - is it worth translating "placeholder for "?  */
       len = strlen (text);
       lquote = "<placeholder for ";
       rquote = ">";
       quote = true;
-    }
-  else
-    {
+      break;
+    case M4_SYMBOL_COMP:
+      {
+       m4_symbol_chain *chain = value->u.u_c.chain;
+       if (quote)
+         obstack_grow (obs, lquote, strlen (lquote));
+       while (chain)
+         {
+           /* TODO for now, assume all links are text.  */
+           assert (chain->str);
+           if (m4_shipout_string_trunc (NULL, obs, chain->str, chain->len,
+                                        false, &maxlen))
+             break;
+           chain = chain->next;
+         }
+       if (quote)
+         obstack_grow (obs, rquote, strlen (rquote));
+       assert (!module);
+       return;
+      }
+    default:
       assert (!"invalid token in symbol_value_print");
       abort ();
     }
diff --git a/tests/macros.at b/tests/macros.at
index 367d47e..3d74356 100644
--- a/tests/macros.at
+++ b/tests/macros.at
@@ -1,5 +1,5 @@
 # Hand crafted tests for GNU M4.                               -*- Autotest -*-
-# Copyright (C) 2001, 2006, 2007 Free Software Foundation, Inc.
+# Copyright (C) 2001, 2006, 2007, 2008 Free Software Foundation, Inc.
 
 # This file is part of GNU M4.
 #
@@ -535,6 +535,24 @@ AT_CHECK_M4([in], [0], [[40
 ]])
 
 AT_DATA([in], [[define(`echo', `$@')dnl
+define(`foo', echo(`01234567890123456789')echo(`98765432109876543210'))dnl
+foo
+]])
+
+AT_CHECK_M4([in], [0], [[0123456789012345678998765432109876543210
+]])
+
+AT_DATA([in], [[define(`a', `A')define(`echo', `$@')define(`join', `$1$2')dnl
+define(`abcdefghijklmnopqrstuvwxyz', `Z')dnl
+join(`a', `bcdefghijklmnopqrstuvwxyz')
+join(`a', echo(`bcdefghijklmnopqrstuvwxyz'))
+]])
+
+AT_CHECK_M4([in], [0], [[Z
+Z
+]])
+
+AT_DATA([in], [[define(`echo', `$@')dnl
 echo(echo(`01234567890123456789', `01234567890123456789')
 echo(`98765432109876543210', `98765432109876543210'))
 len((echo(`01234567890123456789',
-- 
1.5.3.8
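
Editorial note, not part of the patch: the chain comparison in
m4_arg_equal above compares two composite arguments link by link
rather than flattening either one first.  A minimal standalone sketch
of that idea follows.  It assumes text-only links; `struct link' and
`chain_equal' are hypothetical names invented for illustration, not
m4 API, and the sketch tracks offsets into each link instead of using
the temporary links the patch uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a text-only chain link.  */
struct link
{
  const struct link *next;	/* Next link, or NULL.  */
  const char *str;		/* Text of this link (need not be
				   NUL-terminated).  */
  size_t len;			/* Length of str.  */
};

/* Return true if the concatenated text of chains A and B is
   byte-for-byte identical, without copying either chain.  */
static bool
chain_equal (const struct link *a, const struct link *b)
{
  size_t offa = 0;		/* Bytes of *A already compared.  */
  size_t offb = 0;		/* Bytes of *B already compared.  */

  while (a && b)
    {
      size_t n;
      if (offa == a->len)
	{
	  a = a->next;
	  offa = 0;
	  continue;
	}
      if (offb == b->len)
	{
	  b = b->next;
	  offb = 0;
	  continue;
	}
      /* Compare the overlap of the two current links.  */
      n = a->len - offa < b->len - offb ? a->len - offa : b->len - offb;
      if (memcmp (a->str + offa, b->str + offb, n) != 0)
	return false;
      offa += n;
      offb += n;
    }

  /* Skip any fully consumed or empty trailing links.  */
  while (a && offa == a->len)
    {
      a = a->next;
      offa = 0;
    }
  while (b && offb == b->len)
    {
      b = b->next;
      offb = 0;
    }
  /* Equal only if both chains are exhausted.  */
  return !a && !b;
}
```

Using memcmp with explicit lengths, as the patch does, keeps the
comparison correct for text with embedded NUL bytes.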

From c2c0a7ddc9f559d66a17184ea8be2c363dd4807c Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 27 Oct 2007 05:44:09 -0600
Subject: [PATCH] Stage 11: full circle for single argument references.

Pass quoted strings through to argument collection in a single
action, so that an argument can be reused throughout macro
recursion if it remains unchanged.
Memory impact: noticeable improvement, due to more reuse in
argument collection stacks.
Speed impact: noticeable improvement, due to less copying.
* src/m4.h (struct token_chain): Add quote_age member.
(struct token_data): Add end member to chain alternate.
(make_text_link): New prototype.
* src/input.c (CHAR_QUOTE): New macro.
(word_start): Pre-allocate.
(set_word_regexp): Simplify.
(make_text_link): Export, and handle new fields.
(next_char, next_char_1): Add parameter.
(append_quote_token): New function.
(match_input, next_token): Adjust callers to handle quoted input
blocks.
* src/macro.c (struct macro_arguments): Add wrapper member.
(expand_argument): Accept composite blocks from input engine.
(expand_macro): Reduce refcounts of composite arguments.
(collect_arguments, arg_token, arg_mark, make_argv_ref): Update to
use new fields.
(arg_type, arg_text, arg_equal, arg_len): Treat composite
arguments as text.
(push_arg, push_args): Handle composites.

(cherry picked from commit b1fef201f5d121e25e5dd61ec8ca3eac41a899ba)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog   |   29 ++++++++
 src/input.c |  207 +++++++++++++++++++++++++++++++++--------------------
 src/m4.h    |   25 ++++---
 src/macro.c |  233 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 4 files changed, 376 insertions(+), 118 deletions(-)
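
Editorial note, not part of the patch: the reuse rule described in
the log message (hand a quoted block to argument collection whole
when its quote age still matches, otherwise fall back to byte-wise
reads) can be sketched in isolation as below.  All names here
(`sketch_link', `sketch_next_char', `sketch_take_link', the SKETCH_
constants) are hypothetical and simplified; they are not the real
input.c code.  Treating a quote age of 0 as "never reusable" and
requiring a non-empty link are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_EOF = 256, SKETCH_QUOTE = 258 };

/* Hypothetical, simplified text link; not the real token_chain.  */
struct sketch_link
{
  struct sketch_link *next;	/* Next link, or NULL.  */
  unsigned int quote_age;	/* 0 means not safe to reuse whole.  */
  const char *str;		/* Remaining text of this link.  */
  size_t len;			/* Remaining length of str.  */
};

/* Return the next unit of input from *CHAINP.  If ALLOW_QUOTE and
   the head link is non-empty with a quote age equal to CURRENT_AGE,
   return SKETCH_QUOTE without consuming anything, so the caller can
   take the whole link.  Otherwise consume and return one byte, or
   return SKETCH_EOF once the chain is exhausted.  */
static int
sketch_next_char (struct sketch_link **chainp, bool allow_quote,
		  unsigned int current_age)
{
  struct sketch_link *chain = *chainp;
  while (chain)
    {
      if (allow_quote && chain->len && chain->quote_age
	  && chain->quote_age == current_age)
	return SKETCH_QUOTE;
      if (chain->len)
	{
	  /* Partial consumption invalidates the quote age.  */
	  chain->quote_age = 0;
	  chain->len--;
	  return (unsigned char) *chain->str++;
	}
      /* Skip an exhausted link.  */
      chain = chain->next;
      *chainp = chain;
    }
  return SKETCH_EOF;
}

/* Detach and return the head link of *CHAINP, which must be
   non-NULL; intended for use after SKETCH_QUOTE was returned.  */
static struct sketch_link *
sketch_take_link (struct sketch_link **chainp)
{
  struct sketch_link *link = *chainp;
  *chainp = link->next;
  link->next = NULL;
  return link;
}
```

Once a single byte has been read from a link, the remainder no longer
corresponds to a complete quoted string, which is why the sketch (like
the patch) zeroes quote_age on partial consumption.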

diff --git a/ChangeLog b/ChangeLog
index 5ad26e3..15549a6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,32 @@
+2008-01-22  Eric Blake  <address@hidden>
+
+       Stage 11: full circle for single argument references.
+       Pass quoted strings through to argument collection in a single
+       action, so that an argument can be reused throughout macro
+       recursion if it remains unchanged.
+       Memory impact: noticeable improvement, due to more reuse in
+       argument collection stacks.
+       Speed impact: noticeable improvement, due to less copying.
+       * src/m4.h (struct token_chain): Add quote_age member.
+       (struct token_data): Add end member to chain alternate.
+       (make_text_link): New prototype.
+       * src/input.c (CHAR_QUOTE): New macro.
+       (word_start): Pre-allocate.
+       (set_word_regexp): Simplify.
+       (make_text_link): Export, and handle new fields.
+       (next_char, next_char_1): Add parameter.
+       (append_quote_token): New function.
+       (match_input, next_token): Adjust callers to handle quoted input
+       blocks.
+       * src/macro.c (struct macro_arguments): Add wrapper member.
+       (expand_argument): Accept composite blocks from input engine.
+       (expand_macro): Reduce refcounts of composite arguments.
+       (collect_arguments, arg_token, arg_mark, make_argv_ref): Update to
+       use new fields.
+       (arg_type, arg_text, arg_equal, arg_len): Treat composite
+       arguments as text.
+       (push_arg, push_args): Handle composites.
+
 2008-01-17  Eric Blake  <address@hidden>
 
        Stage 10: avoid extra copying of strings and comments.
diff --git a/src/input.c b/src/input.c
index bc73c6f..9f25e8f 100644
--- a/src/input.c
+++ b/src/input.c
@@ -153,6 +153,7 @@ static bool input_change;
 
 #define CHAR_EOF       256     /* Character return on EOF.  */
 #define CHAR_MACRO     257     /* Character return for MACRO token.  */
+#define CHAR_QUOTE     258     /* Character return for quoted string.  */
 
 /* Quote chars.  */
 STRING rquote;
@@ -167,7 +168,7 @@ STRING ecomm;
 # define DEFAULT_WORD_REGEXP "[_a-zA-Z][_a-zA-Z0-9]*"
 
 /* Table of characters that can start a word.  */
-static char *word_start;
+static char word_start[256];
 
 /* Current regular expression for detecting words.  */
 static struct re_pattern_buffer word_regexp;
@@ -201,7 +202,7 @@ static const char *token_type_string (token_type);
 | chain that starts at *START and ends at *END.  START may be NULL   |
 | if *END is non-NULL.                                               |
 `-------------------------------------------------------------------*/
-static void
+void
 make_text_link (struct obstack *obs, token_chain **start, token_chain **end)
 {
   token_chain *chain;
@@ -218,6 +219,7 @@ make_text_link (struct obstack *obs, token_chain **start, token_chain **end)
        *start = chain;
       *end = chain;
       chain->next = NULL;
+      chain->quote_age = 0;
       chain->str = str;
       chain->len = len;
       chain->level = -1;
@@ -361,6 +363,7 @@ push_token (token_data *token, int level)
     next->u.u_c.chain = chain;
   next->u.u_c.end = chain;
   chain->next = NULL;
+  chain->quote_age = TOKEN_DATA_QUOTE_AGE (token);
   chain->str = TOKEN_DATA_TEXT (token);
   chain->len = TOKEN_DATA_LEN (token);
   chain->level = level;
@@ -563,19 +566,6 @@ pop_wrapup (void)
   return true;
 }
 
-/*-------------------------------------------------------------------.
-| When a MACRO token is seen, next_token () uses init_macro_token () |
-| to retrieve the value of the function pointer and store it in TD.  |
-`-------------------------------------------------------------------*/
-
-static void
-init_macro_token (token_data *td)
-{
-  assert (isp->type == INPUT_MACRO);
-  TOKEN_DATA_TYPE (td) = TOKEN_FUNC;
-  TOKEN_DATA_FUNC (td) = isp->u.func;
-}
-
 /*--------------------------------------------------------------.
 | Dump a representation of INPUT to the obstack OBS, for use in |
 | tracing.                                                      |
@@ -699,16 +689,19 @@ peek_input (void)
 | consisting of a newline alone is taken as belonging to the line it |
 | ends, and the current line number is not incremented until the     |
 | next character is read.  99.9% of all calls will read from a       |
-| string, so factor that out into a macro for speed.                 |
+| string, so factor that out into a macro for speed.  If             |
+| ALLOW_QUOTE, and the current input matches the current quote age,  |
+| return CHAR_QUOTE and leave consumption of data for                |
+| append_quote_token.                                                |
 `-------------------------------------------------------------------*/
 
-#define next_char()                                                    \
+#define next_char(AQ)                                                  \
   (isp && isp->type == INPUT_STRING && isp->u.u_s.len && !input_change \
    ? (isp->u.u_s.len--, to_uchar (*isp->u.u_s.str++))                  \
-   : next_char_1 ())
+   : next_char_1 (AQ))
 
 static int
-next_char_1 (void)
+next_char_1 (bool allow_quote)
 {
   int ch;
   token_chain *chain;
@@ -765,10 +758,14 @@ next_char_1 (void)
          chain = isp->u.u_c.chain;
          while (chain)
            {
+             if (allow_quote && chain->quote_age == current_quote_age)
+               return CHAR_QUOTE;
              if (chain->str)
                {
                  if (chain->len)
                    {
+                     /* Partial consumption invalidates quote age.  */
+                     chain->quote_age = 0;
                      chain->len--;
                      return to_uchar (*chain->str++);
                    }
@@ -808,7 +805,7 @@ skip_line (const char *name)
   const char *file = current_file;
   int line = current_line;
 
-  while ((ch = next_char ()) != CHAR_EOF && ch != '\n')
+  while ((ch = next_char (false)) != CHAR_EOF && ch != '\n')
     ;
   if (ch == CHAR_EOF)
     /* current_file changed to "" if we see CHAR_EOF, use the
@@ -825,6 +822,49 @@ skip_line (const char *name)
 }
 
 
+/*-------------------------------------------------------------------.
+| When a MACRO token is seen, next_token () uses init_macro_token () |
+| to retrieve the value of the function pointer and store it in TD.  |
+`-------------------------------------------------------------------*/
+
+static void
+init_macro_token (token_data *td)
+{
+  assert (isp->type == INPUT_MACRO);
+  TOKEN_DATA_TYPE (td) = TOKEN_FUNC;
+  TOKEN_DATA_FUNC (td) = isp->u.func;
+}
+
+/*-------------------------------------------------------------------.
+| When a QUOTE token is seen, convert TD to a composite (if it is    |
+| not one already), consisting of any unfinished text on OBS, as     |
+| well as the quoted token from the top of the input stack.  Use OBS |
+| for any additional allocations needed to store the token chain.    |
+`-------------------------------------------------------------------*/
+static void
+append_quote_token (struct obstack *obs, token_data *td)
+{
+  token_chain *src_chain = isp->u.u_c.chain;
+  token_chain *chain;
+  assert (isp->type == INPUT_CHAIN && obs && current_quote_age);
+
+  if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
+    {
+      TOKEN_DATA_TYPE (td) = TOKEN_COMP;
+      td->u.u_c.chain = td->u.u_c.end = NULL;
+    }
+  assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP);
+  make_text_link (obs, &td->u.u_c.chain, &td->u.u_c.end);
+  chain = (token_chain *) obstack_copy (obs, src_chain, sizeof *chain);
+  if (td->u.u_c.end)
+    td->u.u_c.end->next = chain;
+  else
+    td->u.u_c.chain = chain;
+  td->u.u_c.end = chain;
+  td->u.u_c.end->next = NULL;
+  isp->u.u_c.chain = src_chain->next;
+}
+
 /*------------------------------------------------------------------.
 | This function is for matching a string against a prefix of the    |
 | input stream.  If the string S matches the input and CONSUME is   |
@@ -848,14 +888,14 @@ match_input (const char *s, bool consume)
   if (s[1] == '\0')
     {
       if (consume)
-       (void) next_char ();
+       next_char (false);
       return true;                     /* short match */
     }
 
-  (void) next_char ();
+  next_char (false);
   for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); )
     {
-      (void) next_char ();
+      next_char (false);
       n++;
       if (*s == '\0')          /* long match */
        {
@@ -1016,7 +1056,6 @@ void
 set_word_regexp (const char *caller, const char *regexp)
 {
   int i;
-  char test[2];
   const char *msg;
   struct re_pattern_buffer new_word_regexp;
 
@@ -1048,15 +1087,10 @@ set_word_regexp (const char *caller, const char *regexp)
   default_word_regexp = false;
   set_quote_age ();
 
-  if (word_start == NULL)
-    word_start = (char *) xmalloc (256);
-
-  word_start[0] = '\0';
-  test[1] = '\0';
   for (i = 1; i < 256; i++)
     {
-      test[0] = i;
-      word_start[i] = re_search (&word_regexp, test, 1, 0, 0, NULL) >= 0;
+      char test = i;
+      word_start[i] = re_match (&word_regexp, &test, 1, 0, NULL) > 0;
     }
 }
 
@@ -1140,16 +1174,17 @@ safe_quotes (void)
 
 
 /*--------------------------------------------------------------------.
-| Parse and return a single token from the input stream.  A token     |
-| can either be TOKEN_EOF, if the input_stack is empty; it can be     |
-| TOKEN_STRING for a quoted string or comment; TOKEN_WORD for         |
-| something that is a potential macro name; and TOKEN_SIMPLE for any  |
-| single character that is not a part of any of the previous types.   |
-| If LINE is not NULL, set *LINE to the line where the token starts.  |
-| If OBS is not NULL, expand TOKEN_STRING directly into OBS rather    |
-| than in token_stack temporary storage area.  Report errors          |
-| (unterminated comments or strings) on behalf of CALLER, if          |
-| non-NULL.                                                           |
+| Parse a single token from the input stream, set TD to its           |
+| contents, and return its type.  A token is TOKEN_EOF if the         |
+| input_stack is empty; TOKEN_STRING for a quoted string or comment;  |
+| TOKEN_WORD for something that is a potential macro name; and        |
+| TOKEN_SIMPLE for any single character that is not a part of any of  |
+| the previous types.  If LINE is not NULL, set *LINE to the line     |
+| where the token starts.  If OBS is not NULL, expand TOKEN_STRING    |
+| directly into OBS rather than in token_stack temporary storage      |
+| area, and TD could be a TOKEN_COMP instead of the usual             |
+| TOKEN_TEXT.  Report errors (unterminated comments or strings) on    |
+| behalf of CALLER, if non-NULL.                                      |
 |                                                                     |
 | Next_token () returns the token type, and passes back a pointer to  |
 | the token data through TD.  Non-string token text is collected on   |
@@ -1165,7 +1200,6 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
   int quote_level;
   token_type type;
 #ifdef ENABLE_CHANGEWORD
-  int startpos;
   char *orig_text = NULL;
 #endif /* ENABLE_CHANGEWORD */
   const char *file;
@@ -1181,19 +1215,20 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
     line = &dummy;
 
   /* Can't consume character until after CHAR_MACRO is handled.  */
+  TOKEN_DATA_TYPE (td) = TOKEN_VOID;
   ch = peek_input ();
   if (ch == CHAR_EOF)
     {
 #ifdef DEBUG_INPUT
       xfprintf (stderr, "next_token -> EOF\n");
 #endif /* DEBUG_INPUT */
-      next_char ();
+      next_char (false);
       return TOKEN_EOF;
     }
   if (ch == CHAR_MACRO)
     {
       init_macro_token (td);
-      next_char ();
+      next_char (false);
 #ifdef DEBUG_INPUT
       xfprintf (stderr, "next_token -> MACDEF (%s)\n",
                find_builtin_by_addr (TOKEN_DATA_FUNC (td))->name);
@@ -1201,7 +1236,7 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
       return TOKEN_MACDEF;
     }
 
-  next_char (); /* Consume character we already peeked at.  */
+  next_char (false); /* Consume character we already peeked at.  */
   file = current_file;
   *line = current_line;
   if (MATCH (ch, bcomm.string, true))
@@ -1209,11 +1244,14 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
       if (obs)
        obs_td = obs;
       obstack_grow (obs_td, bcomm.string, bcomm.length);
-      while ((ch = next_char ()) != CHAR_EOF
+      while ((ch = next_char (false)) < CHAR_EOF
             && !MATCH (ch, ecomm.string, true))
        obstack_1grow (obs_td, ch);
       if (ch != CHAR_EOF)
-       obstack_grow (obs_td, ecomm.string, ecomm.length);
+       {
+         assert (ch < CHAR_EOF);
+         obstack_grow (obs_td, ecomm.string, ecomm.length);
+       }
       else
        /* Current_file changed to "" if we see CHAR_EOF, use the
           previous value we stored earlier.  */
@@ -1225,10 +1263,10 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
   else if (default_word_regexp && (isalpha (ch) || ch == '_'))
     {
       obstack_1grow (&token_stack, ch);
-      while ((ch = peek_input ()) != CHAR_EOF && (isalnum (ch) || ch == '_'))
+      while ((ch = peek_input ()) < CHAR_EOF && (isalnum (ch) || ch == '_'))
        {
          obstack_1grow (&token_stack, ch);
-         (void) next_char ();
+         next_char (false);
        }
       type = TOKEN_WORD;
     }
@@ -1241,20 +1279,17 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
       while (1)
        {
          ch = peek_input ();
-         if (ch == CHAR_EOF)
+         if (ch >= CHAR_EOF)
            break;
          obstack_1grow (&token_stack, ch);
-         startpos = re_search (&word_regexp,
-                               (char *) obstack_base (&token_stack),
-                               obstack_object_size (&token_stack), 0, 0,
-                               &regs);
-         if (startpos != 0 ||
-             regs.end [0] != obstack_object_size (&token_stack))
+         if (re_match (&word_regexp, (char *) obstack_base (&token_stack),
+                       obstack_object_size (&token_stack), 0, &regs)
+             != obstack_object_size (&token_stack))
            {
              obstack_blank (&token_stack, -1);
              break;
            }
-         next_char ();
+         next_char (false);
        }
 
       obstack_1grow (&token_stack, '\0');
@@ -1297,14 +1332,16 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
       quote_level = 1;
       while (1)
        {
-         ch = next_char ();
+         ch = next_char (obs != NULL && current_quote_age);
          if (ch == CHAR_EOF)
            /* Current_file changed to "" if we see CHAR_EOF, use
               the previous value we stored earlier.  */
            m4_error_at_line (EXIT_FAILURE, 0, file, *line, caller,
                              _("end of file in string"));
 
-         if (MATCH (ch, rquote.string, true))
+         if (ch == CHAR_QUOTE)
+           append_quote_token (obs, td);
+         else if (MATCH (ch, rquote.string, true))
            {
              if (--quote_level == 0)
                break;
@@ -1316,35 +1353,49 @@ next_token (token_data *td, int *line, struct obstack *obs, const char *caller)
              obstack_grow (obs_td, lquote.string, lquote.length);
            }
          else
-           obstack_1grow (obs_td, ch);
+           {
+             assert (ch < CHAR_EOF);
+             obstack_1grow (obs_td, ch);
+           }
        }
       type = TOKEN_STRING;
     }
 
-  TOKEN_DATA_TYPE (td) = TOKEN_TEXT;
-  TOKEN_DATA_LEN (td) = obstack_object_size (obs_td);
-  if (obs_td != obs)
+  if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
     {
-      obstack_1grow (obs_td, '\0');
-      TOKEN_DATA_TEXT (td) = (char *) obstack_finish (obs_td);
-    }
-  else
-    TOKEN_DATA_TEXT (td) = NULL;
-  TOKEN_DATA_QUOTE_AGE (td) = current_quote_age;
+      TOKEN_DATA_TYPE (td) = TOKEN_TEXT;
+      TOKEN_DATA_LEN (td) = obstack_object_size (obs_td);
+      if (obs_td != obs)
+       {
+         obstack_1grow (obs_td, '\0');
+         TOKEN_DATA_TEXT (td) = (char *) obstack_finish (obs_td);
+       }
+      else
+       TOKEN_DATA_TEXT (td) = NULL;
+      TOKEN_DATA_QUOTE_AGE (td) = current_quote_age;
 #ifdef ENABLE_CHANGEWORD
-  if (orig_text == NULL)
-    TOKEN_DATA_ORIG_TEXT (td) = TOKEN_DATA_TEXT (td);
+      if (orig_text == NULL)
+       TOKEN_DATA_ORIG_TEXT (td) = TOKEN_DATA_TEXT (td);
+      else
+       {
+         TOKEN_DATA_ORIG_TEXT (td) = orig_text;
+         TOKEN_DATA_LEN (td) = strlen (orig_text);
+       }
+#endif /* ENABLE_CHANGEWORD */
+#ifdef DEBUG_INPUT
+      xfprintf (stderr, "next_token -> %s (%s), len %zu\n",
+               token_type_string (type), TOKEN_DATA_TEXT (td),
+               TOKEN_DATA_LEN (td));
+#endif /* DEBUG_INPUT */
+    }
   else
     {
-      TOKEN_DATA_ORIG_TEXT (td) = orig_text;
-      TOKEN_DATA_LEN (td) = strlen (orig_text);
-    }
-#endif /* ENABLE_CHANGEWORD */
+      assert (TOKEN_DATA_TYPE (td) == TOKEN_COMP && type == TOKEN_STRING);
 #ifdef DEBUG_INPUT
-  xfprintf (stderr, "next_token -> %s (%s), len %zu\n",
-           token_type_string (type), TOKEN_DATA_TEXT (td),
-           TOKEN_DATA_LEN (td));
+      xfprintf (stderr, "next_token -> %s <chain>\n",
+               token_type_string (type));
 #endif /* DEBUG_INPUT */
+    }
   return type;
 }
 
diff --git a/src/m4.h b/src/m4.h
index ea3947f..474338b 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -271,19 +271,20 @@ enum token_data_type
   TOKEN_VOID,  /* Token still being constructed, u is invalid.  */
   TOKEN_TEXT,  /* Straight text, u.u_t is valid.  */
   TOKEN_FUNC,  /* Builtin function definition, u.func is valid.  */
-  TOKEN_COMP   /* Composite argument, u.chain is valid.  */
+  TOKEN_COMP   /* Composite argument, u.u_c is valid.  */
 };
 
 /* Composite tokens are built of a linked list of chains.  */
 struct token_chain
 {
-  token_chain *next;   /* Pointer to next link of chain.  */
-  const char *str;     /* NUL-terminated string if text, else NULL.  */
-  size_t len;          /* Length of str, else 0.  */
-  int level;           /* Expansion level of link content, or -1.  */
-  macro_arguments *argv;/* Reference to earlier $@.  */
-  unsigned int index;  /* Argument index within argv.  */
-  bool flatten;                /* True to treat builtins as text.  */
+  token_chain *next;           /* Pointer to next link of chain.  */
+  unsigned int quote_age;      /* Quote_age of this link of chain, or 0.  */
+  const char *str;             /* NUL-terminated string if text, or NULL.  */
+  size_t len;                  /* Length of str, else 0.  */
+  int level;                   /* Expansion level of link content, or -1.  */
+  macro_arguments *argv;       /* Reference to earlier $@.  */
+  unsigned int index;          /* Argument index within argv.  */
+  bool flatten;                        /* True to treat builtins as text.  */
 };
 
 /* The content of a token or macro argument.  */
@@ -319,7 +320,12 @@ struct token_data
 
       /* Composite text: a linked list of straight text and $@
         placeholders.  */
-      token_chain *chain;
+      struct
+       {
+         token_chain *chain;   /* First link of the chain.  */
+         token_chain *end;     /* Last link of the chain.  */
+       }
+      u_c;
     }
   u;
 };
@@ -342,6 +348,7 @@ token_type next_token (token_data *, int *, struct obstack *, const char *);
 void skip_line (const char *);
 
 /* push back input */
+void make_text_link (struct obstack *, token_chain **, token_chain **);
 void push_file (FILE *, const char *, bool);
 void push_macro (builtin_func *);
 struct obstack *push_string_init (void);
diff --git a/src/macro.c b/src/macro.c
index ef18b8f..62af398 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -45,6 +45,9 @@ struct macro_arguments
   bool_bitfield inuse : 1;
   /* False if all arguments are just text or func, true if this argv
      refers to another one.  */
+  bool_bitfield wrapper : 1;
+  /* False if all arguments belong to this argv, true if some of them
+     include references to another.  */
   bool_bitfield has_ref : 1;
   const char *argv0; /* The macro name being expanded.  */
   size_t argv0_len; /* Length of argv0.  */
@@ -382,11 +385,16 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
                    return t == TOKEN_COMMA;
                  warn_builtin_concat (caller, TOKEN_DATA_FUNC (argp));
                }
-             obstack_1grow (obs, '\0');
-             TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
-             TOKEN_DATA_TEXT (argp) = (char *) obstack_finish (obs);
-             TOKEN_DATA_LEN (argp) = len;
-             TOKEN_DATA_QUOTE_AGE (argp) = age;
+             if (TOKEN_DATA_TYPE (argp) != TOKEN_COMP)
+               {
+                 obstack_1grow (obs, '\0');
+                 TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
+                 TOKEN_DATA_TEXT (argp) = (char *) obstack_finish (obs);
+                 TOKEN_DATA_LEN (argp) = len;
+                 TOKEN_DATA_QUOTE_AGE (argp) = age;
+               }
+             else
+               make_text_link (obs, NULL, &argp->u.u_c.end);
              return t == TOKEN_COMMA;
            }
          /* fallthru */
@@ -411,6 +419,23 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
        case TOKEN_STRING:
          if (!expand_token (obs, t, &td, line, first))
            age = 0;
+         if (TOKEN_DATA_TYPE (&td) == TOKEN_COMP)
+           {
+             if (TOKEN_DATA_TYPE (argp) != TOKEN_COMP)
+               {
+                 if (TOKEN_DATA_TYPE (argp) == TOKEN_FUNC)
+                   warn_builtin_concat (caller, TOKEN_DATA_FUNC (argp));
+                 TOKEN_DATA_TYPE (argp) = TOKEN_COMP;
+                 argp->u.u_c.chain = td.u.u_c.chain;
+                 argp->u.u_c.end = td.u.u_c.end;
+               }
+             else
+               {
+                 assert (argp->u.u_c.end);
+                 argp->u.u_c.end->next = td.u.u_c.chain;
+                 argp->u.u_c.end = td.u.u_c.end;
+               }
+           }
          break;
 
        case TOKEN_MACDEF:
@@ -459,6 +484,7 @@ collect_arguments (symbol *sym, struct obstack *arguments,
 
   args.argc = 1;
   args.inuse = false;
+  args.wrapper = false;
   args.has_ref = false;
   args.argv0 = SYMBOL_NAME (sym);
   args.argv0_len = strlen (args.argv0);
@@ -490,11 +516,14 @@ collect_arguments (symbol *sym, struct obstack *arguments,
              && TOKEN_DATA_LEN (tdp) > 0
              && TOKEN_DATA_QUOTE_AGE (tdp) != args.quote_age)
            args.quote_age = 0;
+         else if (TOKEN_DATA_TYPE (tdp) == TOKEN_COMP)
+           args.has_ref = true;
        }
       while (more_args);
     }
   argv = (macro_arguments *) obstack_finish (argv_stack);
   argv->argc = args.argc;
+  argv->has_ref = args.has_ref;
   if (args.quote_age != quote_age ())
     argv->quote_age = 0;
   argv->arraylen = args.arraylen;
@@ -633,8 +662,23 @@ expand_macro (symbol *sym)
   if (SYMBOL_DELETED (sym))
     free_symbol (sym);
 
-  /* If argv contains references, those refcounts can be reduced now.  */
-  /* TODO - support references in argv.  */
+  /* If argv contains references, those refcounts must be reduced now.  */
+  if (argv->has_ref)
+    {
+      token_chain *chain;
+      size_t i;
+      for (i = 0; i < argv->arraylen; i++)
+       if (TOKEN_DATA_TYPE (argv->array[i]) == TOKEN_COMP)
+         {
+           chain = argv->array[i]->u.u_c.chain;
+           while (chain)
+             {
+               if (chain->level >= 0)
+                 adjust_refcount (chain->level, false);
+               chain = chain->next;
+             }
+         }
+    }
 
   /* We no longer need argv, so reduce the refcount.  Additionally, if
      no other references to argv were created, we can free our portion
@@ -698,7 +742,7 @@ arg_token (macro_arguments *argv, unsigned int index)
   token_data *token;
 
   assert (index && index < argv->argc);
-  if (!argv->has_ref)
+  if (!argv->wrapper)
     return argv->array[index - 1];
   /* Must cycle through all tokens, until we find index, since a ref
      may occupy multiple indices.  */
@@ -707,7 +751,7 @@ arg_token (macro_arguments *argv, unsigned int index)
       token = argv->array[i];
       if (TOKEN_DATA_TYPE (token) == TOKEN_COMP)
        {
-         token_chain *chain = token->u.chain;
+         token_chain *chain = token->u.u_c.chain;
          /* TODO - for now we support only a single-length $@ chain.  */
          assert (!chain->next && !chain->str);
          if (index < chain->argv->argc - (chain->index - 1))
@@ -731,14 +775,14 @@ static void
 arg_mark (macro_arguments *argv)
 {
   argv->inuse = true;
-  if (argv->has_ref)
+  if (argv->wrapper)
     {
       /* TODO for now we support only a single-length $@ chain.  */
       assert (argv->arraylen == 1
              && TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP
-             && !argv->array[0]->u.chain->next
-             && !argv->array[0]->u.chain->str);
-      argv->array[0]->u.chain->argv->inuse = true;
+             && !argv->array[0]->u.u_c.chain->next
+             && !argv->array[0]->u.u_c.chain->str);
+      argv->array[0]->u.u_c.chain->argv->inuse = true;
     }
 }
 
@@ -761,17 +805,22 @@ arg_type (macro_arguments *argv, unsigned int index)
     return TOKEN_TEXT;
   token = arg_token (argv, index);
   type = TOKEN_DATA_TYPE (token);
-  assert (type != TOKEN_COMP);
+  /* Composite tokens are currently sequences of text only.  */
+  if (type == TOKEN_COMP)
+    type = TOKEN_TEXT;
   return type;
 }
 
 /* Given ARGV, return the text at argument INDEX.  Abort if the
    argument is not text.  Index 0 is always text, and indices beyond
-   argc return the empty string.  */
+   argc return the empty string.  The result is always NUL-terminated,
+   even if it includes embedded NUL characters.  */
 const char *
 arg_text (macro_arguments *argv, unsigned int index)
 {
   token_data *token;
+  token_chain *chain;
+  struct obstack *obs;
 
   if (index == 0)
     return argv->argv0;
@@ -783,8 +832,18 @@ arg_text (macro_arguments *argv, unsigned int index)
     case TOKEN_TEXT:
       return TOKEN_DATA_TEXT (token);
     case TOKEN_COMP:
-      /* TODO - how to concatenate multiple arguments?  For now, we expect
-        only one element in the chain, and arg_token dereferences it.  */
+      /* TODO - concatenate multiple arguments?  For now, we assume
+        all elements are text.  */
+      chain = token->u.u_c.chain;
+      obs = arg_scratch ();
+      while (chain)
+       {
+         assert (chain->str);
+         obstack_grow (obs, chain->str, chain->len);
+         chain = chain->next;
+       }
+      obstack_1grow (obs, '\0');
+      return (char *) obstack_finish (obs);
     default:
       break;
     }
@@ -801,14 +860,84 @@ arg_equal (macro_arguments *argv, unsigned int indexa, unsigned int indexb)
 {
   token_data *ta = arg_token (argv, indexa);
   token_data *tb = arg_token (argv, indexb);
+  token_chain tmpa;
+  token_chain tmpb;
+  token_chain *ca = &tmpa;
+  token_chain *cb = &tmpb;
 
+  /* Quick tests.  */
   if (ta == &empty_token || tb == &empty_token)
     return ta == tb;
+  if (TOKEN_DATA_TYPE (ta) == TOKEN_TEXT
+      && TOKEN_DATA_TYPE (tb) == TOKEN_TEXT)
+    return (TOKEN_DATA_LEN (ta) == TOKEN_DATA_LEN (tb)
+           && memcmp (TOKEN_DATA_TEXT (ta), TOKEN_DATA_TEXT (tb),
+                      TOKEN_DATA_LEN (ta)) == 0);
+
+  /* Convert both arguments to chains, if not one already.  */
   /* TODO - allow builtin tokens in the comparison?  */
-  assert (TOKEN_DATA_TYPE (ta) == TOKEN_TEXT
-         && TOKEN_DATA_TYPE (tb) == TOKEN_TEXT);
-  return (TOKEN_DATA_LEN (ta) == TOKEN_DATA_LEN (tb)
-         && strcmp (TOKEN_DATA_TEXT (ta), TOKEN_DATA_TEXT (tb)) == 0);
+  if (TOKEN_DATA_TYPE (ta) == TOKEN_TEXT)
+    {
+      tmpa.next = NULL;
+      tmpa.str = TOKEN_DATA_TEXT (ta);
+      tmpa.len = TOKEN_DATA_LEN (ta);
+    }
+  else
+    {
+      assert (TOKEN_DATA_TYPE (ta) == TOKEN_COMP);
+      ca = ta->u.u_c.chain;
+    }
+  if (TOKEN_DATA_TYPE (tb) == TOKEN_TEXT)
+    {
+      tmpb.next = NULL;
+      tmpb.str = TOKEN_DATA_TEXT (tb);
+      tmpb.len = TOKEN_DATA_LEN (tb);
+    }
+  else
+    {
+      assert (TOKEN_DATA_TYPE (tb) == TOKEN_COMP);
+      cb = tb->u.u_c.chain;
+    }
+
+  /* Compare each link of the chain.  */
+  while (ca && cb)
+    {
+      /* TODO support comparison against $@ refs.  */
+      assert (ca->str && cb->str);
+      if (ca->len == cb->len)
+       {
+         if (memcmp (ca->str, cb->str, ca->len) != 0)
+           return false;
+         ca = ca->next;
+         cb = cb->next;
+       }
+      else if (ca->len < cb->len)
+       {
+         if (memcmp (ca->str, cb->str, ca->len) != 0)
+           return false;
+         tmpb.next = cb->next;
+         tmpb.str = cb->str + ca->len;
+         tmpb.len = cb->len - ca->len;
+         ca = ca->next;
+         cb = &tmpb;
+       }
+      else
+       {
+         assert (ca->len > cb->len);
+         if (memcmp (ca->str, cb->str, cb->len) != 0)
+           return false;
+         tmpa.next = ca->next;
+         tmpa.str = ca->str + cb->len;
+         tmpa.len = ca->len - cb->len;
+         ca = &tmpa;
+         cb = cb->next;
+       }
+    }
+
+  /* If we get this far, the two tokens are equal only if both chains
+     are exhausted.  */
+  assert (ca != cb || ca == NULL);
+  return ca == cb;
 }
 
 /* Given ARGV, return true if argument INDEX is the empty string.
@@ -830,6 +959,8 @@ size_t
 arg_len (macro_arguments *argv, unsigned int index)
 {
   token_data *token;
+  token_chain *chain;
+  size_t len;
 
   if (index == 0)
     return argv->argv0_len;
@@ -842,8 +973,18 @@ arg_len (macro_arguments *argv, unsigned int index)
       assert ((token == &empty_token) == (TOKEN_DATA_LEN (token) == 0));
       return TOKEN_DATA_LEN (token);
     case TOKEN_COMP:
-      /* TODO - how to concatenate multiple arguments?  For now, we expect
-        only one element in the chain, and arg_token dereferences it.  */
+      /* TODO - concatenate multiple arguments?  For now, we assume
+        all elements are text.  */
+      chain = token->u.u_c.chain;
+      len = 0;
+      while (chain)
+       {
+         assert (chain->str);
+         len += chain->len;
+         chain = chain->next;
+       }
+      assert (len);
+      return len;
     default:
       break;
     }
@@ -892,12 +1033,12 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
 
   /* When making a reference through a reference, point to the
      original if possible.  */
-  if (argv->has_ref)
+  if (argv->wrapper)
     {
       /* TODO - for now we support only a single-length $@ chain.  */
       assert (argv->arraylen == 1
              && TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP);
-      chain = argv->array[0]->u.chain;
+      chain = argv->array[0]->u.u_c.chain;
       assert (!chain->next && !chain->str);
       argv = chain->argv;
       index += chain->index - 1;
@@ -907,6 +1048,7 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
       new_argv = (macro_arguments *)
        obstack_alloc (obs, offsetof (macro_arguments, array));
       new_argv->arraylen = 0;
+      new_argv->wrapper = false;
       new_argv->has_ref = false;
     }
   else
@@ -918,10 +1060,12 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
       chain = (token_chain *) obstack_alloc (obs, sizeof *chain);
       new_argv->arraylen = 1;
       new_argv->array[0] = token;
+      new_argv->wrapper = true;
       new_argv->has_ref = true;
       TOKEN_DATA_TYPE (token) = TOKEN_COMP;
-      token->u.chain = chain;
+      token->u.u_c.chain = token->u.u_c.end = chain;
       chain->next = NULL;
+      chain->quote_age = argv->quote_age;
       chain->str = NULL;
       chain->len = 0;
       chain->level = expansion_level - 1;
@@ -955,9 +1099,23 @@ push_arg (struct obstack *obs, macro_arguments *argv, unsigned int index)
     return;
   token = arg_token (argv, index);
   /* TODO handle func tokens?  */
-  assert (TOKEN_DATA_TYPE (token) == TOKEN_TEXT);
-  if (push_token (token, expansion_level - 1))
-    arg_mark (argv);
+  if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
+    {
+      if (push_token (token, expansion_level - 1))
+       arg_mark (argv);
+    }
+  else if (TOKEN_DATA_TYPE (token) == TOKEN_COMP)
+    {
+      /* TODO - concatenate multiple arguments?  For now, we assume
+        all elements are text.  */
+      token_chain *chain = token->u.u_c.chain;
+      while (chain)
+       {
+         assert (chain->str);
+         obstack_grow (obs, chain->str, chain->len);
+         chain = chain->next;
+       }
+    }
 }
 
 /* Push series of comma-separated arguments from ARGV, which should
@@ -968,6 +1126,7 @@ void
 push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
 {
   token_data *token;
+  token_chain *chain;
   unsigned int i = skip ? 2 : 1;
   const char *sep = ",";
   size_t sep_len = 1;
@@ -1007,8 +1166,20 @@ push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
       else
        use_sep = true;
       /* TODO handle func tokens?  */
-      assert (TOKEN_DATA_TYPE (token) == TOKEN_TEXT);
-      inuse |= push_token (token, expansion_level - 1);
+      if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
+       inuse |= push_token (token, expansion_level - 1);
+      else
+       {
+         /* TODO - handle composite text in push_token.  */
+         assert (TOKEN_DATA_TYPE (token) == TOKEN_COMP);
+         chain = token->u.u_c.chain;
+         while (chain)
+           {
+             assert (chain->str);
+             obstack_grow (obs, chain->str, chain->len);
+             chain = chain->next;
+           }
+       }
     }
   if (quote)
     obstack_grow (obs, rquote.string, rquote.length);
-- 
1.5.3.8
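
For readers who want to experiment with the chain comparison outside of
m4, here is a minimal standalone sketch of the idea behind the new
arg_equal: two chains hold equal text if their concatenations match,
even when the links are split at different offsets.  The names
text_link and chain_equal are hypothetical and are not part of the
patch.  The sketch tracks an offset into each link, whereas the patch
tracks the unconsumed remainder in a temporary token_chain (tmpa and
tmpb); it also skips trailing empty links, a case the patch presumably
does not need to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for token_chain, holding text links only.  */
struct text_link
{
  struct text_link *next;	/* Next link, or NULL.  */
  const char *str;		/* Text of this link, never NULL.  */
  size_t len;			/* Length of str.  */
};

/* Return true if the concatenated text of chains A and B is
   identical, regardless of where each chain is split.  */
static bool
chain_equal (const struct text_link *a, const struct text_link *b)
{
  size_t offa = 0;	/* Bytes of *A already compared.  */
  size_t offb = 0;	/* Bytes of *B already compared.  */

  while (a && b)
    {
      size_t resta = a->len - offa;
      size_t restb = b->len - offb;
      size_t n = resta < restb ? resta : restb;

      assert (a->str && b->str);
      if (memcmp (a->str + offa, b->str + offb, n) != 0)
	return false;
      offa += n;
      offb += n;
      /* Advance whichever link has been fully consumed.  */
      if (offa == a->len)
	{
	  a = a->next;
	  offa = 0;
	}
      if (offb == b->len)
	{
	  b = b->next;
	  offb = 0;
	}
    }

  /* Trailing empty links contribute no text.  */
  while (a && a->len == 0)
    a = a->next;
  while (b && b->len == 0)
    b = b->next;

  /* Equal only if both chains are exhausted.  */
  return a == NULL && b == NULL;
}
```

With this, a chain of "hel" followed by "lo" compares equal to a chain
of "h" followed by "ello", with no temporary copy of either string.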

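The flatten-on-demand approach used by arg_len and arg_text for
TOKEN_COMP can be sketched in the same standalone style.  This is an
illustration only: text_link, chain_length and chain_flatten are
hypothetical names, and malloc stands in for the obstack returned by
arg_scratch in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for token_chain, holding text links only.  */
struct text_link
{
  struct text_link *next;	/* Next link, or NULL.  */
  const char *str;		/* Text of this link, never NULL.  */
  size_t len;			/* Length of str.  */
};

/* Total length of the chain, computed without copying any text.  */
static size_t
chain_length (const struct text_link *chain)
{
  size_t len = 0;
  for (; chain; chain = chain->next)
    len += chain->len;
  return len;
}

/* Concatenate all links into a freshly allocated NUL-terminated
   string.  The caller frees the result.  Returns NULL on allocation
   failure.  */
static char *
chain_flatten (const struct text_link *chain)
{
  char *result = malloc (chain_length (chain) + 1);
  char *p = result;

  if (!result)
    return NULL;
  for (; chain; chain = chain->next)
    {
      assert (chain->str);
      memcpy (p, chain->str, chain->len);
      p += chain->len;
    }
  *p = '\0';
  return result;
}
```

A caller that only needs the length never pays for the copy, which is
the point of deferring the flattening until an accessor asks for the
text.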
