m4-patches

[14/18] argv_ref speedup: push entire $@ reference at once


From: Eric Blake
Subject: [14/18] argv_ref speedup: push entire $@ reference at once
Date: Sat, 02 Feb 2008 16:12:24 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Next in the series.  Rather than pushing $@ one argument at a time, it is
more efficient to push a single reference to the entire argument list
at once (reusing the notion of an argv link added in stage 4).  For now,
the input engine simply expands the reference back into one argument at
a time.  As a result, the memory usage of unboxed recursion regresses to
quadratic (each new $@ argv ref occupies storage, rather than sharing
with the prior iteration).  However, this sets up the framework for
future patches to efficiently hand $@ (or a subset thereof) back to the
argument collection engine, which will restore linear memory usage.  On
the master branch, I split things into two patches: one to make printing
$@ references easier, the other to actually push $@ references into the
input engine (where the printing occurs when tracing is enabled).
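
To illustrate the idea, here is a toy model only, not code from the
patch.  A link records which argument list it refers to, the next index
to hand out, and whether a separator is due; the reader then pulls one
piece at a time, much as the input engine does for now.  All names
(toy_args, toy_argv_link, toy_link_next, toy_expand) are made up for
this sketch, and the real structures differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the collected arguments of one macro.  */
struct toy_args
{
  unsigned int argc;            /* number of arguments */
  const char *const *argv;      /* argument text, not owned */
};

/* Hypothetical chain link: a reference to ARGS from INDEX onward,
   rather than a copy of the text.  */
struct toy_argv_link
{
  const struct toy_args *args;
  unsigned int index;           /* next argument to hand out */
  int comma;                    /* nonzero if a separator is due first */
};

/* Append the next piece of LINK to the NUL-terminated buffer OUT of
   capacity CAP: either a separator or one whole argument.  Return 0
   once the reference is exhausted, 1 otherwise.  */
static int
toy_link_next (struct toy_argv_link *link, char *out, size_t cap)
{
  if (link->index == link->args->argc)
    return 0;
  if (link->comma)
    {
      strncat (out, ",", cap - strlen (out) - 1);
      link->comma = 0;
      return 1;
    }
  strncat (out, link->args->argv[link->index++], cap - strlen (out) - 1);
  link->comma = 1;
  return 1;
}

/* Expand the reference to ARGS starting at INDEX into OUT, one piece
   at a time.  Return the number of pieces handed out.  */
static unsigned int
toy_expand (const struct toy_args *args, unsigned int index,
            char *out, size_t cap)
{
  struct toy_argv_link link = { args, index, 0 };
  unsigned int pieces = 0;

  assert (cap > 0);
  out[0] = '\0';
  while (toy_link_next (&link, out, cap))
    pieces++;
  return pieces;
}
```

The link itself stays a fixed size no matter how many arguments it
covers; only the reader decides how much text to materialize at once.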

2008-02-02  Eric Blake  <address@hidden>

        Stage 14: allow pushing argv references.
        Push a $@ reference to the input engine in one go, rather than
        pushing each element.  For now, argument collection still gets one
        argument of a $@ at a time; but the penalties of this patch make
        it easier to manage $@ efficiently in future patches.
        Memory impact: noticeable penalty, due to larger struct and O(n)
        to O(n^2) on unboxed recursion
        Speed impact: noticeable penalty, due to more bookkeeping.
        * src/m4.h (struct token_chain): Add comma and quotes fields.
        (arg_adjust_refcount, arg_print, push_arg_quote): New prototypes.
        * src/input.c (push_token, pop_input, input_print, peek_input)
        (next_char_1): Support $@ references.
        * src/macro.c (struct macro_arguments): Add level field.  Match
        type of arraylen to argc.
        (collect_arguments): Populate new field.
        (expand_macro, make_argv_ref, push_arg): Factor...
        (arg_adjust_refcount, make_argv_ref_token, push_arg_quote):
        ...into these new methods.
        (arg_token): Add new parameter.
        (arg_print): New function.
        (arg_mark, arg_type, arg_text, arg_equal, arg_empty, arg_len)
        (arg_func, push_args): Adjust callers.
        * doc/m4.texinfo (Ifelse): Augment test.
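
The arg_print entry above prints a $@ reference as one contiguous
string, sharing a single truncation budget across all arguments.  A
rough sketch of that contract follows, assuming a plain char buffer in
place of an obstack and omitting quotes and module names; toy_append_trunc
and toy_arg_print are hypothetical names, not the functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append at most *REMAINING bytes of S to the NUL-terminated buffer
   OUT of capacity CAP, and reduce *REMAINING by the bytes written.
   Return true if S did not fit in the budget.  */
static bool
toy_append_trunc (char *out, size_t cap, const char *s, size_t *remaining)
{
  size_t len = strlen (s);
  size_t used = strlen (out);
  bool truncated = false;

  assert (used < cap);
  if (len > *remaining)
    {
      len = *remaining;
      truncated = true;
    }
  if (len > cap - used - 1)
    len = cap - used - 1;       /* never overflow the toy buffer */
  memcpy (out + used, s, len);
  out[used + len] = '\0';
  *remaining -= len;
  return truncated;
}

/* Print ARGV[INDEX] through ARGV[ARGC - 1] into OUT, separated by
   commas.  If MAX_LEN is non-NULL, stop and return true once
   *MAX_LEN bytes have been output; otherwise return false and reduce
   *MAX_LEN by the number of bytes output.  */
static bool
toy_arg_print (char *out, size_t cap, const char *const *argv,
               unsigned int argc, unsigned int index, size_t *max_len)
{
  size_t len = max_len ? *max_len : SIZE_MAX;
  bool comma = false;
  unsigned int i;

  for (i = index; i < argc; i++)
    {
      if (comma && toy_append_trunc (out, cap, ",", &len))
        return true;
      comma = true;
      if (toy_append_trunc (out, cap, argv[i], &len))
        return true;
    }
  if (max_len)
    *max_len = len;
  return false;
}
```

Passing the budget by pointer is what lets a caller chain several
prints and stop as soon as any one of them reports truncation.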

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHpPjY84KuGfSFAYARAvi3AKCauBq/zk6TTzY/KgE/VHHtajWCVQCeI2vB
//5YOP03N9pDb3SlJ09PKmQ=
=Qcbd
-----END PGP SIGNATURE-----
From 89ceca3d1d57ea666822040018b5036d84c087cc Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 2 Feb 2008 07:33:34 -0700
Subject: [PATCH] Stage 14a: allow printing argv references.

* m4/m4module.h (m4_arg_print): New prototype.
(m4_symbol_value_print): Alter prototype.
* m4/input.c (struct input_funcs): Add parameter to peek_func.
(file_peek, builtin_peek, string_peek): Ignore new parameter.
(composite_peek): Ignore new parameter, for now.
(composite_clean, pop_input): Rework to minimize indirection, and
to avoid infinite recursion in next patch.
* m4/macro.c (trace_prepre, trace_pre): Adjust callers.
(m4_arg_print): New function.
* m4/symtab.c (m4_symbol_value_print): Update signature.
(m4_symbol_print): Update caller.
* m4/output.c (m4_shipout_string_trunc): Update comments.
* m4/syntax.c (set_quote_age): Require comma as argument separator
when dealing with $@ as a unit.
* tests/builtins.at (ifelse): Augment test.
* doc/m4.texinfo (Changesyntax): Document changesyntax deficiency.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog         |   31 ++++++++++++++-
 doc/m4.texinfo    |   20 +++++++++-
 ltdl/.cvsignore   |    1 +
 ltdl/.gitignore   |    1 +
 m4/input.c        |   36 ++++++++++--------
 m4/m4module.h     |    8 +++-
 m4/macro.c        |   40 ++++++++++++++++++-
 m4/output.c       |    2 +-
 m4/symtab.c       |  108 ++++++++++++++++++++++++++++------------------------
 m4/syntax.c       |   11 ++++-
 tests/builtins.at |   20 ++++++++++
 11 files changed, 200 insertions(+), 78 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 2ad8299..faf8c72 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,27 @@
+2008-02-02  Eric Blake  <address@hidden>
+
+       Stage 14a: allow printing argv references.
+       Refactor symbol-value printing code for better sharing, and to
+       allow printing a contiguous text representation of a $@ ref.
+       Memory impact: none.
+       Speed impact: none.
+       * m4/m4module.h (m4_arg_print): New prototype.
+       (m4_symbol_value_print): Alter prototype.
+       * m4/input.c (struct input_funcs): Add parameter to peek_func.
+       (file_peek, builtin_peek, string_peek): Ignore new parameter.
+       (composite_peek): Ignore new parameter, for now.
+       (composite_clean, pop_input): Rework to minimize indirection, and
+       to avoid infinite recursion in next patch.
+       * m4/macro.c (trace_prepre, trace_pre): Adjust callers.
+       (m4_arg_print): New function.
+       * m4/symtab.c (m4_symbol_value_print): Update signature.
+       (m4_symbol_print): Update caller.
+       * m4/output.c (m4_shipout_string_trunc): Update comments.
+       * m4/syntax.c (set_quote_age): Require comma as argument separator
+       when dealing with $@ as a unit.
+       * tests/builtins.at (ifelse): Augment test.
+       * doc/m4.texinfo (Changesyntax): Document changesyntax deficiency.
+
 2008-01-31  Eric Blake  <address@hidden>
 
        Kill hack for M4 1.4.4.
@@ -47,7 +71,7 @@
        reused through multiple macro expansions.  Add hueristic that
        avoids creating new reference when pushing existing references.
        Memory impact: noticeable improvement due to better reference
-       reuse, except for boxed recursion doing more copying.
+       reuse, except for O(n) to O(n^2) copying in boxed recursion.
        Speed impact: slight penalty, due to more bookkeeping.
        * m4/m4private.h (m4__push_symbol): Adjust prototype.
        * m4/input.c (m4__push_symbol): Add parameter, and support
@@ -154,7 +178,7 @@
        action, so that an argument can be reused throughout macro
        recursion if it remains unchanged.
        Memory impact: noticeable improvement, due to more reuse in
-       argument collection stacks.
+       argument collection stacks; O(n^2) to O(n) on boxed recursion.
        Speed impact: noticeable improvement, due to less copying.
        * m4/m4module.h (m4_arg_text): Add parameter.
        (M4ARG): Adjust.
@@ -233,7 +257,8 @@
        creating a FIFO link.  Also start testing embedded NUL behavior.
        Until the argument collection engine also shares references, the
        memory usage increases.
-       Memory impact: noticeable penalty, due to longer life of argv.
+       Memory impact: noticeable penalty, due to longer life of argv
+       changing O(n) to O(n^2) on boxed recursion.
        Speed impact: slight improvement, due less data copying.
        * ltdl/m4/gnulib-cache.m4: Import memmem and quote modules.
        * m4/m4module.h (m4_arg_scratch): New prototype.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 5d87489..9e9dd46 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -4937,8 +4937,24 @@ Note how it is possible to have both long and short quotes, if
 The syntax table is initialized to be backwards compatible, so if you
 never call @code{changesyntax}, nothing will have changed.
 
-Debugging output continue to use @kbd{(}, @kbd{,} and @kbd{)} to show
-macro calls.
+For now, debugging output continues to use @kbd{(}, @kbd{,} and @kbd{)}
+to show macro calls; and macro expansions that result in a list of
+arguments (such as @samp{$@@} or @code{shift}) use @samp{,}, regardless
+of the current syntax settings.  However, this is likely to change in a
+future release, so it should not be relied on, particularly since it is
+next to impossible to write recursive macros if the argument separator
+doesn't match between expansion and rescanning.
+
+@comment FIXME - changing syntax of , should not break iterative macros.
+@example
+$ @kbd{m4 -d}
+changesyntax(`,=|')traceon(`foo')define(`foo'|`$#:$@')
+@result{}
+foo(foo(1|2|3))
+@error{}m4trace: -2- foo(`1', `2', `3') -> `3:`1',`2',`3''
+@error{}m4trace: -1- foo(`3:1,2,3') -> `1:`3:1,2,3''
+@result{}1:3:1,2,3
+@end example
 
 @node M4wrap
 @section Saving text until end of input
diff --git a/ltdl/.cvsignore b/ltdl/.cvsignore
index c2d1277..766a9f7 100644
--- a/ltdl/.cvsignore
+++ b/ltdl/.cvsignore
@@ -2,6 +2,7 @@
 *.la
 *.lo
 .deps
+.dirstamp
 .libs
 aclocal.m4
 argz.c
diff --git a/ltdl/.gitignore b/ltdl/.gitignore
index c2d1277..766a9f7 100644
--- a/ltdl/.gitignore
+++ b/ltdl/.gitignore
@@ -2,6 +2,7 @@
 *.la
 *.lo
 .deps
+.dirstamp
 .libs
 aclocal.m4
 argz.c
diff --git a/m4/input.c b/m4/input.c
index 9616d37..d14cbbd 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -92,20 +92,20 @@
    maintains its own notion of the current file and line, so swapping
    between input blocks must update the context accordingly.  */
 
-static int     file_peek               (m4_input_block *);
+static int     file_peek               (m4_input_block *, m4 *);
 static int     file_read               (m4_input_block *, m4 *, bool, bool);
 static void    file_unget              (m4_input_block *, int);
 static bool    file_clean              (m4_input_block *, m4 *, bool);
 static void    file_print              (m4_input_block *, m4 *, m4_obstack *);
-static int     builtin_peek            (m4_input_block *);
+static int     builtin_peek            (m4_input_block *, m4 *);
 static int     builtin_read            (m4_input_block *, m4 *, bool, bool);
 static void    builtin_unget           (m4_input_block *, int);
 static void    builtin_print           (m4_input_block *, m4 *, m4_obstack *);
-static int     string_peek             (m4_input_block *);
+static int     string_peek             (m4_input_block *, m4 *);
 static int     string_read             (m4_input_block *, m4 *, bool, bool);
 static void    string_unget            (m4_input_block *, int);
 static void    string_print            (m4_input_block *, m4 *, m4_obstack *);
-static int     composite_peek          (m4_input_block *);
+static int     composite_peek          (m4_input_block *, m4 *);
 static int     composite_read          (m4_input_block *, m4 *, bool, bool);
 static void    composite_unget         (m4_input_block *, int);
 static bool    composite_clean         (m4_input_block *, m4 *, bool);
@@ -130,7 +130,7 @@ struct input_funcs
 {
   /* Peek at input, return an unsigned char, CHAR_BUILTIN if it is a
      builtin, or CHAR_RETRY if none available.  */
-  int  (*peek_func)    (m4_input_block *);
+  int  (*peek_func)    (m4_input_block *, m4 *);
 
   /* Read input, return an unsigned char, CHAR_BUILTIN if it is a
      builtin, or CHAR_RETRY if none available.  If ALLOW_QUOTE, then
@@ -254,7 +254,7 @@ static struct input_funcs composite_funcs = {
 
 /* Input files, from command line or [s]include.  */
 static int
-file_peek (m4_input_block *me)
+file_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED)
 {
   int ch;
 
@@ -389,7 +389,7 @@ m4_push_file (m4 *context, FILE *fp, const char *title, bool close_file)
 
 /* Handle a builtin macro token.  */
 static int
-builtin_peek (m4_input_block *me)
+builtin_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED)
 {
   if (me->u.u_b.read)
     return CHAR_RETRY;
@@ -474,7 +474,7 @@ m4_push_builtin (m4 *context, m4_symbol_value *token)
 
 /* Handle string expansion text.  */
 static int
-string_peek (m4_input_block *me)
+string_peek (m4_input_block *me, m4 *context M4_GNUC_UNUSED)
 {
   return me->u.u_s.len ? to_uchar (*me->u.u_s.str) : CHAR_RETRY;
 }
@@ -537,8 +537,8 @@ m4_push_string_init (m4 *context)
    level, or SIZE_MAX if VALUE is composite, its contents reside
    entirely on the current_input stack, and VALUE lives in temporary
    storage.  If VALUE is a simple string, then it belongs to the
-   current macro expansion.  If VALUE is composit, then each text link
-   has a level of SIZE_MAX if it belongs to the current macro
+   current macro expansion.  If VALUE is composite, then each text
+   link has a level of SIZE_MAX if it belongs to the current macro
    expansion, otherwise it is a back-reference where level tracks
    which stack it came from.  The resulting input block chain contains
    links with a level of SIZE_MAX if the text belongs to the input
@@ -715,7 +715,7 @@ m4_push_string_finish (void)
    in FIFO order, even though the obstack allocates memory in LIFO
    order.  */
 static int
-composite_peek (m4_input_block *me)
+composite_peek (m4_input_block *me, m4 *context)
 {
   m4__symbol_chain *chain = me->u.u_c.chain;
   while (chain)
@@ -798,7 +798,11 @@ composite_clean (m4_input_block *me, m4 *context, bool cleanup)
       switch (chain->type)
        {
        case M4__CHAIN_STR:
-         assert (!chain->u.u_s.len);
+         if (chain->u.u_s.len)
+           {
+             assert (!cleanup);
+             return false;
+           }
          if (chain->u.u_s.level < SIZE_MAX)
            m4__adjust_refcount (context, chain->u.u_s.level, false);
          break;
@@ -935,9 +939,9 @@ pop_input (m4 *context, bool cleanup)
   m4_input_block *tmp = isp->prev;
 
   assert (isp);
-  if (isp->funcs->peek_func (isp) != CHAR_RETRY
-      || (isp->funcs->clean_func
-         && !isp->funcs->clean_func (isp, context, cleanup)))
+  if (isp->funcs->clean_func
+      ? !isp->funcs->clean_func (isp, context, cleanup)
+      : (isp->funcs->peek_func (isp, context) != CHAR_RETRY))
     return false;
 
   if (tmp != NULL)
@@ -1101,7 +1105,7 @@ peek_char (m4 *context)
        return CHAR_EOF;
 
       assert (block->funcs->peek_func);
-      if ((ch = block->funcs->peek_func (block)) != CHAR_RETRY)
+      if ((ch = block->funcs->peek_func (block, context)) != CHAR_RETRY)
        {
 /*       if (IS_IGNORE (ch)) */
 /*         return next_char (context, false, true); */
diff --git a/m4/m4module.h b/m4/m4module.h
index 24d6a45..77cfa52 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -246,8 +246,9 @@ extern m4_symbol_value *m4_get_symbol_value   (m4_symbol*);
 extern bool            m4_get_symbol_traced      (m4_symbol*);
 extern bool            m4_set_symbol_name_traced (m4_symbol_table*,
                                                   const char *, bool);
-extern void    m4_symbol_value_print   (m4_symbol_value *, m4_obstack *,
-                                        const m4_string_pair *, size_t, bool);
+extern bool    m4_symbol_value_print   (m4_symbol_value *, m4_obstack *,
+                                        const m4_string_pair *, size_t *,
+                                        bool);
 extern void    m4_symbol_print         (m4_symbol *, m4_obstack *,
                                         const m4_string_pair *, bool, size_t,
                                         bool);
@@ -326,6 +327,9 @@ extern bool m4_arg_empty            (m4_macro_args *, unsigned int);
 extern size_t  m4_arg_len              (m4_macro_args *, unsigned int);
 extern m4_builtin_func *m4_arg_func    (m4_macro_args *, unsigned int);
 extern m4_obstack *m4_arg_scratch      (m4 *);
+extern bool    m4_arg_print            (m4_obstack *, m4_macro_args *,
+                                        unsigned int, const m4_string_pair *,
+                                        size_t *, bool);
 extern m4_macro_args *m4_make_argv_ref (m4 *, m4_macro_args *, const char *,
                                         size_t, bool, bool);
 extern void    m4_push_arg             (m4 *, m4_obstack *, m4_macro_args *,
diff --git a/m4/macro.c b/m4/macro.c
index f91923c..bb6df16 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -896,7 +896,7 @@ trace_prepre (m4 *context, const char *name, size_t id, m4_symbol_value *value)
     quotes = m4_get_syntax_quotes (M4SYNTAX);
   trace_header (context, id);
   trace_format (context, "%s ... = ", name);
-  m4_symbol_value_print (value, &context->trace_messages, quotes, arg_length,
+  m4_symbol_value_print (value, &context->trace_messages, quotes, &arg_length,
                         module);
   trace_flush (context);
 }
@@ -923,11 +923,12 @@ trace_pre (m4 *context, size_t id, m4_macro_args *argv)
       trace_format (context, "(");
       for (i = 1; i < argc; i++)
        {
+         size_t len = arg_length;
          if (i != 1)
            trace_format (context, ", ");
 
          m4_symbol_value_print (m4_arg_symbol (argv, i),
-                                &context->trace_messages, quotes, arg_length,
+                                &context->trace_messages, quotes, &len,
                                 module);
        }
       trace_format (context, ")");
@@ -1235,6 +1236,41 @@ m4_arg_func (m4_macro_args *argv, unsigned int index)
   return m4_get_symbol_value_func (m4_arg_symbol (argv, index));
 }
 
+/* Dump a representation of ARGV to the obstack OBS, starting with
+   argument INDEX.  If QUOTES is non-NULL, each argument is displayed
+   with those quotes.  If MAX_LEN is non-NULL, truncate the output
+   after *MAX_LEN bytes are output and return true; otherwise, return
+   false, and reduce *MAX_LEN by the number of bytes output.  If
+   MODULE, print any details about originating modules.  QUOTES count
+   against the truncation length, but not module names.  */
+bool
+m4_arg_print (m4_obstack *obs, m4_macro_args *argv, unsigned int index,
+             const m4_string_pair *quotes, size_t *max_len, bool module)
+{
+  size_t len = max_len ? *max_len : SIZE_MAX;
+  unsigned int i;
+  bool comma = false;
+
+  for (i = index; i < argv->argc; i++)
+    {
+      if (comma && m4_shipout_string_trunc (obs, ",", 1, NULL, &len))
+       return true;
+      comma = true;
+      if (quotes && m4_shipout_string_trunc (obs, quotes->str1, quotes->len1,
+                                            NULL, &len))
+       return true;
+      if (m4_symbol_value_print (m4_arg_symbol (argv, i), obs, NULL, &len,
+                                module))
+       return true;
+      if (quotes && m4_shipout_string_trunc (obs, quotes->str2, quotes->len2,
+                                            NULL, &len))
+       return true;
+    }
+  if (max_len)
+    *max_len = len;
+  return false;
+}
+
 /* Create a new argument object using the same obstack as ARGV; thus,
    the new object will automatically be freed when the original is
    freed.  Explicitly set the macro name (argv[0]) from ARGV0 with
diff --git a/m4/output.c b/m4/output.c
index 21a28f7..d6c6cc5 100644
--- a/m4/output.c
+++ b/m4/output.c
@@ -604,7 +604,7 @@ m4_shipout_string (m4 *context, m4_obstack *obs, const char *s, size_t len,
    quote characters around S.  If LEN is SIZE_MAX, use the string
    length of S instead.  If MAX_LEN, reduce *MAX_LEN by LEN.  If LEN
    is larger than *MAX_LEN, then truncate output and return true;
-   otherwise return false.  */
+   otherwise return false.  Quotes do not count against MAX_LEN.  */
 bool
 m4_shipout_string_trunc (m4_obstack *obs, const char *s, size_t len,
                         const m4_string_pair *quotes, size_t *max_len)
diff --git a/m4/symtab.c b/m4/symtab.c
index 60afe63..f7a96ee 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -533,75 +533,77 @@ m4_set_symbol_name_traced (m4_symbol_table *symtab, const char *name,
 }
 
 /* Grow OBS with a text representation of VALUE.  If QUOTES, then use
-   it to surround a text definition.  If MAXLEN is less than SIZE_MAX,
-   then truncate text definitions to that length.  If MODULE, then
-   include which module defined a builtin.  */
-void
+   it to surround a text definition.  If MAXLEN, then truncate text
+   definitions to *MAXLEN, and adjust by how many characters are
+   printed.  If MODULE, then include which module defined a builtin.
+   Return true if the output was truncated.  QUOTES and MODULE do not
+   count against the truncation length.  */
+bool
 m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs,
-                      const m4_string_pair *quotes, size_t maxlen,
+                      const m4_string_pair *quotes, size_t *maxlen,
                       bool module)
 {
   const char *text;
-  size_t len;
-  bool truncated = false;
+  const m4_builtin *bp;
+  m4__symbol_chain *chain;
+  size_t len = maxlen ? *maxlen : SIZE_MAX;
+  bool result = false;
 
   switch (value->type)
     {
     case M4_SYMBOL_TEXT:
-      text = m4_get_symbol_value_text (value);
-      len = m4_get_symbol_value_len (value);
-      if (maxlen < len)
-       {
-         len = maxlen;
-         truncated = true;
-       }
+      if (m4_shipout_string_trunc (obs, m4_get_symbol_value_text (value),
+                                  m4_get_symbol_value_len (value), quotes,
+                                  &len))
+       result = true;
       break;
     case M4_SYMBOL_FUNC:
-      {
-       const m4_builtin *bp = m4_get_symbol_value_builtin (value);
-       static const m4_string_pair q1 = { "<", 1, ">", 1 };
-       text = bp->name;
-       len = strlen (text);
-       quotes = &q1;
-      }
+      bp = m4_get_symbol_value_builtin (value);
+      obstack_1grow (obs, '<');
+      obstack_grow (obs, bp->name, strlen (bp->name));
+      obstack_1grow (obs, '>');
       break;
     case M4_SYMBOL_PLACEHOLDER:
       text = m4_get_symbol_value_placeholder (value);
-      static const m4_string_pair q2 = { "<<", 2, ">>", 2 };
-      len = strlen (text);
-      quotes = &q2;
+      obstack_1grow (obs, '<');
+      obstack_1grow (obs, '<');
+      obstack_grow (obs, text, strlen (text));
+      obstack_1grow (obs, '>');
+      obstack_1grow (obs, '>');
       break;
     case M4_SYMBOL_COMP:
-      {
-       m4__symbol_chain *chain = value->u.u_c.chain;
-       if (quotes)
-         obstack_grow (obs, quotes->str1, quotes->len1);
-       while (chain)
-         {
-           /* TODO for now, assume all links are text.  */
-           assert (chain->type == M4__CHAIN_STR);
-           if (m4_shipout_string_trunc (obs, chain->u.u_s.str,
-                                        chain->u.u_s.len, NULL, &maxlen))
+      chain = value->u.u_c.chain;
+      if (quotes)
+       obstack_grow (obs, quotes->str1, quotes->len1);
+      while (chain && !result)
+       {
+         switch (chain->type)
+           {
+           case M4__CHAIN_STR:
+             if (m4_shipout_string_trunc (obs, chain->u.u_s.str,
+                                          chain->u.u_s.len, NULL, &len))
+               result = true;
+             break;
+           case M4__CHAIN_ARGV:
+             if (m4_arg_print (obs, chain->u.u_a.argv, chain->u.u_a.index,
+                               NULL, &len, module))
+               result = true;
              break;
+           default:
+             assert (!"m4_symbol_value_print");
+             abort ();
+           }
            chain = chain->next;
          }
-       if (quotes)
-         obstack_grow (obs, quotes->str2, quotes->len2);
-       assert (!module);
-       return;
-      }
+      if (quotes)
+       obstack_grow (obs, quotes->str2, quotes->len2);
+      assert (!module);
+      break;
     default:
-      assert (!"invalid token in symbol_value_print");
+      assert (!"m4_symbol_value_print");
       abort ();
     }
 
-  if (quotes)
-    obstack_grow (obs, quotes->str1, quotes->len1);
-  obstack_grow (obs, text, len);
-  if (truncated)
-    obstack_grow (obs, "...", 3);
-  if (quotes)
-    obstack_grow (obs, quotes->str2, quotes->len2);
   if (module && VALUE_MODULE (value))
     {
       obstack_1grow (obs, '{');
@@ -609,25 +611,30 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs,
       obstack_grow (obs, text, strlen (text));
       obstack_1grow (obs, '}');
     }
+  if (maxlen)
+    *maxlen = len;
+  return result;
 }
 
 /* Grow OBS with a text representation of SYMBOL.  If QUOTES, then use
    it to surround each text definition.  If STACK, then append all
    pushdef'd values, rather than just the top.  If ARG_LENGTH is less
    than SIZE_MAX, then truncate text definitions to that length.  If
-   MODULE, then include which module defined a builtin.  */
+   MODULE, then include which module defined a builtin.  QUOTES and
+   MODULE do not count toward truncation.  */
 void
 m4_symbol_print (m4_symbol *symbol, m4_obstack *obs,
                 const m4_string_pair *quotes, bool stack, size_t arg_length,
                 bool module)
 {
   m4_symbol_value *value;
+  size_t len = arg_length;
 
   assert (symbol);
   assert (obs);
 
   value = m4_get_symbol_value (symbol);
-  m4_symbol_value_print (value, obs, quotes, arg_length, module);
+  m4_symbol_value_print (value, obs, quotes, &len, module);
   if (stack)
     {
       value = VALUE_NEXT (value);
@@ -635,7 +642,8 @@ m4_symbol_print (m4_symbol *symbol, m4_obstack *obs,
        {
          obstack_1grow (obs, ',');
          obstack_1grow (obs, ' ');
-         m4_symbol_value_print (value, obs, quotes, arg_length, module);
+         len = arg_length;
+         m4_symbol_value_print (value, obs, quotes, &len, module);
          value = VALUE_NEXT (value);
        }
     }
diff --git a/m4/syntax.c b/m4/syntax.c
index c39a9be..aff6444 100644
--- a/m4/syntax.c
+++ b/m4/syntax.c
@@ -709,7 +709,13 @@ set_quote_age (m4_syntax_table *syntax, bool reset, bool change)
    bits of quote_age; otherwise we increment syntax_age for each
    changesyntax, but saturate it at 0xffff rather than wrapping
    around.  Perhaps a cache of other frequently used states is
-   warranted, if changesyntax becomes more popular
+   warranted, if changesyntax becomes more popular.
+
+   Perhaps someday we will fix $@ expansion to use the current
+   settings of the comma category, or even allow multi-character
+   argument separators via changesyntax.  Until then, we use a literal
+   `,' in $@ expansion, therefore we must insist that `,' be an
+   argument separator for quote_age to be non-zero.
 
    Rather than check every token for an unquoted delimiter, we merely
    encode current_quote_age to 0 when things are unsafe, and non-zero
@@ -739,7 +745,8 @@ set_quote_age (m4_syntax_table *syntax, bool reset, bool change)
       && *syntax->quote.str1 != *syntax->quote.str2
       && *syntax->comm.str1 != *syntax->quote.str2
       && !m4_has_syntax (syntax, *syntax->comm.str1,
-                        M4_SYNTAX_OPEN | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE))
+                        M4_SYNTAX_OPEN | M4_SYNTAX_COMMA | M4_SYNTAX_CLOSE)
+      && m4_has_syntax (syntax, ',', M4_SYNTAX_COMMA))
     {
       syntax->quote_age = ((local_syntax_age << 16)
                           | ((*syntax->quote.str1 & 0xff) << 8)
diff --git a/tests/builtins.at b/tests/builtins.at
index b8d5386..68d151c 100644
--- a/tests/builtins.at
+++ b/tests/builtins.at
@@ -433,14 +433,17 @@ AT_CLEANUP
 AT_TEST_M4([ifelse],
 dnl ensure that comparisons work regardless of reference chains in the middle
 [[define(`e', `$@')define(`long', `01234567890123456789')
+dnl in isolation
 ifelse(long, `01234567890123456789', `yes', `no')
 ifelse(`01234567890123456789', long, `yes', `no')
 ifelse(long, `01234567890123456789-', `yes', `no')
 ifelse(`01234567890123456789-', long, `yes', `no')
+dnl through macro expansion
 ifelse(e(long), `01234567890123456789', `yes', `no')
 ifelse(`01234567890123456789', e(long), `yes', `no')
 ifelse(e(long), `01234567890123456789-', `yes', `no')
 ifelse(`01234567890123456789-', e(long), `yes', `no')
+dnl concatenate macro expansion with unquoted characters
 ifelse(-e(long), `-01234567890123456789', `yes', `no')
 ifelse(-`01234567890123456789', -e(long), `yes', `no')
 ifelse(-e(long), `-01234567890123456789-', `yes', `no')
@@ -449,6 +452,15 @@ ifelse(-e(long)-, `-01234567890123456789-', `yes', `no')
 ifelse(-`01234567890123456789-', -e(long)-, `yes', `no')
 ifelse(-e(long)-, `-01234567890123456789', `yes', `no')
 ifelse(`-01234567890123456789', -e(long)-, `yes', `no')
+dnl concatenate macro expansion with quoted characters
+ifelse(`-'e(long), `-01234567890123456789', `yes', `no')
+ifelse(-`01234567890123456789', `-'e(long), `yes', `no')
+ifelse(`-'e(long), `-01234567890123456789-', `yes', `no')
+ifelse(`-01234567890123456789-', `-'e(long), `yes', `no')
+ifelse(`-'e(long)`-', `-01234567890123456789-', `yes', `no')
+ifelse(-`01234567890123456789-', `-'e(long)`-', `yes', `no')
+ifelse(`-'e(long)`-', `-01234567890123456789', `yes', `no')
+ifelse(`-01234567890123456789', `-'e(long)`-', `yes', `no')
 ]], [[
 yes
 yes
@@ -466,6 +478,14 @@ yes
 yes
 no
 no
+yes
+yes
+no
+no
+yes
+yes
+no
+no
 ]])
 
 
-- 
1.5.3.8


From 7fe816278fe35846cf4f02e8ca38e050fd10506c Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 2 Feb 2008 07:34:08 -0700
Subject: [PATCH] Stage 14b: allow pushing argv references.

* m4/m4private.h (struct m4__symbol_chain): Add comma and quotes
fields.
(struct m4_macro_args): Add level field.
(m4__arg_adjust_refcount, m4__push_arg_quote): New prototypes.
* m4/input.c (m4__push_symbol, composite_peek, composite_read)
(composite_unget, composite_clean, composite_print): Support $@
refs.
* m4/macro.c (collect_arguments): Populate new field.
(expand_macro): Move argv cleanup...
(m4__arg_adjust_refcount): ...to this new function.
(m4_arg_symbol, m4_make_argv_ref, m4_push_arg): Factor...
(arg_symbol, make_argv_ref, m4__push_arg_quote): ...to these new
helper functions, to add parameters.
(m4_push_args): Adjust caller.
* m4/symtab.c (m4_symbol_value_print): Likewise.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   24 ++++++
 m4/input.c     |   73 +++++++++++++++--
 m4/m4private.h |   14 +++-
 m4/macro.c     |  248 +++++++++++++++++++++++++++++++++++---------------------
 m4/symtab.c    |    2 +-
 5 files changed, 255 insertions(+), 106 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index faf8c72..a732959 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,29 @@
 2008-02-02  Eric Blake  <address@hidden>
 
+       Stage 14b: allow pushing argv references.
+       Push a $@ reference to the input engine in one go, rather than
+       pushing each element.  For now, argument collection still gets one
+       argument of a $@ at a time; but the penalties of this patch make
+       it easier to manage $@ efficiently in future patches.
+       Memory impact: noticeable penalty, due to larger struct and O(n)
+       to O(n^2) on unboxed recursion
+       Speed impact: noticeable penalty, due to more bookkeeping.
+       * m4/m4private.h (struct m4__symbol_chain): Add comma and quotes
+       fields.
+       (struct m4_macro_args): Add level field.
+       (m4__arg_adjust_refcount, m4__push_arg_quote): New prototypes.
+       * m4/input.c (m4__push_symbol, composite_peek, composite_read)
+       (composite_unget, composite_clean, composite_print): Support $@
+       refs.
+       * m4/macro.c (collect_arguments): Populate new field.
+       (expand_macro): Move argv cleanup...
+       (m4__arg_adjust_refcount): ...to this new function.
+       (m4_arg_symbol, m4_make_argv_ref, m4_push_arg): Factor...
+       (arg_symbol, make_argv_ref, m4__push_arg_quote): ...to these new
+       helper functions, to add parameters.
+       (m4_push_args): Adjust caller.
+       * m4/symtab.c (m4_symbol_value_print): Likewise.
+
        Stage 14a: allow printing argv references.
        Refactor symbol-value printing code for better sharing, and to
        allow printing a contiguous text representation of a $@ ref.
diff --git a/m4/input.c b/m4/input.c
index d14cbbd..1815009 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -556,7 +556,6 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level, bool inuse)
   m4__symbol_chain *chain;
 
   assert (next);
-  /* TODO - also accept composite chains with $@ refs.  */
 
   /* Speed consideration - for short enough symbols, the speed and
      memory overhead of parsing another INPUT_CHAIN link outweighs the
@@ -661,8 +660,12 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level, bool inuse)
       else
        next->u.u_c.chain = chain;
       next->u.u_c.end = chain;
-      assert (chain->type == M4__CHAIN_STR);
-      if (chain->u.u_s.level < SIZE_MAX)
+      if (chain->type == M4__CHAIN_ARGV)
+       {
+         assert (!chain->u.u_a.comma);
+         inuse |= m4__arg_adjust_refcount (context, chain->u.u_a.argv, true);
+       }
+      else if (chain->type == M4__CHAIN_STR && chain->u.u_s.level < SIZE_MAX)
        m4__adjust_refcount (context, chain->u.u_s.level, true);
       src_chain = src_chain->next;
     }
@@ -727,7 +730,22 @@ composite_peek (m4_input_block *me, m4 *context)
            return to_uchar (chain->u.u_s.str[0]);
          break;
        case M4__CHAIN_ARGV:
-         /* TODO - peek into argv.  */
+         /* TODO - figure out how to pass multiple arguments to
+            macro.c at once.  */
+         if (chain->u.u_a.index == m4_arg_argc (chain->u.u_a.argv))
+           break;
+         if (chain->u.u_a.comma)
+           return ','; /* FIXME - support M4_SYNTAX_COMMA.  */
+         /* Rather than directly parse argv here, we push another
+            input block containing the next unparsed argument from
+            argv.  */
+         m4_push_string_init (context);
+         m4__push_arg_quote (context, current_input, chain->u.u_a.argv,
+                             chain->u.u_a.index, chain->u.u_a.quotes);
+         chain->u.u_a.index++;
+         chain->u.u_a.comma = true;
+         m4_push_string_finish ();
+         return peek_char (context);
        default:
          assert (!"composite_peek");
          abort ();
@@ -743,7 +761,9 @@ composite_read (m4_input_block *me, m4 *context, bool allow_quote, bool safe)
   m4__symbol_chain *chain = me->u.u_c.chain;
   while (chain)
     {
-      if (allow_quote && chain->quote_age == m4__quote_age (M4SYNTAX))
+      /* TODO also support returning $@ as CHAR_QUOTE.  */
+      if (allow_quote && chain->quote_age == m4__quote_age (M4SYNTAX)
+         && chain->type == M4__CHAIN_STR)
        return CHAR_QUOTE;
       switch (chain->type)
        {
@@ -759,7 +779,28 @@ composite_read (m4_input_block *me, m4 *context, bool allow_quote, bool safe)
            m4__adjust_refcount (context, chain->u.u_s.level, false);
          break;
        case M4__CHAIN_ARGV:
-         /* TODO - peek into argv.  */
+         /* TODO - figure out how to pass multiple arguments to
+            macro.c at once.  */
+         if (chain->u.u_a.index == m4_arg_argc (chain->u.u_a.argv))
+           {
+             m4__arg_adjust_refcount (context, chain->u.u_a.argv, false);
+             break;
+           }
+         if (chain->u.u_a.comma)
+           {
+             chain->u.u_a.comma = false;
+             return ','; /* FIXME - support M4_SYNTAX_COMMA.  */
+           }
+         /* Rather than directly parse argv here, we push another
+            input block containing the next unparsed argument from
+            argv.  */
+         m4_push_string_init (context);
+         m4__push_arg_quote (context, current_input, chain->u.u_a.argv,
+                             chain->u.u_a.index, chain->u.u_a.quotes);
+         chain->u.u_a.index++;
+         chain->u.u_a.comma = true;
+         m4_push_string_finish ();
+         return next_char (context, allow_quote, !safe);
        default:
          assert (!"composite_read");
          abort ();
@@ -781,7 +822,10 @@ composite_unget (m4_input_block *me, int ch)
       chain->u.u_s.len++;
       break;
     case M4__CHAIN_ARGV:
-      /* TODO support argv ref.  */
+      /* FIXME - support M4_SYNTAX_COMMA.  */
+      assert (ch == ',' && !chain->u.u_a.comma);
+      chain->u.u_a.comma = true;
+      break;
     default:
       assert (!"composite_unget");
       abort ();
@@ -807,7 +851,13 @@ composite_clean (m4_input_block *me, m4 *context, bool cleanup)
            m4__adjust_refcount (context, chain->u.u_s.level, false);
          break;
        case M4__CHAIN_ARGV:
-         /* TODO - peek into argv.  */
+         if (chain->u.u_a.index < m4_arg_argc (chain->u.u_a.argv))
+           {
+             assert (!cleanup);
+             return false;
+           }
+         m4__arg_adjust_refcount (context, chain->u.u_a.argv, false);
+         break;
        default:
          assert (!"composite_clean");
          abort ();
@@ -824,6 +874,7 @@ composite_print (m4_input_block *me, m4 *context, m4_obstack *obs)
   size_t maxlen = m4_get_max_debug_arg_length_opt (context);
   m4__symbol_chain *chain = me->u.u_c.chain;
   const m4_string_pair *quotes = m4_get_syntax_quotes (M4SYNTAX);
+  bool module = m4_is_debug_bit (context, M4_DEBUG_TRACE_MODULE);
   bool done = false;
 
   if (quote)
@@ -838,7 +889,11 @@ composite_print (m4_input_block *me, m4 *context, m4_obstack *obs)
            done = true;
          break;
        case M4__CHAIN_ARGV:
-         /* TODO support argv refs as well.  */
+         assert (!chain->u.u_a.comma);
+         if (m4_arg_print (obs, chain->u.u_a.argv, chain->u.u_a.index,
+                           chain->u.u_a.quotes, &maxlen, module))
+           done = true;
+         break;
        default:
          assert (!"composite_print");
          abort ();
diff --git a/m4/m4private.h b/m4/m4private.h
index 5304682..1ce5813 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -217,9 +217,11 @@ struct m4__symbol_chain
     } u_s;                     /* M4__CHAIN_STR.  */
     struct
     {
-      m4_macro_args *argv;     /* Reference to earlier address@hidden  */
-      unsigned int index;      /* Argument index within argv.  */
-      bool flatten;            /* True to treat builtins as text.  */
+      m4_macro_args *argv;             /* Reference to earlier address@hidden  */
+      unsigned int index;              /* Argument index within argv.  */
+      bool_bitfield flatten : 1;       /* True to treat builtins as text.  */
+      bool_bitfield comma : 1;         /* True when `,' is next input.  */
+      const m4_string_pair *quotes;    /* NULL for $*, quotes for address@hidden  */
     } u_a;                     /* M4__CHAIN_ARGV.  */
   } u;
 };
@@ -281,6 +283,7 @@ struct m4_macro_args
      during parsing or any token is potentially unsafe and requires a
      rescan.  */
   unsigned int quote_age;
+  size_t level; /* Which obstack owns this argv.  */
   size_t arraylen; /* True length of allocated elements in array.  */
   /* Used as a variable-length array, storing information about each
      argument.  */
@@ -299,7 +302,10 @@ struct m4__macro_arg_stacks
   void *argv_base;     /* Location for clearing the argv obstack.  */
 };
 
-extern size_t m4__adjust_refcount (m4 *, size_t, bool);
+extern size_t  m4__adjust_refcount     (m4 *, size_t, bool);
+extern bool    m4__arg_adjust_refcount (m4 *, m4_macro_args *, bool);
+extern void    m4__push_arg_quote      (m4 *, m4_obstack *, m4_macro_args *,
+                                        unsigned int, const m4_string_pair *);
 
 #define VALUE_NEXT(T)          ((T)->next)
 #define VALUE_MODULE(T)                ((T)->module)
diff --git a/m4/macro.c b/m4/macro.c
index bb6df16..29c8c1b 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -522,31 +522,13 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
   if (BIT_TEST (VALUE_FLAGS (value), VALUE_DELETED_BIT))
     m4_symbol_value_delete (value);
 
-  /* If argv contains references, those refcounts must be reduced now.  */
-  if (argv->has_ref)
-    {
-      m4__symbol_chain *chain;
-      size_t i;
-      for (i = 0; i < argv->arraylen; i++)
-       if (argv->array[i]->type == M4_SYMBOL_COMP)
-         {
-           chain = argv->array[i]->u.u_c.chain;
-           while (chain)
-             {
-               assert (chain->type == M4__CHAIN_STR);
-               if (chain->u.u_s.level < SIZE_MAX)
-                 m4__adjust_refcount (context, chain->u.u_s.level, false);
-               chain = chain->next;
-             }
-         }
-    }
-
   /* We no longer need argv, so reduce the refcount.  Additionally, if
      no other references to argv were created, we can free our portion
      of the obstack, although we must leave earlier content alone.  A
      refcount of 0 implies that adjust_refcount already freed the
      entire stack.  */
-  if (m4__adjust_refcount (context, level, false))
+  m4__arg_adjust_refcount (context, argv, false);
+  if (stack->refcount)
     {
       if (argv->inuse)
        {
@@ -593,6 +575,7 @@ collect_arguments (m4 *context, const char *name, size_t len,
   args.argv0 = (char *) obstack_copy0 (arguments, name, len);
   args.argv0_len = len;
   args.quote_age = m4__quote_age (M4SYNTAX);
+  args.level = context->expansion_level - 1;
   args.arraylen = 0;
   obstack_grow (argv_stack, &args, offsetof (m4_macro_args, array));
   name = args.argv0;
@@ -981,6 +964,33 @@ m4__adjust_refcount (m4 *context, size_t level, bool increase)
   return stack->refcount;
 }
 
+/* Given ARGV, adjust the refcount of every reference it contains in
+   the direction decided by INCREASE.  Return true if increasing
+   references to ARGV implies the first use of ARGV.  */
+bool
+m4__arg_adjust_refcount (m4 *context, m4_macro_args *argv, bool increase)
+{
+  size_t i;
+  m4__symbol_chain *chain;
+  bool result = !argv->inuse;
+
+  if (argv->has_ref)
+    for (i = 0; i < argv->arraylen; i++)
+      if (argv->array[i]->type == M4_SYMBOL_COMP)
+       {
+         chain = argv->array[i]->u.u_c.chain;
+         while (chain)
+           {
+             assert (chain->type == M4__CHAIN_STR);
+             if (chain->u.u_s.level < SIZE_MAX)
+               m4__adjust_refcount (context, chain->u.u_s.level, increase);
+             chain = chain->next;
+           }
+       }
+  m4__adjust_refcount (context, argv->level, increase);
+  return result;
+}
+
 /* Mark ARGV as being in use, along with any $@ references that it
    wraps.  */
 static void
@@ -998,20 +1008,74 @@ arg_mark (m4_macro_args *argv)
     }
 }
 
+/* Populate the newly-allocated VALUE as a wrapper around ARGV,
+   starting with argument INDEX.  Allocate any data on OBS, owned by a
+   given expansion LEVEL.  FLATTEN determines whether to allow
+   builtins, and QUOTES determines whether all arguments are quoted.
+   Return VALUE when successful, NULL when wrapping ARGV is trivially
+   empty.  */
+static m4_symbol_value *
+make_argv_ref (m4_symbol_value *value, m4_obstack *obs, size_t level,
+              m4_macro_args *argv, unsigned int index, bool flatten,
+              const m4_string_pair *quotes)
+{
+  m4__symbol_chain *chain;
+
+  assert (obstack_object_size (obs) == 0);
+  if (argv->wrapper)
+    {
+      /* TODO support concatenation with $@ refs.  */
+      assert (argv->arraylen == 1 && argv->array[0]->type == M4_SYMBOL_COMP);
+      chain = argv->array[0]->u.u_c.chain;
+      assert (!chain->next && chain->type == M4__CHAIN_ARGV);
+      argv = chain->u.u_a.argv;
+      index += chain->u.u_a.index - 1;
+    }
+  if (argv->argc <= index)
+    return NULL;
+
+  chain = (m4__symbol_chain *) obstack_alloc (obs, sizeof *chain);
+  value->type = M4_SYMBOL_COMP;
+  value->u.u_c.chain = value->u.u_c.end = chain;
+  chain->next = NULL;
+  chain->type = M4__CHAIN_ARGV;
+  chain->quote_age = argv->quote_age;
+  chain->u.u_a.argv = argv;
+  chain->u.u_a.index = index;
+  chain->u.u_a.flatten = flatten;
+  chain->u.u_a.comma = false;
+  if (quotes)
+    {
+      /* Clone the quotes into the obstack, since changequote can
+        occur before this $@ is rescanned.  */
+      /* TODO - optimize when quote_age is nonzero?  */
+      m4_string_pair *tmp = (m4_string_pair *) obstack_copy (obs, quotes,
+                                                            sizeof *quotes);
+      tmp->str1 = (char *) obstack_copy0 (obs, quotes->str1, quotes->len1);
+      tmp->str2 = (char *) obstack_copy0 (obs, quotes->str2, quotes->len2);
+      chain->u.u_a.quotes = tmp;
+    }
+  else
+    chain->u.u_a.quotes = NULL;
+  return value;
+}
+
 /* Given ARGV, return the symbol value at the specified INDEX, which
-   must be non-zero.  */
-m4_symbol_value *
-m4_arg_symbol (m4_macro_args *argv, unsigned int index)
+   must be non-zero.  *LEVEL is set to the obstack level that contains
+   the symbol (which is not necessarily the level of ARGV).  */
+static m4_symbol_value *
+arg_symbol (m4_macro_args *argv, unsigned int index, size_t *level)
 {
   unsigned int i;
   m4_symbol_value *value;
 
   assert (index);
+  *level = argv->level;
   if (argv->argc <= index)
     return &empty_symbol;
-
   if (!argv->wrapper)
     return argv->array[index - 1];
+
   /* Must cycle through all array slots until we find index, since
      wrappers can contain multiple arguments.  */
   for (i = 0; i < argv->arraylen; i++)
@@ -1024,8 +1088,8 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
          assert (!chain->next && chain->type == M4__CHAIN_ARGV);
          if (index < chain->u.u_a.argv->argc - (chain->u.u_a.index - 1))
            {
-             value = m4_arg_symbol (chain->u.u_a.argv,
-                                    chain->u.u_a.index - 1 + index);
+             value = arg_symbol (chain->u.u_a.argv,
+                                 chain->u.u_a.index - 1 + index, level);
              if (chain->u.u_a.flatten && m4_is_symbol_value_func (value))
                value = &empty_symbol;
              break;
@@ -1038,6 +1102,15 @@ m4_arg_symbol (m4_macro_args *argv, unsigned int index)
   return value;
 }
 
+/* Given ARGV, return the symbol value at the specified INDEX, which
+   must be non-zero.  */
+m4_symbol_value *
+m4_arg_symbol (m4_macro_args *argv, unsigned int index)
+{
+  size_t dummy;
+  return arg_symbol (argv, index, &dummy);
+}
+
 /* Given ARGV, return true if argument INDEX is text.  Index 0 is
    always text, as are indices beyond argc.  */
 bool
@@ -1284,23 +1357,16 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const char *argv0,
 {
   m4_macro_args *new_argv;
   m4_symbol_value *value;
-  m4__symbol_chain *chain;
+  m4_symbol_value *new_value;
   unsigned int index = skip ? 2 : 1;
   m4_obstack *obs = m4_arg_scratch (context);
 
-  /* When making a reference through a reference, point to the
-     original if possible.  */
-  if (argv->wrapper)
-    {
-      /* TODO for now we support only a single-length $@ chain.  */
-      assert (argv->arraylen == 1 && argv->array[0]->type == M4_SYMBOL_COMP);
-      chain = argv->array[0]->u.u_c.chain;
-      assert (!chain->next && chain->type == M4__CHAIN_ARGV);
-      argv = chain->u.u_a.argv;
-      index += chain->u.u_a.index - 1;
-    }
-  if (argv->argc <= index)
+  new_value = (m4_symbol_value *) obstack_alloc (obs, sizeof *value);
+  value = make_argv_ref (new_value, obs, context->expansion_level - 1, argv,
+                        index, flatten, NULL);
+  if (!value)
     {
+      obstack_free (obs, new_value);
       new_argv = (m4_macro_args *) obstack_alloc (obs, offsetof (m4_macro_args,
                                                                 array));
       new_argv->arraylen = 0;
@@ -1311,26 +1377,17 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const char *argv0,
       new_argv = (m4_macro_args *) obstack_alloc (obs, (offsetof (m4_macro_args,
                                                                  array)
                                                        + sizeof value));
-      value = (m4_symbol_value *) obstack_alloc (obs, sizeof *value);
-      chain = (m4__symbol_chain *) obstack_alloc (obs, sizeof *chain);
       new_argv->arraylen = 1;
       new_argv->array[0] = value;
       new_argv->wrapper = true;
-      new_argv->has_ref = true;
-      value->type = M4_SYMBOL_COMP;
-      value->u.u_c.chain = value->u.u_c.end = chain;
-      chain->next = NULL;
-      chain->type = M4__CHAIN_ARGV;
-      chain->quote_age = argv->quote_age;
-      chain->u.u_a.argv = argv;
-      chain->u.u_a.index = index;
-      chain->u.u_a.flatten = flatten;
+      new_argv->has_ref = argv->has_ref;
     }
   new_argv->argc = argv->argc - (index - 1);
   new_argv->inuse = false;
   new_argv->argv0 = argv0;
   new_argv->argv0_len = argv0_len;
   new_argv->quote_age = argv->quote_age;
+  new_argv->level = argv->level;
   return new_argv;
 }
 
@@ -1340,24 +1397,38 @@ void
 m4_push_arg (m4 *context, m4_obstack *obs, m4_macro_args *argv,
             unsigned int index)
 {
-  m4_symbol_value *value;
-  m4_symbol_value temp;
+  m4_symbol_value value;
 
   if (index == 0)
     {
-      value = &temp;
-      m4_set_symbol_value_text (value, argv->argv0, argv->argv0_len, 0);
+      m4_set_symbol_value_text (&value, argv->argv0, argv->argv0_len, 0);
+      if (m4__push_symbol (context, &value, context->expansion_level - 1,
+                          argv->inuse))
+       arg_mark (argv);
     }
   else
-    {
-      value = m4_arg_symbol (argv, index);
-      if (value == &empty_symbol)
-       return;
-    }
+    m4__push_arg_quote (context, obs, argv, index, NULL);
+}
+
+/* Push argument INDEX from ARGV, which must be a text token, onto the
+   expansion stack OBS for rescanning.  INDEX must be non-zero.
+   QUOTES determines any quote delimiters that were in effect when the
+   reference was created.  */
+void
+m4__push_arg_quote (m4 *context, m4_obstack *obs, m4_macro_args *argv,
+                   unsigned int index, const m4_string_pair *quotes)
+{
+  size_t level;
+  m4_symbol_value *value = arg_symbol (argv, index, &level);
+
   /* TODO handle builtin tokens?  */
-  if (m4__push_symbol (context, value, context->expansion_level - 1,
-                      argv->inuse))
+  if (quotes)
+    obstack_grow (obs, quotes->str1, quotes->len1);
+  if (value != &empty_symbol
+      && m4__push_symbol (context, value, level, argv->inuse))
     arg_mark (argv);
+  if (quotes)
+    obstack_grow (obs, quotes->str2, quotes->len2);
 }
 
 /* Push series of comma-separated arguments from ARGV, which should
@@ -1368,54 +1439,47 @@ void
 m4_push_args (m4 *context, m4_obstack *obs, m4_macro_args *argv, bool skip,
              bool quote)
 {
+  m4_symbol_value tmp;
   m4_symbol_value *value;
+  m4__symbol_chain *chain;
   unsigned int i = skip ? 2 : 1;
-  const char *sep = ",";
-  size_t sep_len = 1;
-  bool use_sep = false;
-  bool inuse = false;
   const m4_string_pair *quotes = m4_get_syntax_quotes (M4SYNTAX);
-  m4_obstack *scratch = m4_arg_scratch (context);
+  char *str = NULL;
+  size_t len = obstack_object_size (obs);
 
   if (argv->argc <= i)
     return;
 
   if (argv->argc == i + 1)
     {
-      if (quote)
-       obstack_grow (obs, quotes->str1, quotes->len1);
-      m4_push_arg (context, obs, argv, i);
-      if (quote)
-       obstack_grow (obs, quotes->str2, quotes->len2);
+      m4__push_arg_quote (context, obs, argv, i, quote ? quotes : NULL);
       return;
     }
 
-  /* Compute the separator in the scratch space.  */
-  if (quote)
+  /* Since make_argv_ref puts data on obs, we must first close any
+     pending data.  The resulting symbol contents live entirely on
+     obs, so we call push_symbol with a level of -1.  */
+  if (len)
     {
-      obstack_grow (obs, quotes->str1, quotes->len1);
-      obstack_grow (scratch, quotes->str2, quotes->len2);
-      obstack_1grow (scratch, ',');
-      obstack_grow0 (scratch, quotes->str1, quotes->len1);
-      sep = (char *) obstack_finish (scratch);
-      sep_len += quotes->len1 + quotes->len2;
+      obstack_1grow (obs, '\0');
+      str = (char *) obstack_finish (obs);
     }
 
-  /* TODO push entire $@ ref, rather than each arg.  */
-  for ( ; i < argv->argc; i++)
+  /* TODO allow shift, $@, to push builtins without flatten.  */
+  value = make_argv_ref (&tmp, obs, -1, argv, i, true, quote ? quotes : NULL);
+  assert (value == &tmp);
+  if (len)
     {
-      value = m4_arg_symbol (argv, i);
-      if (use_sep)
-       obstack_grow (obs, sep, sep_len);
-      else
-       use_sep = true;
-      /* TODO handle builtin tokens?  */
-      inuse |= m4__push_symbol (context, value,
-                               context->expansion_level - 1, inuse);
+      chain = (m4__symbol_chain *) obstack_alloc (obs, sizeof *chain);
+      chain->next = value->u.u_c.chain;
+      value->u.u_c.chain = chain;
+      chain->type = M4__CHAIN_STR;
+      chain->quote_age = 0;
+      chain->u.u_s.str = str;
+      chain->u.u_s.len = len;
+      chain->u.u_s.level = SIZE_MAX;
     }
-  if (quote)
-    obstack_grow (obs, quotes->str2, quotes->len2);
-  if (inuse)
+  if (m4__push_symbol (context, value, -1, argv->inuse))
     arg_mark (argv);
 }
 
diff --git a/m4/symtab.c b/m4/symtab.c
index f7a96ee..0d2055e 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -586,7 +586,7 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs,
              break;
            case M4__CHAIN_ARGV:
              if (m4_arg_print (obs, chain->u.u_a.argv, chain->u.u_a.index,
-                               NULL, &len, module))
+                               chain->u.u_a.quotes, &len, module))
                result = true;
              break;
            default:
-- 
1.5.3.8
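
A note on the quote handling in make_argv_ref above: the delimiters
are deep-copied when the $@ reference is created, because changequote
may run before the reference is rescanned.  The following is only a
minimal sketch of that idea, assuming malloc in place of the obstack
and a hypothetical `struct string_pair' that mirrors the four fields
the patch reads (str1, len1, str2, len2); `copy0', `clone_quotes' and
`free_quotes' are illustrative names, not m4 functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the fields the patch reads from a quote pair.  */
struct string_pair
{
  char *str1;  /* Opening delimiter.  */
  size_t len1;
  char *str2;  /* Closing delimiter.  */
  size_t len2;
};

/* Duplicate LEN bytes of S and NUL-terminate the copy, in the spirit
   of obstack_copy0.  */
static char *
copy0 (const char *s, size_t len)
{
  char *p = malloc (len + 1);
  if (!p)
    abort ();
  memcpy (p, s, len);
  p[len] = '\0';
  return p;
}

/* Return a private copy of QUOTES that stays valid even if the
   caller later changes or frees the original delimiters.  */
static struct string_pair *
clone_quotes (const struct string_pair *quotes)
{
  struct string_pair *tmp = malloc (sizeof *tmp);
  if (!tmp)
    abort ();
  *tmp = *quotes; /* Copies the lengths; the pointers are replaced next.  */
  tmp->str1 = copy0 (quotes->str1, quotes->len1);
  tmp->str2 = copy0 (quotes->str2, quotes->len2);
  return tmp;
}

/* Release a pair returned by clone_quotes.  */
static void
free_quotes (struct string_pair *quotes)
{
  free (quotes->str1);
  free (quotes->str2);
  free (quotes);
}
```

A shallow copy of the struct alone would not be enough, since the
copied pointers would still refer to storage owned by the live syntax
table.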

From 6aa361e373ffb74330dd7851ecd40315784488a8 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 30 Oct 2007 20:07:32 -0600
Subject: [PATCH] Stage 14: allow pushing argv references.

* src/m4.h (struct token_chain): Add comma and quotes fields.
(arg_adjust_refcount, arg_print, push_arg_quote): New prototypes.
* src/input.c (push_token, pop_input, input_print, peek_input)
(next_char_1): Support $@ references.
* src/macro.c (struct macro_arguments): Add level field.  Match
type of arraylen to argc.
(collect_arguments): Populate new field.
(expand_macro, make_argv_ref, push_arg): Factor...
(arg_adjust_refcount, make_argv_ref_token, push_arg_quote):
...into these new methods.
(arg_token): Add new parameter.
(arg_print): New function.
(arg_mark, arg_type, arg_text, arg_equal, arg_empty, arg_len)
(arg_func, push_args): Adjust callers.
* doc/m4.texinfo (Ifelse): Augment test.

(cherry picked from commit 9d08c0c8685fdd749b20062e03c061275dc8afbc)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   26 +++++
 doc/m4.texinfo |   16 +++
 src/input.c    |   77 ++++++++++++--
 src/m4.h       |   11 ++-
 src/macro.c    |  319 +++++++++++++++++++++++++++++++++++++++-----------------
 5 files changed, 342 insertions(+), 107 deletions(-)
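
A note on the CHAIN_ARGV handling in peek_input and next_char_1: the
reader alternates between handing out one argument and handing out one
comma, tracked by the index and comma fields, and stops once index
reaches argc, so no trailing comma is produced.  The following is only
a minimal sketch of that state machine, assuming plain strings in
place of input blocks; `struct argv_ref', `argv_ref_step' and
`argv_ref_drain' are hypothetical names, not part of m4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the u_a member of a chain.  */
struct argv_ref
{
  const char *const *args; /* args[0] is the macro name.  */
  unsigned int argc;       /* Number of entries in args.  */
  unsigned int index;      /* Next argument to hand out.  */
  bool comma;              /* True when `,' is the next input.  */
  const char *lquote;      /* NULL for $*, delimiters for $@.  */
  const char *rquote;
};

/* Append S to the NUL-terminated buffer OUT of capacity CAP,
   truncating if it does not fit.  */
static void
append (char *out, size_t cap, const char *s)
{
  size_t used = strlen (out);
  if (used + 1 < cap)
    strncat (out, s, cap - used - 1);
}

/* Emit the next piece of REF into OUT.  Return false once every
   argument has been handed out.  */
static bool
argv_ref_step (struct argv_ref *ref, char *out, size_t cap)
{
  if (ref->index == ref->argc)
    return false;
  if (ref->comma)
    {
      ref->comma = false;
      append (out, cap, ",");
      return true;
    }
  if (ref->lquote)
    append (out, cap, ref->lquote);
  append (out, cap, ref->args[ref->index]);
  if (ref->rquote)
    append (out, cap, ref->rquote);
  ref->index++;
  ref->comma = true;
  return true;
}

/* Reset OUT and expand all remaining arguments of REF into it.  */
static void
argv_ref_drain (struct argv_ref *ref, char *out, size_t cap)
{
  out[0] = '\0';
  while (argv_ref_step (ref, out, cap))
    continue;
}
```

Because the exhaustion check comes before the comma check, the comma
flag left set after the last argument is never acted upon.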

diff --git a/ChangeLog b/ChangeLog
index 0a53443..44e7925 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,29 @@
+2008-02-02  Eric Blake  <address@hidden>
+
+       Stage 14: allow pushing argv references.
+       Push a $@ reference to the input engine in one go, rather than
+       pushing each element.  For now, argument collection still gets one
+       argument of a $@ at a time; but the penalties of this patch make
+       it easier to manage $@ efficiently in future patches.
+       Memory impact: noticeable penalty, due to larger struct and O(n)
+       to O(n^2) on unboxed recursion
+       Speed impact: noticeable penalty, due to more bookkeeping.
+       * src/m4.h (struct token_chain): Add comma and quotes fields.
+       (arg_adjust_refcount, arg_print, push_arg_quote): New prototypes.
+       * src/input.c (push_token, pop_input, input_print, peek_input)
+       (next_char_1): Support $@ references.
+       * src/macro.c (struct macro_arguments): Add level field.  Match
+       type of arraylen to argc.
+       (collect_arguments): Populate new field.
+       (expand_macro, make_argv_ref, push_arg): Factor...
+       (arg_adjust_refcount, make_argv_ref_token, push_arg_quote):
+       ...into these new methods.
+       (arg_token): Add new parameter.
+       (arg_print): New function.
+       (arg_mark, arg_type, arg_text, arg_equal, arg_empty, arg_len)
+       (arg_func, push_args): Adjust callers.
+       * doc/m4.texinfo (Ifelse): Augment test.
+
 2008-01-31  Ralf Wildenhues  <address@hidden>
 
        * checks/Makefile.in: Use @SET_MAKE@, and use @SHELL@ rather
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index bcdb99b..c5c7c54 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -2693,6 +2693,22 @@ ifelse(-e(long)-, `-01234567890123456789', `yes', `no')
 @result{}no
 ifelse(`-01234567890123456789', -e(long)-, `yes', `no')
 @result{}no
+ifelse(`-'e(long), `-01234567890123456789', `yes', `no')
address@hidden
+ifelse(-`01234567890123456789', `-'e(long), `yes', `no')
address@hidden
+ifelse(`-'e(long), `-01234567890123456789-', `yes', `no')
address@hidden
+ifelse(`-01234567890123456789-', `-'e(long), `yes', `no')
address@hidden
+ifelse(`-'e(long)`-', `-01234567890123456789-', `yes', `no')
address@hidden
+ifelse(-`01234567890123456789-', `-'e(long)`-', `yes', `no')
address@hidden
+ifelse(`-'e(long)`-', `-01234567890123456789', `yes', `no')
address@hidden
+ifelse(`-01234567890123456789', `-'e(long)`-', `yes', `no')
address@hidden
 @end example
 
 @comment It would be nice to pass builtin tokens through ifelse, m4wrap,
diff --git a/src/input.c b/src/input.c
index 514acd1..7788562 100644
--- a/src/input.c
+++ b/src/input.c
@@ -344,7 +344,6 @@ push_token (token_data *token, int level, bool inuse)
   token_chain *chain;
 
   assert (next);
-  /* TODO - also accept TOKEN_COMP chains containing $@ ref.  */
 
   /* Speed consideration - for short enough tokens, the speed and
      memory overhead of parsing another INPUT_CHAIN link outweighs the
@@ -448,8 +447,12 @@ push_token (token_data *token, int level, bool inuse)
       else
        next->u.u_c.chain = chain;
       next->u.u_c.end = chain;
-      assert (chain->type == CHAIN_STR);
-      if (chain->u.u_s.level >= 0)
+      if (chain->type == CHAIN_ARGV)
+       {
+         assert (!chain->u.u_a.comma);
+         inuse |= arg_adjust_refcount (chain->u.u_a.argv, true);
+       }
+      else if (chain->type == CHAIN_STR && chain->u.u_s.level >= 0)
        adjust_refcount (chain->u.u_s.level, true);
       src_chain = src_chain->next;
     }
@@ -565,7 +568,10 @@ pop_input (bool cleanup)
                adjust_refcount (chain->u.u_s.level, false);
              break;
            case CHAIN_ARGV:
-             /* TODO - peek into argv.  */
+             if (chain->u.u_a.index < arg_argc (chain->u.u_a.argv))
+               return false;
+             arg_adjust_refcount (chain->u.u_a.argv, false);
+             break;
            default:
              assert (!"pop_input");
              abort ();
@@ -679,10 +685,23 @@ input_print (struct obstack *obs, const input_block *input)
       chain = input->u.u_c.chain;
       while (chain)
        {
-         /* TODO support argv refs as well.  */
-         assert (chain->type == CHAIN_STR);
-         if (obstack_print (obs, chain->u.u_s.str, chain->u.u_s.len, &maxlen))
-           return;
+         switch (chain->type)
+           {
+           case CHAIN_STR:
+             if (obstack_print (obs, chain->u.u_s.str, chain->u.u_s.len,
+                                &maxlen))
+               return;
+             break;
+           case CHAIN_ARGV:
+             assert (!chain->u.u_a.comma);
+             if (arg_print (obs, chain->u.u_a.argv, chain->u.u_a.index,
+                            chain->u.u_a.quotes, &maxlen))
+               return;
+             break;
+           default:
+             assert (!"input_print");
+             abort ();
+           }
          chain = chain->next;
        }
       break;
@@ -745,7 +764,21 @@ peek_input (void)
                    return to_uchar (*chain->u.u_s.str);
                  break;
                case CHAIN_ARGV:
-                 /* TODO - peek into argv.  */
+                 /* TODO - pass multiple arguments to macro.c at once.  */
+                 if (chain->u.u_a.index == arg_argc (chain->u.u_a.argv))
+                   break;
+                 if (chain->u.u_a.comma)
+                   return ',';
+                 /* Rather than directly parse argv here, we push
+                    another input block containing the next unparsed
+                    argument from argv.  */
+                 push_string_init ();
+                 push_arg_quote (current_input, chain->u.u_a.argv,
+                                 chain->u.u_a.index, chain->u.u_a.quotes);
+                 chain->u.u_a.index++;
+                 chain->u.u_a.comma = true;
+                 push_string_finish ();
+                 return peek_input ();
                default:
                  assert (!"peek_input");
                  abort ();
@@ -838,7 +871,9 @@ next_char_1 (bool allow_quote)
          chain = isp->u.u_c.chain;
          while (chain)
            {
-             if (allow_quote && chain->quote_age == current_quote_age)
+             /* TODO also support returning $@ as CHAR_QUOTE.  */
+             if (allow_quote && chain->quote_age == current_quote_age
+                 && chain->type == CHAIN_STR)
                return CHAR_QUOTE;
              switch (chain->type)
                {
@@ -854,7 +889,27 @@ next_char_1 (bool allow_quote)
                    adjust_refcount (chain->u.u_s.level, false);
                  break;
                case CHAIN_ARGV:
-                 /* TODO - read from argv.  */
+                 /* TODO - pass multiple arguments to macro.c at once.  */
+                 if (chain->u.u_a.index == arg_argc (chain->u.u_a.argv))
+                   {
+                     arg_adjust_refcount (chain->u.u_a.argv, false);
+                     break;
+                   }
+                 if (chain->u.u_a.comma)
+                   {
+                     chain->u.u_a.comma = false;
+                     return ',';
+                   }
+                 /* Rather than directly parse argv here, we push
+                    another input block containing the next unparsed
+                    argument from argv.  */
+                 push_string_init ();
+                 push_arg_quote (current_input, chain->u.u_a.argv,
+                                 chain->u.u_a.index, chain->u.u_a.quotes);
+                 chain->u.u_a.index++;
+                 chain->u.u_a.comma = true;
+                 push_string_finish ();
+                 return next_char_1 (allow_quote);
                default:
                  assert (!"next_char_1");
                  abort ();
diff --git a/src/m4.h b/src/m4.h
index ca886aa..b5430d2 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -306,7 +306,9 @@ struct token_chain
        {
          macro_arguments *argv;        /* Reference to earlier address@hidden  */
          unsigned int index;           /* Argument index within argv.  */
-         bool flatten;                 /* True to treat builtins as text.  */
+         bool_bitfield flatten : 1;    /* True to treat builtins as text.  */
+         bool_bitfield comma : 1;      /* True when `,' is next input.  */
+         const string_pair *quotes;    /* NULL for $*, quotes for address@hidden  */
        }
       u_a;
     }
@@ -476,7 +478,9 @@ extern int expansion_level;
 
 void expand_input (void);
 void call_macro (symbol *, int, macro_arguments *, struct obstack *);
+size_t adjust_refcount (int, bool);
 
+bool arg_adjust_refcount (macro_arguments *, bool);
 unsigned int arg_argc (macro_arguments *);
 token_data_type arg_type (macro_arguments *, unsigned int);
 const char *arg_text (macro_arguments *, unsigned int);
@@ -485,11 +489,14 @@ bool arg_empty (macro_arguments *, unsigned int);
 size_t arg_len (macro_arguments *, unsigned int);
 builtin_func *arg_func (macro_arguments *, unsigned int);
 struct obstack *arg_scratch (void);
+bool arg_print (struct obstack *, macro_arguments *, unsigned int,
+               const string_pair *, int *);
 macro_arguments *make_argv_ref (macro_arguments *, const char *, size_t,
                                bool, bool);
 void push_arg (struct obstack *, macro_arguments *, unsigned int);
+void push_arg_quote (struct obstack *, macro_arguments *, unsigned int,
+                    const string_pair *);
 void push_args (struct obstack *, macro_arguments *, bool, bool);
-size_t adjust_refcount (int, bool);
 
 /* Grab the text at argv index I.  Assumes macro_argument *argv is in
    scope, and aborts if the argument is not text.  */
diff --git a/src/macro.c b/src/macro.c
index 6ec09b0..d686b73 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -55,7 +55,8 @@ struct macro_arguments
      object, or 0 if quote_age changed during parsing or if any of the
      arguments might contain content that can affect rescan.  */
   unsigned int quote_age;
-  size_t arraylen; /* True length of allocated elements in array.  */
+  int level; /* Which obstack owns this argv.  */
+  unsigned int arraylen; /* True length of allocated elements in array.  */
   /* Used as a variable-length array, storing information about each
      argument.  */
   token_data *array[FLEXIBLE_ARRAY_MEMBER];
@@ -489,6 +490,7 @@ collect_arguments (symbol *sym, struct obstack *arguments,
   args.argv0 = SYMBOL_NAME (sym);
   args.argv0_len = strlen (args.argv0);
   args.quote_age = quote_age ();
+  args.level = expansion_level - 1;
   args.arraylen = 0;
   obstack_grow (argv_stack, &args, offsetof (macro_arguments, array));
 
@@ -662,31 +664,13 @@ expand_macro (symbol *sym)
   if (SYMBOL_DELETED (sym))
     free_symbol (sym);
 
-  /* If argv contains references, those refcounts must be reduced now.  */
-  if (argv->has_ref)
-    {
-      token_chain *chain;
-      size_t i;
-      for (i = 0; i < argv->arraylen; i++)
-       if (TOKEN_DATA_TYPE (argv->array[i]) == TOKEN_COMP)
-         {
-           chain = argv->array[i]->u.u_c.chain;
-           while (chain)
-             {
-               assert (chain->type == CHAIN_STR);
-               if (chain->u.u_s.level >= 0)
-                 adjust_refcount (chain->u.u_s.level, false);
-               chain = chain->next;
-             }
-         }
-    }
-
   /* We no longer need argv, so reduce the refcount.  Additionally, if
      no other references to argv were created, we can free our portion
      of the obstack, although we must leave earlier content alone.  A
      refcount of 0 implies that adjust_refcount already freed the
      entire stack.  */
-  if (adjust_refcount (level, false))
+  arg_adjust_refcount (argv, false);
+  if (stacks[level].refcount)
     {
       if (argv->inuse)
        {
@@ -733,18 +717,50 @@ adjust_refcount (int level, bool increase)
   return stacks[level].refcount;
 }
 
+/* Given ARGV, adjust the refcount of every reference it contains in
+   the direction decided by INCREASE.  Return true if increasing
+   references to ARGV implies the first use of ARGV.  */
+bool
+arg_adjust_refcount (macro_arguments *argv, bool increase)
+{
+  unsigned int i;
+  token_chain *chain;
+  bool result = !argv->inuse;
+
+  if (argv->has_ref)
+    for (i = 0; i < argv->arraylen; i++)
+      if (TOKEN_DATA_TYPE (argv->array[i]) == TOKEN_COMP)
+       {
+         chain = argv->array[i]->u.u_c.chain;
+         while (chain)
+           {
+             assert (chain->type == CHAIN_STR);
+             if (chain->u.u_s.level >= 0)
+               adjust_refcount (chain->u.u_s.level, increase);
+             chain = chain->next;
+           }
+       }
+  adjust_refcount (argv->level, increase);
+  return result;
+}
+
 
 /* Given ARGV, return the token_data that contains argument INDEX;
-   INDEX must be > 0, < argv->argc.  */
+   INDEX must be > 0, < argv->argc.  If LEVEL is non-NULL, *LEVEL is
+   set to the obstack level that contains the token (which is not
+   necessarily the level of ARGV).  */
 static token_data *
-arg_token (macro_arguments *argv, unsigned int index)
+arg_token (macro_arguments *argv, unsigned int index, int *level)
 {
   unsigned int i;
   token_data *token;
 
   assert (index && index < argv->argc);
+  if (level)
+    *level = argv->level;
   if (!argv->wrapper)
     return argv->array[index - 1];
+
   /* Must cycle through all tokens, until we find index, since a ref
      may occupy multiple indices.  */
   for (i = 0; i < argv->arraylen; i++)
@@ -758,7 +774,7 @@ arg_token (macro_arguments *argv, unsigned int index)
          if (index < chain->u.u_a.argv->argc - (chain->u.u_a.index - 1))
            {
              token = arg_token (chain->u.u_a.argv,
-                                chain->u.u_a.index - 1 + index);
+                                chain->u.u_a.index - 1 + index, level);
              if (chain->u.u_a.flatten
                  && TOKEN_DATA_TYPE (token) == TOKEN_FUNC)
                token = &empty_token;
@@ -777,6 +793,8 @@ arg_token (macro_arguments *argv, unsigned int index)
 static void
 arg_mark (macro_arguments *argv)
 {
+  if (argv->inuse)
+    return;
   argv->inuse = true;
   if (argv->wrapper)
     {
@@ -806,7 +824,7 @@ arg_type (macro_arguments *argv, unsigned int index)
 
   if (index == 0 || index >= argv->argc)
     return TOKEN_TEXT;
-  token = arg_token (argv, index);
+  token = arg_token (argv, index, NULL);
   type = TOKEN_DATA_TYPE (token);
   /* When accessed via the arg_* interface, composite tokens are
      currently sequences of text only.  */
@@ -830,7 +848,7 @@ arg_text (macro_arguments *argv, unsigned int index)
     return argv->argv0;
   if (index >= argv->argc)
     return "";
-  token = arg_token (argv, index);
+  token = arg_token (argv, index, NULL);
   switch (TOKEN_DATA_TYPE (token))
     {
     case TOKEN_TEXT:
@@ -862,8 +880,8 @@ arg_text (macro_arguments *argv, unsigned int index)
 bool
 arg_equal (macro_arguments *argv, unsigned int indexa, unsigned int indexb)
 {
-  token_data *ta = arg_token (argv, indexa);
-  token_data *tb = arg_token (argv, indexb);
+  token_data *ta = arg_token (argv, indexa, NULL);
+  token_data *tb = arg_token (argv, indexb, NULL);
   token_chain tmpa;
   token_chain tmpb;
   token_chain *ca = &tmpa;
@@ -958,7 +976,7 @@ arg_empty (macro_arguments *argv, unsigned int index)
     return argv->argv0_len == 0;
   if (index >= argv->argc)
     return true;
-  return arg_token (argv, index) == &empty_token;
+  return arg_token (argv, index, NULL) == &empty_token;
 }
 
 /* Given ARGV, return the length of argument INDEX.  Abort if the
@@ -974,7 +992,7 @@ arg_len (macro_arguments *argv, unsigned int index)
     return argv->argv0_len;
   if (index >= argv->argc)
     return 0;
-  token = arg_token (argv, index);
+  token = arg_token (argv, index, NULL);
   switch (TOKEN_DATA_TYPE (token))
     {
     case TOKEN_TEXT:
@@ -1007,7 +1025,7 @@ arg_func (macro_arguments *argv, unsigned int index)
 {
   token_data *token;
 
-  token = arg_token (argv, index);
+  token = arg_token (argv, index, NULL);
   assert (TOKEN_DATA_TYPE (token) == TOKEN_FUNC);
   return TOKEN_DATA_FUNC (token);
 }
@@ -1022,6 +1040,128 @@ arg_scratch (void)
   return stacks[expansion_level - 1].args;
 }
 
+/* Dump a representation of ARGV to the obstack OBS, starting with
+   argument INDEX.  If QUOTES is non-NULL, each argument is displayed
+   with those quotes.  If MAX_LEN is non-NULL, truncate the output
+   after *MAX_LEN bytes are output and return true; otherwise, return
+   false, and reduce *MAX_LEN by the number of bytes output.  */
+bool
+arg_print (struct obstack *obs, macro_arguments *argv, unsigned int index,
+          const string_pair *quotes, int *max_len)
+{
+  int len = max_len ? *max_len : INT_MAX;
+  unsigned int i;
+  token_data *token;
+  token_chain *chain;
+  bool comma = false;
+
+  for (i = index; i < argv->argc; i++)
+    {
+      if (comma && obstack_print (obs, ",", 1, &len))
+       return true;
+      else
+       comma = true;
+      token = arg_token (argv, i, NULL);
+      if (quotes && obstack_print (obs, quotes->str1, quotes->len1, &len))
+       return true;
+      switch (TOKEN_DATA_TYPE (token))
+       {
+       case TOKEN_TEXT:
+         if (obstack_print (obs, TOKEN_DATA_TEXT (token),
+                            TOKEN_DATA_LEN (token), &len))
+           return true;
+         break;
+       case TOKEN_COMP:
+         chain = token->u.u_c.chain;
+         while (chain)
+           {
+             switch (chain->type)
+               {
+               case CHAIN_STR:
+                 if (obstack_print (obs, chain->u.u_s.str, chain->u.u_s.len,
+                                    &len))
+                   return true;
+                 break;
+               case CHAIN_ARGV:
+                 if (arg_print (obs, chain->u.u_a.argv, chain->u.u_a.index,
+                                chain->u.u_a.quotes, &len))
+                   return true;
+                 break;
+               default:
+                 assert (!"arg_print");
+                 abort ();
+               }
+             chain = chain->next;
+           }
+         break;
+       case TOKEN_FUNC:
+         /* TODO - support func.  */
+       default:
+         assert (!"arg_print");
+         abort ();
+       }
+      if (quotes && obstack_print (obs, quotes->str2, quotes->len2,
+                                  &len))
+       return true;
+    }
+  if (max_len)
+    *max_len = len;
+  return false;
+}
+
+/* Populate the new TOKEN as a wrapper to ARGV, starting with argument
+   INDEX.  Allocate any data on OBS, owned by a given expansion LEVEL.
+   FLATTEN determines whether to allow builtins, and QUOTES determines
+   whether all arguments are quoted.  Return TOKEN when successful,
+   NULL when wrapping ARGV is trivially empty.  */
+static token_data *
+make_argv_ref_token (token_data *token, struct obstack *obs, int level,
+                    macro_arguments *argv, unsigned int index, bool flatten,
+                    const string_pair *quotes)
+{
+  token_chain *chain;
+
+  assert (obstack_object_size (obs) == 0);
+  if (argv->wrapper)
+    {
+      /* TODO for now we support only a single-length $@ chain.  */
+      assert (argv->arraylen == 1
+             && TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP);
+      chain = argv->array[0]->u.u_c.chain;
+      assert (!chain->next && chain->type == CHAIN_ARGV);
+      argv = chain->u.u_a.argv;
+      index += chain->u.u_a.index - 1;
+    }
+  if (index >= argv->argc)
+    return NULL;
+
+  chain = (token_chain *) obstack_alloc (obs, sizeof *chain);
+  TOKEN_DATA_TYPE (token) = TOKEN_COMP;
+  token->u.u_c.chain = token->u.u_c.end = chain;
+  chain->next = NULL;
+  chain->type = CHAIN_ARGV;
+  chain->quote_age = argv->quote_age;
+  chain->u.u_a.argv = argv;
+  chain->u.u_a.index = index;
+  chain->u.u_a.flatten = flatten;
+  chain->u.u_a.comma = false;
+  if (quotes)
+    {
+      /* Clone the quotes into the obstack, since a subsequent
+        changequote may take effect before the $@ ref is
+        rescanned.  */
+      /* TODO - optimize when quote_age is nonzero.  */
+      string_pair *tmp = (string_pair *) obstack_copy (obs, quotes,
+                                                      sizeof *quotes);
+      tmp->str1 = (char *) obstack_copy0 (obs, quotes->str1, quotes->len1);
+      tmp->str2 = (char *) obstack_copy0 (obs, quotes->str2, quotes->len2);
+      chain->u.u_a.quotes = tmp;
+    }
+  else
+    chain->u.u_a.quotes = NULL;
+  return token;
+}
+
 /* Create a new argument object using the same obstack as ARGV; thus,
    the new object will automatically be freed when the original is
    freed.  Explicitly set the macro name (argv[0]) from ARGV0 with
@@ -1035,24 +1175,16 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
 {
   macro_arguments *new_argv;
   token_data *token;
-  token_chain *chain;
+  token_data *new_token;
   unsigned int index = skip ? 2 : 1;
   struct obstack *obs = arg_scratch ();
 
-  /* When making a reference through a reference, point to the
-     original if possible.  */
-  if (argv->wrapper)
-    {
-      /* TODO - for now we support only a single-length $@ chain.  */
-      assert (argv->arraylen == 1
-             && TOKEN_DATA_TYPE (argv->array[0]) == TOKEN_COMP);
-      chain = argv->array[0]->u.u_c.chain;
-      assert (!chain->next && chain->type == CHAIN_ARGV);
-      argv = chain->u.u_a.argv;
-      index += chain->u.u_a.index - 1;
-    }
-  if (argv->argc <= index)
+  new_token = (token_data *) obstack_alloc (obs, sizeof *token);
+  token = make_argv_ref_token (new_token, obs, expansion_level - 1, argv,
+                              index, flatten, NULL);
+  if (!token)
     {
+      obstack_free (obs, new_token);
       new_argv = (macro_arguments *)
        obstack_alloc (obs, offsetof (macro_arguments, array));
       new_argv->arraylen = 0;
@@ -1062,28 +1194,18 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
   else
     {
       new_argv = (macro_arguments *)
-       obstack_alloc (obs,
-                      offsetof (macro_arguments, array) + sizeof token);
-      token = (token_data *) obstack_alloc (obs, sizeof *token);
-      chain = (token_chain *) obstack_alloc (obs, sizeof *chain);
+       obstack_alloc (obs, offsetof (macro_arguments, array) + sizeof token);
       new_argv->arraylen = 1;
       new_argv->array[0] = token;
       new_argv->wrapper = true;
-      new_argv->has_ref = true;
-      TOKEN_DATA_TYPE (token) = TOKEN_COMP;
-      token->u.u_c.chain = token->u.u_c.end = chain;
-      chain->next = NULL;
-      chain->type = CHAIN_ARGV;
-      chain->quote_age = argv->quote_age;
-      chain->u.u_a.argv = argv;
-      chain->u.u_a.index = index;
-      chain->u.u_a.flatten = flatten;
+      new_argv->has_ref = argv->has_ref;
     }
   new_argv->argc = argv->argc - (index - 1);
   new_argv->inuse = false;
   new_argv->argv0 = argv0;
   new_argv->argv0_len = argv0_len;
   new_argv->quote_age = argv->quote_age;
+  new_argv->level = argv->level;
   return new_argv;
 }
 
@@ -1092,8 +1214,6 @@ make_argv_ref (macro_arguments *argv, const char *argv0, size_t argv0_len,
 void
 push_arg (struct obstack *obs, macro_arguments *argv, unsigned int index)
 {
-  token_data *token;
-
   if (index == 0)
     {
       /* Always push copy of arg 0, since its lifetime is not
@@ -1103,10 +1223,27 @@ push_arg (struct obstack *obs, macro_arguments *argv, unsigned int index)
     }
   if (index >= argv->argc)
     return;
-  token = arg_token (argv, index);
-  /* TODO handle func tokens?  */
-  if (push_token (token, expansion_level - 1, argv->inuse))
+  push_arg_quote (obs, argv, index, NULL);
+}
+
+/* Push argument INDEX from ARGV, which must be a text token, onto the
+   expansion stack OBS for rescanning.  INDEX must be > 0, < argc.
+   QUOTES determines any quote delimiters that were in effect when the
+   reference was created.  */
+void
+push_arg_quote (struct obstack *obs, macro_arguments *argv, unsigned int index,
+               const string_pair *quotes)
+{
+  int level;
+  token_data *token = arg_token (argv, index, &level);
+
+  /* TODO handle func tokens.  */
+  if (quotes)
+    obstack_grow (obs, quotes->str1, quotes->len1);
+  if (push_token (token, level, argv->inuse))
     arg_mark (argv);
+  if (quotes)
+    obstack_grow (obs, quotes->str2, quotes->len2);
 }
 
 /* Push series of comma-separated arguments from ARGV, which should
@@ -1116,50 +1253,44 @@ push_arg (struct obstack *obs, macro_arguments *argv, unsigned int index)
 void
 push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
 {
-  token_data *token;
   unsigned int i = skip ? 2 : 1;
-  const char *sep = ",";
-  size_t sep_len = 1;
-  bool use_sep = false;
-  bool inuse = false;
-  struct obstack *scratch = arg_scratch ();
+  token_data td;
+  token_data *token;
+  char *str = NULL;
+  size_t len = obstack_object_size (obs);
 
   if (i >= argv->argc)
     return;
 
   if (i + 1 == argv->argc)
     {
-      if (quote)
-       obstack_grow (obs, curr_quote.str1, curr_quote.len1);
-      push_arg (obs, argv, i);
-      if (quote)
-       obstack_grow (obs, curr_quote.str2, curr_quote.len2);
+      push_arg_quote (obs, argv, i, quote ? &curr_quote : NULL);
       return;
     }
 
-  /* Compute the separator in the scratch space.  */
-  if (quote)
+  /* Since make_argv_ref_token puts data on obs, we must first close
+     any pending data.  The resulting token contents live entirely on
+     obs, so we call push_token with a level of -1.  */
+  if (len)
     {
-      obstack_grow (obs, curr_quote.str1, curr_quote.len1);
-      obstack_grow (scratch, curr_quote.str2, curr_quote.len2);
-      obstack_1grow (scratch, ',');
-      obstack_grow0 (scratch, curr_quote.str1, curr_quote.len1);
-      sep = (char *) obstack_finish (scratch);
-      sep_len += curr_quote.len1 + curr_quote.len2;
+      obstack_1grow (obs, '\0');
+      str = (char *) obstack_finish (obs);
     }
-  /* TODO push entire $@ reference, rather than pushing each arg.  */
-  for ( ; i < argv->argc; i++)
+  /* TODO allow shift, $@, to push builtins without flatten.  */
+  token = make_argv_ref_token (&td, obs, -1, argv, i, true,
+                              quote ? &curr_quote : NULL);
+  assert (token);
+  if (len)
     {
-      token = arg_token (argv, i);
-      if (use_sep)
-       obstack_grow (obs, sep, sep_len);
-      else
-       use_sep = true;
-      /* TODO handle func tokens?  */
-      inuse |= push_token (token, expansion_level - 1, inuse);
+      token_chain *chain = (token_chain *) obstack_alloc (obs, sizeof *chain);
+      chain->next = token->u.u_c.chain;
+      token->u.u_c.chain = chain;
+      chain->type = CHAIN_STR;
+      chain->quote_age = 0;
+      chain->u.u_s.str = str;
+      chain->u.u_s.len = len;
+      chain->u.u_s.level = -1;
     }
-  if (quote)
-    obstack_grow (obs, curr_quote.str2, curr_quote.len2);
-  if (inuse)
+  if (push_token (token, -1, argv->inuse))
     arg_mark (argv);
 }
-- 
1.5.3.8
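
For readers following the CHAIN_ARGV case in next_char_1, here is a
minimal standalone sketch of the index/comma bookkeeping.  It is not
code from the patch: quote_pair, argv_ref, append, argv_ref_step and
expand_argv_ref are hypothetical names, and the sketch appends text to
a plain buffer where the patch instead pushes a new input block for
each argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for m4's string_pair.  */
typedef struct
{
  const char *open;             /* Opening quote delimiter.  */
  const char *close;            /* Closing quote delimiter.  */
} quote_pair;

/* Hypothetical stand-in for the u_a member of token_chain.  */
typedef struct
{
  const char *const *args;      /* args[0] is the macro name.  */
  unsigned int argc;            /* Number of entries in args.  */
  unsigned int index;           /* Next argument to emit.  */
  bool comma;                   /* True when `,' is the next output.  */
  const quote_pair *quotes;     /* NULL for $*, quotes for $@.  */
} argv_ref;

/* Append TEXT to the NUL-terminated string OUT, which has room for
   CAP bytes including the terminator.  */
void
append (char *out, size_t cap, const char *text)
{
  size_t used = strlen (out);
  size_t len = strlen (text);
  assert (used + len < cap);
  memcpy (out + used, text, len + 1);
}

/* Emit the next piece of REF into OUT: either a separating comma or
   one argument with its quotes.  Return false once REF is exhausted.  */
bool
argv_ref_step (argv_ref *ref, char *out, size_t cap)
{
  if (ref->index >= ref->argc)
    return false;
  if (ref->comma)
    {
      /* A comma is owed between the previous argument and the next.  */
      ref->comma = false;
      append (out, cap, ",");
      return true;
    }
  if (ref->quotes)
    append (out, cap, ref->quotes->open);
  append (out, cap, ref->args[ref->index]);
  if (ref->quotes)
    append (out, cap, ref->quotes->close);
  ref->index++;
  ref->comma = true;
  return true;
}

/* Reset OUT to the empty string, then drain REF into it.  */
void
expand_argv_ref (argv_ref *ref, char *out, size_t cap)
{
  out[0] = '\0';
  while (argv_ref_step (ref, out, cap))
    continue;
}
```

Because the exhaustion check comes before the comma check, the comma
that is still pending after the last argument is never emitted, so the
output has no trailing separator.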

