m4-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[9/18] argv_ref speedup: use single-argument back-references as input


From: Eric Blake
Subject: [9/18] argv_ref speedup: use single-argument back-references as input
Date: Thu, 20 Dec 2007 22:37:36 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Next in the series.  This patch finally starts using back-references
(stage 8 set up the hooks, but never used them; in fact, stage 8 on head
had a stale variable reference that could cause core dumps if
back-references had been used).  Here, the references only live until the
input engine rescans the text, so there is still no overall reuse of
scanned text in future macro expansion.  But since there is less copying
of data, there is a slight speedup in operation.  On the other hand, I
also added the notion of inlining short text - there is no point in
wasting 24 bytes or more in describing a reference to a short piece of
text, so I added a threshold at which references are created.  This patch
uses more memory, particularly when boxing and unboxing arguments, because
of keeping previous arguments alive, and the extra memory usage
counteracts the speedup of less copying.  Future patches in the series
reduce how much memory is kept alive, by compacting a $@ reference into a
single entity instead of the current approach of one reference per
argument in address@hidden

I did encounter one difference between the branch and head; on the branch,
$0 is shared with the symbol in the symbol table alongside its expansion
value, so its lifetime is valid throughout expand_macro but not longer.
On head, the symbol table stores symbol names independently of the stack
of expansion values, the symbol table string could be invalidated even
before expand_macro completes, thus we already had been copying $0 onto
the argument stack, which means that $0 can live as long as the rest of
the arguments.  This difference is highlighted between the two
macro.c:push_arg function implementations.

This patch also adds some test of embedded NUL behavior.  I noticed that
ever since stage 3, when the input engine tracks length instead of
NUL-terminated text, that behavior of builtins on embedded NUL is changing
(in my opinion, for the better).  Adding tests will document each change,
and ensure no regressions.

On a side note, I'm headed on vacation for two weeks.  I will have limited
email access, but will not be making any commits during that time.

2007-12-21  Eric Blake  <address@hidden>

        Stage 9: share rather than copy single-arg refs.
        * m4/gnulib-cache.m4: Import quote and memmem modules.
        * src/m4.h (arg_scratch): New prototype.
        * src/input.c (INPUT_INLINE_THRESHOLD): New define.
        (push_token): Use it to inline short text, and save references to
        longer text.
        (input_init, next_token): Simplify obstack handling.
        * src/macro.c (expand_argument): Likewise.
        (expand_macro): Track scratch space.
        (arg_scratch): New function.
        (make_argv_ref): Use it.
        (push_args): Likewise, and simplify comma handling, since most
        separators are short enough to be inlined.
        * src/builtin.c (builtin_init): Avoid cast.
        (m4_errprint, m4_index): Transparently support NUL.
        (m4_translit): Use scratch space, rather than leaking memory on
        expansion stack.
        * doc/m4.texinfo (Syntax): Add new test of embedded NUL.
        * checks/get-them: Extract test that uses external files.
        * checks/check-them: Run the new test.  Use unified diff if
        possible, and force text mode when debugging NUL handling.
        * examples/null.m4: New file.
        * examples/null.out: Likewise.
        * examples/null.err: Likewise.
        * examples/Makefile.am (EXTRA_FILES): Distribute these files.
        * .gitattributes: Treat new files as text, in spite of embedded
        NUL.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHa1Eg84KuGfSFAYARAlegAJ9T6poBv8wIOw3iAfxWONQfYTbbogCgu237
G4bfQ8AVvAtrUQT1p12GsRI=
=ByFN
-----END PGP SIGNATURE-----
From e38bf1ca50e3c5038fab27266ecb3ce3d1a0296f Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Thu, 20 Dec 2007 10:56:29 -0700
Subject: [PATCH] Stage 9: share rather than copy single-arg refs.

* ltdl/m4/gnulib-cache.m4: Import memmem and quote modules.
* m4/m4module.h (m4_arg_scratch): New prototype.
* m4/m4private.h (m4__push_symbol): Add parameter.
(m4_arg_scratch): Add fast accessor.
(struct m4): Add expansion_level member, taken...
* m4/macro.c (expansion_level): ...from here.  Adjust all users.
(expand_argument): Minor cleanup.
(expand_macro): Track scratch space per macro call.
(m4_arg_scratch): New function.
(m4_make_argv_ref): Call new function.
(m4_push_arg): Push reference to $0.
(m4_push_args): Rework separator usage, since separators will
usually be inlined.
(process_macro): Allow embedded NUL.
* m4/input.c (INPUT_INLINE_THRESHOLD): New define.
(m4__push_symbol): Add parameter.  Inline short strings, and save
references through rescanning.
* m4/symtab.c (m4_set_symbol_value_text): Weaken assertion.
* modules/m4.c (errprint, index): Handle NUL transparently.
(dumpdef, translit): Use scratch space, rather than expansion
stack.
* modules/gnu.c (renamesyms, m4symbols): Likewise.
* tests/others.at (nul character): New test.
(iso8859): Quote absolute file name, remove XFAIL.
* tests/iso8859.m4: Avoid raw NUL in output.
* tests/null.m4: New file.
* tests/null.out: Likewise.
* tests/null.err: Likewise.
* Makefile.am (OTHER_FILES): Distribute new files.
* .gitattributes: Treat new files as text.

Signed-off-by: Eric Blake <address@hidden>
---
 .gitattributes          |    2 +
 ChangeLog               |   34 +++++++++++++
 Makefile.am             |    3 +-
 doc/m4.texinfo          |    4 ++
 ltdl/m4/gnulib-cache.m4 |    4 +-
 m4/input.c              |   43 ++++++++++------
 m4/m4module.h           |    1 +
 m4/m4private.h          |    5 ++-
 m4/macro.c              |  126 +++++++++++++++++++++++++++++------------------
 m4/symtab.c             |    9 +--
 modules/gnu.c           |   24 +++------
 modules/m4.c            |   36 +++++++-------
 tests/iso8859.m4        |    6 +-
 tests/null.err          |    1 +
 tests/null.m4           |   94 +++++++++++++++++++++++++++++++++++
 tests/null.out          |   20 +++++++
 tests/others.at         |   33 ++++++++----
 17 files changed, 323 insertions(+), 122 deletions(-)
 create mode 100644 .gitattributes
 create mode 100644 tests/null.err
 create mode 100644 tests/null.m4
 create mode 100644 tests/null.out

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..3c0e9d5
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,2 @@
+null.* diff merge
+iso8859.m4 diff merge
diff --git a/ChangeLog b/ChangeLog
index f5cf9e1..970af15 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,37 @@
+2007-12-20  Eric Blake  <address@hidden>
+
+       Stage 9: share rather than copy single-arg refs.
+       * ltdl/m4/gnulib-cache.m4: Import memmem and quote modules.
+       * m4/m4module.h (m4_arg_scratch): New prototype.
+       * m4/m4private.h (m4__push_symbol): Add parameter.
+       (m4_arg_scratch): Add fast accessor.
+       (struct m4): Add expansion_level member, taken...
+       * m4/macro.c (expansion_level): ...from here.  Adjust all users.
+       (expand_argument): Minor cleanup.
+       (expand_macro): Track scratch space per macro call.
+       (m4_arg_scratch): New function.
+       (m4_make_argv_ref): Call new function.
+       (m4_push_arg): Push reference to $0.
+       (m4_push_args): Rework separator usage, since separators will
+       usually be inlined.
+       (process_macro): Allow embedded NUL.
+       * m4/input.c (INPUT_INLINE_THRESHOLD): New define.
+       (m4__push_symbol): Add parameter.  Inline short strings, and save
+       references through rescanning.
+       * m4/symtab.c (m4_set_symbol_value_text): Weaken assertion.
+       * modules/m4.c (errprint, index): Handle NUL transparently.
+       (dumpdef, translit): Use scratch space, rather than expansion
+       stack.
+       * modules/gnu.c (renamesyms, m4symbols): Likewise.
+       * tests/others.at (nul character): New test.
+       (iso8859): Quote absolute file name, remove XFAIL.
+       * tests/iso8859.m4: Avoid raw NUL in output.
+       * tests/null.m4: New file.
+       * tests/null.out: Likewise.
+       * tests/null.err: Likewise.
+       * Makefile.am (OTHER_FILES): Distribute new files.
+       * .gitattributes: Treat new files as text.
+
 2007-12-17  Eric Blake  <address@hidden>
 
        Stage 8: extend life of references into argv.
diff --git a/Makefile.am b/Makefile.am
index 57aba99..49db5ef 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -435,7 +435,8 @@ clean-local-tests:
        test ! -f '$(abs_srcdir)/tests/testsuite' || \
          $(SHELL) '$(abs_srcdir)/tests/testsuite' --clean
 
-OTHER_FILES    = tests/iso8859.m4 tests/stackovf.test
+OTHER_FILES    = tests/iso8859.m4 tests/stackovf.test \
+               tests/null.m4 tests/null.out tests/null.err
 
 DISTCLEANFILES += tests/atconfig tests/atlocal tests/m4
 MAINTAINERCLEANFILES += tests/generated.at '$(TESTSUITE)'
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index c8b2062..f56188f 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -1060,6 +1060,10 @@ use address@hidden characters in quoted strings 
(@pxref{Changequote}),
 comments (@pxref{Changecom}), and macro names (@pxref{Indir}), with the
 exception of the @sc{nul} character (the zero byte @samp{'\0'}).
 
address@hidden FIXME - each builtin needs to document how it handles NUL, then
address@hidden update the above paragraph to mention that NUL is now handled
address@hidden transparently.
+
 @menu
 * Names::                       Macro names
 * Quoted strings::              Quoting input to @code{m4}
diff --git a/ltdl/m4/gnulib-cache.m4 b/ltdl/m4/gnulib-cache.m4
index 1eefbd5..6069529 100644
--- a/ltdl/m4/gnulib-cache.m4
+++ b/ltdl/m4/gnulib-cache.m4
@@ -15,11 +15,11 @@
 
 
 # Specification in the form of a command-line invocation:
-#   gnulib-tool --import --dir=. --local-dir=local --lib=libgnu 
--source-base=gnu --m4-base=ltdl/m4 --doc-base=doc --aux-dir=build-aux 
--with-tests --libtool --macro-prefix=M4 assert autobuild avltree-oset 
binary-io clean-temp cloexec close-stream closein config-h configmake dirname 
error exit fdl fflush filenamecat flexmember fopen-safer free fseeko gendocs 
gettext gnupload gpl-3.0 mkstemp obstack progname regex regexprops-generic 
sprintf-posix stdbool stdlib-safer strnlen strtol tempname unlocked-io verror 
xalloc xalloc-die xprintf-posix xstrndup xvasprintf-posix
+#   gnulib-tool --import --dir=. --local-dir=local --lib=libgnu 
--source-base=gnu --m4-base=ltdl/m4 --doc-base=doc --aux-dir=build-aux 
--with-tests --libtool --macro-prefix=M4 assert autobuild avltree-oset 
binary-io clean-temp cloexec close-stream closein config-h configmake dirname 
error exit fdl fflush filenamecat flexmember fopen-safer free fseeko gendocs 
gettext gnupload gpl-3.0 memmem mkstemp obstack progname quote regex 
regexprops-generic sprintf-posix stdbool stdlib-safer strnlen strtol tempname 
unlocked-io verror xalloc xalloc-die xprintf-posix xstrndup xvasprintf-posix
 
 # Specification in the form of a few gnulib-tool.m4 macro invocations:
 gl_LOCAL_DIR([local])
-gl_MODULES([assert autobuild avltree-oset binary-io clean-temp cloexec 
close-stream closein config-h configmake dirname error exit fdl fflush 
filenamecat flexmember fopen-safer free fseeko gendocs gettext gnupload gpl-3.0 
mkstemp obstack progname regex regexprops-generic sprintf-posix stdbool 
stdlib-safer strnlen strtol tempname unlocked-io verror xalloc xalloc-die 
xprintf-posix xstrndup xvasprintf-posix])
+gl_MODULES([assert autobuild avltree-oset binary-io clean-temp cloexec 
close-stream closein config-h configmake dirname error exit fdl fflush 
filenamecat flexmember fopen-safer free fseeko gendocs gettext gnupload gpl-3.0 
memmem mkstemp obstack progname quote regex regexprops-generic sprintf-posix 
stdbool stdlib-safer strnlen strtol tempname unlocked-io verror xalloc 
xalloc-die xprintf-posix xstrndup xvasprintf-posix])
 gl_AVOID([])
 gl_SOURCE_BASE([gnu])
 gl_M4_BASE([ltdl/m4])
diff --git a/m4/input.c b/m4/input.c
index bdacfbf..26298fb 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -27,6 +27,11 @@
 /* Define this to see runtime debug info.  Implied by DEBUG.  */
 /*#define DEBUG_INPUT */
 
+/* Maximum number of bytes where it is more efficient to inline the
+   reference as a string than it is to track reference bookkeeping for
+   those bytes.  */
+#define INPUT_INLINE_THRESHOLD 16
+
 /*
    Unread input can be either files that should be read (from the
    command line or by include/sinclude), strings which should be
@@ -528,17 +533,27 @@ m4_push_string_init (m4 *context)
    everything consecutively onto the input stack.  Must be called
    between push_string_init and push_string_finish.  Return true only
    if LEVEL is less than SIZE_MAX and a reference was created to
-   VALUE.  */
+   VALUE, in which case, the lifetime of the contents of VALUE must
+   last as long as the input engine can parse references from it.  */
 bool
-m4__push_symbol (m4_symbol_value *value, size_t level)
+m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level)
 {
   m4_symbol_chain *chain;
+  bool result = false;
 
   assert (next);
   /* TODO - also accept TOKEN_COMP chains.  */
   assert (m4_is_symbol_value_text (value));
-  if (m4_get_symbol_value_len (value) == 0)
-    return false;
+
+  /* Speed consideration - for short enough symbols, the speed and
+     memory overhead of parsing another INPUT_CHAIN link outweighs the
+     time to inline the symbol text.  */
+  if (m4_get_symbol_value_len (value) <= INPUT_INLINE_THRESHOLD)
+    {
+      obstack_grow (current_input, m4_get_symbol_value_text (value),
+                   m4_get_symbol_value_len (value));
+      return false;
+    }
 
   if (next->funcs == &string_funcs)
     {
@@ -553,22 +568,18 @@ m4__push_symbol (m4_symbol_value *value, size_t level)
     next->u.u_c.chain = chain;
   next->u.u_c.end = chain;
   chain->next = NULL;
-  if (level != SIZE_MAX)
-    /* TODO - use token as-is, rather than copying data.  This implies
-       lengthening lifetime of $@ arguments until the rescan is
-       complete, rather than the current approach of freeing them
-       during expand_macro.  */
-    chain->str = (char *) obstack_copy (current_input,
-                                       m4_get_symbol_value_text (value),
-                                       m4_get_symbol_value_len (value));
-  else
-    chain->str = m4_get_symbol_value_text (value);
+  chain->str = m4_get_symbol_value_text (value);
   chain->len = m4_get_symbol_value_len (value);
-  chain->level = SIZE_MAX;
+  chain->level = level;
   chain->argv = NULL;
   chain->index = 0;
   chain->flatten = false;
-  return false; /* Only return true when data is reused, not copied.  */
+  if (level < SIZE_MAX)
+    {
+      m4__adjust_refcount (context, level, true);
+      result = true;
+    }
+  return result;
 }
 
 /* Last half of m4_push_string ().  If next is now NULL, a call to
diff --git a/m4/m4module.h b/m4/m4module.h
index 520bd4e..03025af 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -310,6 +310,7 @@ extern bool m4_arg_equal            (m4_macro_args *, 
unsigned int,
 extern bool    m4_arg_empty            (m4_macro_args *, unsigned int);
 extern size_t  m4_arg_len              (m4_macro_args *, unsigned int);
 extern m4_builtin_func *m4_arg_func    (m4_macro_args *, unsigned int);
+extern m4_obstack *m4_arg_scratch      (m4 *);
 extern m4_macro_args *m4_make_argv_ref (m4 *, m4_macro_args *, const char *,
                                          size_t, bool, bool);
 extern void    m4_push_arg             (m4 *, m4_obstack *, m4_macro_args *,
diff --git a/m4/m4private.h b/m4/m4private.h
index f24942f..272069e 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -78,6 +78,7 @@ struct m4 {
   m4__search_path_info *search_path;   /* The list of path directories. */
   m4__macro_arg_stacks *arg_stacks;    /* Array of current argv refs.  */
   size_t               stacks_count;   /* Size of arg_stacks.  */
+  size_t               expansion_level;/* Macro call nesting level.  */
 };
 
 #define M4_OPT_PREFIX_BUILTINS_BIT     (1 << 0) /* -P */
@@ -450,7 +451,7 @@ typedef enum {
   M4_TOKEN_MACDEF      /* Macro's definition (see "defn"), M4_SYMBOL_FUNC.  */
 } m4__token_type;
 
-extern bool            m4__push_symbol (m4_symbol_value *, size_t);
+extern bool            m4__push_symbol (m4 *, m4_symbol_value *, size_t);
 extern m4__token_type  m4__next_token (m4 *, m4_symbol_value *, int *,
                                        const char *);
 extern bool            m4__next_token_is_open (m4 *);
@@ -459,6 +460,8 @@ extern      bool            m4__next_token_is_open (m4 *);
    that also have an identically named function exported in m4module.h.  */
 #ifdef NDEBUG
 # define m4_arg_argc(A)                (A)->argc
+# define m4_arg_scratch(C)                             \
+  ((C)->arg_stacks[(C)->expansion_level - 1].argv)
 #endif /* NDEBUG */
 
 
diff --git a/m4/macro.c b/m4/macro.c
index bb0d3b0..15ec4d9 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -145,9 +145,6 @@ static void    trace_header  (m4 *, size_t);
 static void    trace_flush      (m4 *);
 
 
-/* Current recursion level in expand_macro ().  */
-static size_t expansion_level = 0;
-
 /* The number of the current call of expand_macro ().  */
 static size_t macro_call_id = 0;
 
@@ -325,10 +322,9 @@ expand_argument (m4 *context, m4_obstack *obs, 
m4_symbol_value *argp,
                 except we don't issue warnings.  But in the future,
                 we want to allow concatenation of builtins and
                 text.  */
-             if (argp->type == M4_SYMBOL_FUNC
-                 && obstack_object_size (obs) == 0)
-               return type == M4_TOKEN_COMMA;
              len = obstack_object_size (obs);
+             if (argp->type == M4_SYMBOL_FUNC && !len)
+               return type == M4_TOKEN_COMMA;
              obstack_1grow (obs, '\0');
              VALUE_MODULE (argp) = NULL;
              m4_set_symbol_value_text (argp, obstack_finish (obs), len, age);
@@ -379,7 +375,7 @@ expand_argument (m4 *context, m4_obstack *obs, 
m4_symbol_value *argp,
 /* The macro expansion is handled by expand_macro ().  It parses the
    arguments, using collect_arguments (), and builds a table of pointers to
    the arguments.  The arguments themselves are stored on a local obstack.
-   Expand_macro () uses call_macro () to do the call of the macro.
+   Expand_macro () uses m4_macro_call () to do the call of the macro.
 
    Expand_macro () is potentially recursive, since it calls expand_argument
    (), which might call expand_token (), which might call expand_macro ().
@@ -391,6 +387,7 @@ static void
 expand_macro (m4 *context, const char *name, size_t len, m4_symbol *symbol)
 {
   void *args_base;             /* Base of stack->args on entry.  */
+  void *args_scratch;          /* Base of scratch space for m4_macro_call.  */
   void *argv_base;             /* Base of stack->argv on entry.  */
   m4_macro_args *argv;         /* Arguments to the called macro.  */
   m4_obstack *expansion;       /* Collects the macro's expansion.  */
@@ -399,7 +396,7 @@ expand_macro (m4 *context, const char *name, size_t len, 
m4_symbol *symbol)
   bool trace_expansion = false;        /* True if trace and debugmode(`e').  */
   size_t my_call_id;           /* Sequence id for this macro.  */
   m4_symbol_value *value;      /* Original value of this macro.  */
-  size_t level = expansion_level; /* Expansion level of this macro.  */
+  size_t level;                        /* Expansion level of this macro.  */
   m4__macro_arg_stacks *stack; /* Storage for this macro.  */
 
   /* Report errors at the location where the open parenthesis (if any)
@@ -414,6 +411,7 @@ expand_macro (m4 *context, const char *name, size_t len, 
m4_symbol *symbol)
   int loc_close_line;
 
   /* Obstack preparation.  */
+  level = context->expansion_level;
   if (context->stacks_count <= level)
     {
       size_t count = context->stacks_count;
@@ -453,7 +451,7 @@ expand_macro (m4 *context, const char *name, size_t len, 
m4_symbol *symbol)
 
   /* Prepare for macro expansion.  */
   VALUE_PENDING (value)++;
-  if (m4_get_nesting_limit_opt (context) < ++expansion_level)
+  if (m4_get_nesting_limit_opt (context) < ++context->expansion_level)
     m4_error (context, EXIT_FAILURE, 0, NULL, _("\
 recursion limit of %zu exceeded, use -L<N> to change it"),
              m4_get_nesting_limit_opt (context));
@@ -464,6 +462,11 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
 
   argv = collect_arguments (context, name, len, symbol, stack->args,
                            stack->argv);
+  /* Since collect_arguments can invalidate stack by reallocating
+     context->arg_stacks during a recursive expand_macro call, we must
+     reset it here.  */
+  stack = &context->arg_stacks[level];
+  args_scratch = obstack_finish (stack->args);
 
   /* The actual macro call.  */
   loc_close_file = m4_get_current_file (context);
@@ -485,7 +488,7 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
   m4_set_current_file (context, loc_close_file);
   m4_set_current_line (context, loc_close_line);
 
-  --expansion_level;
+  --context->expansion_level;
   --VALUE_PENDING (value);
   if (BIT_TEST (VALUE_FLAGS (value), VALUE_DELETED_BIT))
     m4_symbol_value_delete (value);
@@ -502,6 +505,7 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
     {
       if (argv->inuse)
        {
+         obstack_free (stack->args, args_scratch);
          if (debug_macro_level & PRINT_ARGCOUNT_CHANGES)
            xfprintf (stderr, "m4debug: -%d- `%s' in use, level=%d, "
                      "refcount=%zu, argcount=%zu\n", my_call_id, argv->argv0,
@@ -622,14 +626,15 @@ static void
 process_macro (m4 *context, m4_symbol_value *value, m4_obstack *obs,
               int argc, m4_macro_args *argv)
 {
-  const char *text;
+  const char *text = m4_get_symbol_value_text (value);
+  size_t len = m4_get_symbol_value_len (value);
   int i;
 
-  for (text = m4_get_symbol_value_text (value); *text != '\0';)
+  while (len--)
     {
       char ch;
 
-      if (!m4_has_syntax (M4SYNTAX, *text, M4_SYNTAX_DOLLAR))
+      if (!m4_has_syntax (M4SYNTAX, *text, M4_SYNTAX_DOLLAR) || !len)
        {
          obstack_1grow (obs, *text);
          text++;
@@ -647,11 +652,13 @@ process_macro (m4 *context, m4_symbol_value *value, 
m4_obstack *obs,
          if (m4_get_posixly_correct_opt (context) || !isdigit(text[1]))
            {
              i = *text++ - '0';
+             len--;
            }
          else
            {
              char *endp;
              i = (int) strtol (text, &endp, 10);
+             len -= endp - text;
              text = endp;
            }
          if (i < argc)
@@ -662,12 +669,14 @@ process_macro (m4 *context, m4_symbol_value *value, 
m4_obstack *obs,
        case '#':               /* number of arguments */
          m4_shipout_int (obs, argc - 1);
          text++;
+         len--;
          break;
 
        case '*':               /* all arguments */
        case '@':               /* ... same, but quoted */
          m4_push_args (context, obs, argv, false, *text == '@');
          text++;
+         len--;
          break;
 
        default:
@@ -678,19 +687,20 @@ process_macro (m4 *context, m4_symbol_value *value, 
m4_obstack *obs,
            }
          else
            {
-             size_t len  = 0;
+             size_t len1 = 0;
              const char *endp;
              char *key;
 
              for (endp = ++text;
-                  *endp && m4_has_syntax (M4SYNTAX, *endp,
-                                          (M4_SYNTAX_OTHER | M4_SYNTAX_ALPHA
-                                           | M4_SYNTAX_NUM));
+                  len1 < len && m4_has_syntax (M4SYNTAX, *endp,
+                                               (M4_SYNTAX_OTHER
+                                                | M4_SYNTAX_ALPHA
+                                                | M4_SYNTAX_NUM));
                   ++endp)
                {
-                 ++len;
+                 ++len1;
                }
-             key = xstrndup (text, len);
+             key = xstrndup (text, len1);
 
              if (*endp)
                {
@@ -713,7 +723,8 @@ process_macro (m4 *context, m4_symbol_value *value, 
m4_obstack *obs,
                            key);
                }
 
-             text = *endp ? 1 + endp : endp;
+             len -= endp - text;
+             text = endp;
 
              free (key);
              break;
@@ -802,7 +813,7 @@ trace_header (m4 *context, size_t id)
       if (m4_is_debug_bit (context, M4_DEBUG_TRACE_LINE))
        trace_format (context, "%d:", m4_get_current_line (context));
     }
-  trace_format (context, " -%zu- ", expansion_level);
+  trace_format (context, " -%zu- ", context->expansion_level);
   if (m4_is_debug_bit (context, M4_DEBUG_TRACE_CALLID))
     trace_format (context, "id %zu: ", id);
 }
@@ -1078,9 +1089,8 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const 
char *argv0,
   m4_symbol_value *value;
   m4_symbol_chain *chain;
   unsigned int index = skip ? 2 : 1;
-  m4_obstack *obs = context->arg_stacks[expansion_level - 1].argv;
+  m4_obstack *obs = m4_arg_scratch (context);
 
-  assert (obstack_object_size (obs) == 0);
   /* When making a reference through a reference, point to the
      original if possible.  */
   if (argv->has_ref)
@@ -1114,7 +1124,7 @@ m4_make_argv_ref (m4 *context, m4_macro_args *argv, const 
char *argv0,
       chain->next = NULL;
       chain->str = NULL;
       chain->len = 0;
-      chain->level = expansion_level - 1;
+      chain->level = context->expansion_level - 1;
       chain->argv = argv;
       chain->index = index;
       chain->flatten = flatten;
@@ -1134,18 +1144,22 @@ m4_push_arg (m4 *context, m4_obstack *obs, 
m4_macro_args *argv,
             unsigned int index)
 {
   m4_symbol_value *value;
+  m4_symbol_value temp;
 
   if (index == 0)
     {
-      obstack_grow (obs, argv->argv0, argv->argv0_len);
-      return;
+      value = &temp;
+      m4_set_symbol_value_text (value, argv->argv0, argv->argv0_len, 0);
+    }
+  else
+    {
+      value = m4_arg_symbol (argv, index);
+      if (value == &empty_symbol)
+       return;
     }
-  value = m4_arg_symbol (argv, index);
-  if (value == &empty_symbol)
-    return;
   /* TODO handle builtin tokens?  */
   assert (value->type == M4_SYMBOL_TEXT);
-  if (m4__push_symbol (value, expansion_level - 1))
+  if (m4__push_symbol (context, value, context->expansion_level - 1))
     arg_mark (argv);
 }
 
@@ -1158,47 +1172,50 @@ m4_push_args (m4 *context, m4_obstack *obs, 
m4_macro_args *argv, bool skip,
              bool quote)
 {
   m4_symbol_value *value;
-  m4_symbol_value sep;
   unsigned int i = skip ? 2 : 1;
+  const char *sep = ",";
+  size_t sep_len = 1;
   bool use_sep = false;
   bool inuse = false;
   const char *lquote = m4_get_syntax_lquote (M4SYNTAX);
   const char *rquote = m4_get_syntax_rquote (M4SYNTAX);
+  m4_obstack *scratch = m4_arg_scratch (context);
 
   if (argv->argc <= i)
     return;
 
+  if (argv->argc == i + 1)
+    {
+      if (quote)
+       obstack_grow (obs, lquote, strlen (lquote));
+      m4_push_arg (context, obs, argv, i);
+      if (quote)
+       obstack_grow (obs, rquote, strlen (rquote));
+      return;
+    }
+
+  /* Compute the separator in the scratch space.  */
   if (quote)
     {
-      const char *str;
-      size_t len;
       obstack_grow (obs, lquote, strlen (lquote));
-      len = obstack_object_size (obs);
-      obstack_1grow (obs, '\0');
-      str = (char *) obstack_finish (obs);
-      m4_set_symbol_value_text (&sep, str, len, 0);
-      m4__push_symbol (&sep, SIZE_MAX);
-      obstack_grow (obs, rquote, strlen (rquote));
-      obstack_1grow (obs, ',');
-      obstack_grow0 (obs, lquote, strlen (lquote));
-      str = (char *) obstack_finish (obs);
-      m4_set_symbol_value_text (&sep, str,
-                               strlen (rquote) + 1 + strlen (lquote), 0);
+      obstack_grow (scratch, rquote, strlen (rquote));
+      obstack_1grow (scratch, ',');
+      obstack_grow0 (scratch, lquote, strlen (lquote));
+      sep = (char *) obstack_finish (scratch);
+      sep_len += strlen (lquote) + strlen (rquote);
     }
-  else
-    m4_set_symbol_value_text (&sep, ",", 1, 0);
 
   /* TODO push entire $@ ref, rather than each arg.  */
   for ( ; i < argv->argc; i++)
     {
       value = m4_arg_symbol (argv, i);
       if (use_sep)
-       m4__push_symbol (&sep, SIZE_MAX);
+       obstack_grow (obs, sep, sep_len);
       else
        use_sep = true;
       /* TODO handle builtin tokens?  */
       assert (value->type == M4_SYMBOL_TEXT);
-      inuse |= m4__push_symbol (value, expansion_level - 1);
+      inuse |= m4__push_symbol (context, value, context->expansion_level - 1);
     }
   if (quote)
     obstack_grow (obs, rquote, strlen (rquote));
@@ -1218,3 +1235,16 @@ m4_arg_argc (m4_macro_args *argv)
 {
   return argv->argc;
 }
+
+/* Return an obstack useful for scratch calculations, and which will
+   not interfere with macro expansion.  The obstack will be reset when
+   expand_macro completes.  */
+#undef m4_arg_scratch
+m4_obstack *
+m4_arg_scratch (m4 *context)
+{
+  m4__macro_arg_stacks *stack
+    = &context->arg_stacks[context->expansion_level - 1];
+  assert (obstack_object_size (stack->args) == 0);
+  return stack->args;
+}
diff --git a/m4/symtab.c b/m4/symtab.c
index 95ed36e..30a61ed 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -701,12 +701,9 @@ m4_set_symbol_value_text (m4_symbol_value *value, const 
char *text, size_t len,
                          unsigned int quote_age)
 {
   assert (value && text);
-  /* TODO - this assertion enforces NUL-terminated text with no
-     intermediate NULs.  Do we want to optimize memory usage and use
-     purely length-based manipulation, for one less byte per string?
-     Perhaps only without NDEBUG?  Also, do we want to support
-     embedded NUL?  */
-  assert (strlen (text) == len);
+  /* In practice, it is easier to debug when we guarantee a
+     terminating NUL, even when there are embedded NULs.  */
+  assert (!text[len]);
 
   value->type = M4_SYMBOL_TEXT;
   value->u.u_t.text = text;
diff --git a/modules/gnu.c b/modules/gnu.c
index 79ec5b2..91cfb54 100644
--- a/modules/gnu.c
+++ b/modules/gnu.c
@@ -447,8 +447,8 @@ M4BUILTIN_HANDLER (builtin)
              m4_macro_args *new_argv;
              bool flatten = (bp->flags & M4_BUILTIN_GROKS_MACRO) == 0;
              new_argv = m4_make_argv_ref (context, argv, name,
-                                           m4_arg_len (argv, 1),
-                                           true, flatten);
+                                          m4_arg_len (argv, 1),
+                                          true, flatten);
              bp->func (context, obs, argc - 1, new_argv);
            }
          free (value);
@@ -683,7 +683,7 @@ M4BUILTIN_HANDLER (indir)
          m4_macro_args *new_argv;
          bool flatten = !m4_symbol_groks_macro (symbol);
          new_argv = m4_make_argv_ref (context, argv, name,
-                                       m4_arg_len (argv, 1), true, flatten);
+                                      m4_arg_len (argv, 1), true, flatten);
          m4_macro_call (context, m4_get_symbol_value (symbol), obs,
                         argc - 1, new_argv);
        }
@@ -858,7 +858,6 @@ M4BUILTIN_HANDLER (renamesyms)
       m4_pattern_buffer *buf;  /* compiled regular expression */
 
       m4_dump_symbol_data      data;
-      m4_obstack               rename_obs;
 
       int resyntax;
 
@@ -879,25 +878,21 @@ M4BUILTIN_HANDLER (renamesyms)
       if (!buf)
        return;
 
-      obstack_init (&rename_obs);
-      data.obs = obs;
-
+      data.obs = m4_arg_scratch (context);
       m4_dump_symbols (context, &data, 1, argv, false);
 
       for (; data.size > 0; --data.size, data.base++)
        {
          const char *name = data.base[0];
 
-         if (regexp_substitute (context, &rename_obs, me, name, strlen (name),
+         if (regexp_substitute (context, data.obs, me, name, strlen (name),
                                 regexp, buf, replace, true))
            {
-             obstack_1grow (&rename_obs, '\0');
+             obstack_1grow (data.obs, '\0');
              m4_symbol_rename (M4SYMTAB, name,
-                               (char *) obstack_finish (&rename_obs));
+                               (char *) obstack_finish (data.obs));
            }
        }
-
-      obstack_free (&rename_obs, NULL);
     }
   else
     assert (!"Unable to import from m4 module");
@@ -917,10 +912,8 @@ M4BUILTIN_HANDLER (m4symbols)
   if (m4_dump_symbols)
     {
       m4_dump_symbol_data data;
-      m4_obstack data_obs;
 
-      obstack_init (&data_obs);
-      data.obs = &data_obs;
+      data.obs = m4_arg_scratch (context);
       m4_dump_symbols (context, &data, argc, argv, false);
 
       for (; data.size > 0; --data.size, data.base++)
@@ -929,7 +922,6 @@ M4BUILTIN_HANDLER (m4symbols)
          if (data.size > 1)
            obstack_1grow (obs, ',');
        }
-      obstack_free (&data_obs, NULL);
     }
   else
     assert (!"Unable to import from m4 module");
diff --git a/modules/m4.c b/modules/m4.c
index 1cad9cb..6a2d1b6 100644
--- a/modules/m4.c
+++ b/modules/m4.c
@@ -348,7 +348,7 @@ M4BUILTIN_HANDLER (dumpdef)
   size_t arg_length = m4_get_max_debug_arg_length_opt (context);
   bool module = m4_is_debug_bit (context, M4_DEBUG_TRACE_MODULE);
 
-  data.obs = obs;
+  data.obs = m4_arg_scratch (context);
   m4_dump_symbols (context, &data, argc, argv, true);
 
   for (; data.size > 0; --data.size, data.base++)
@@ -797,11 +797,15 @@ M4BUILTIN_HANDLER (mkstemp)
 /* Print all arguments on standard error.  */
 M4BUILTIN_HANDLER (errprint)
 {
+  size_t len;
+
   assert (obstack_object_size (obs) == 0);
   m4_dump_args (context, obs, 1, argv, " ", false);
-  obstack_1grow (obs, '\0');
   m4_sysval_flush (context, false);
-  fputs ((char *) obstack_finish (obs), stderr);
+  len = obstack_object_size (obs);
+  /* The close_stdin module makes it safe to skip checking the return
+     value here.  */
+  fwrite (obstack_finish (obs), 1, len, stderr);
   fflush (stderr);
 }
 
@@ -907,14 +911,10 @@ M4BUILTIN_HANDLER (index)
   const char *result = NULL;
   int retval = -1;
 
-  /* Optimize searching for the empty string (always 0) and one byte
-     (strchr tends to be more efficient than strstr).  */
-  if (!needle[0])
-    retval = 0;
-  else if (!needle[1])
-    result = strchr (haystack, *needle);
-  else
-    result = strstr (haystack, needle);
+  /* Rely on the optimizations guaranteed by gnulib's memmem
+     module.  */
+  result = (char *) memmem (haystack, m4_arg_len (argv, 1),
+                           needle, m4_arg_len (argv, 2));
   if (result)
     retval = result - haystack;
 
@@ -997,11 +997,11 @@ m4_expand_ranges (const char *s, m4_obstack *obs)
   return obstack_finish (obs);
 }
 
-/* The macro "translit" translates all characters in the first argument,
-   which are present in the second argument, into the corresponding
-   character from the third argument.  If the third argument is shorter
-   than the second, the extra characters in the second argument, are
-   deleted from the first (pueh)  */
+/* The macro "translit" translates all characters in the first
+   argument, which are present in the second argument, into the
+   corresponding character from the third argument.  If the third
+   argument is shorter than the second, the extra characters in the
+   second argument are deleted from the first.  */
 M4BUILTIN_HANDLER (translit)
 {
   const char *data;
@@ -1020,14 +1020,14 @@ M4BUILTIN_HANDLER (translit)
   from = M4ARG (2);
   if (strchr (from, '-') != NULL)
     {
-      from = m4_expand_ranges (from, obs);
+      from = m4_expand_ranges (from, m4_arg_scratch (context));
       assert (from);
     }
 
   to = M4ARG (3);
   if (strchr (to, '-') != NULL)
     {
-      to = m4_expand_ranges (to, obs);
+      to = m4_expand_ranges (to, m4_arg_scratch (context));
       assert (to);
     }
 
diff --git a/tests/iso8859.m4 b/tests/iso8859.m4
index 17b8758..903db6f 100644
--- a/tests/iso8859.m4
+++ b/tests/iso8859.m4
@@ -1,4 +1,4 @@
-dnl Copyright (C) 2006 Free Software Foundation
+dnl Copyright (C) 2006, 2007 Free Software Foundation
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it
 dnl with or without modifications, as long as this notice is preserved.
@@ -28,6 +28,6 @@ dnl
 
 !"#$%&()*+,-./0123456789:;<=>address@hidden|}~€‚ƒ„…
†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ 
¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ',
 `MATCH', `NO MATCH')
 dnl
 dnl
-# NUL does not pass through
+# NUL passes through now!
 define(`NUL_bug', `This will be seen.This will never be seen')dnl
-NUL_bug
+len(NUL_bug)
diff --git a/tests/null.err b/tests/null.err
new file mode 100644
index 0000000..61518e8
--- /dev/null
+++ b/tests/null.err
@@ -0,0 +1 @@
+errprint: -- --
diff --git a/tests/null.m4 b/tests/null.m4
new file mode 100644
index 0000000..e6e1816
--- /dev/null
+++ b/tests/null.m4
@@ -0,0 +1,94 @@
+# This file tests m4 behavior on NUL bytes
+dnl Raw pass-through:
+raw: --
+dnl Embedded in quotes:
+quoted: `--'
+dnl Embedded in comments:
+commented: #--
+dnl Passed through $1, $*, $@:
+define(`echo', address@hidden')define(`', `empty')dnl
+user: echo(--,`11')
+dnl All macros matching __*__ take no arguments, and never produce NUL
+dnl First argument of builtin: not tested yet. No builtin includes NUL, so
+dnl   this needs to warn, but warning output needs quoting.
+dnl Remaining arguments of builtin:
+`builtin:' builtin(`len', --)
+dnl changecom: not tested yet
+dnl changequote: not tested yet
+dnl changeresyntax: not tested yet. No resyntax includes NUL, needs to warn
+dnl changesyntax: not tested yet
+dnl debugfile: not tested yet. No file name includes NUL, needs to warn
+dnl debugmode: not tested yet. NUL not a valid mode, needs to warn
+dnl decr: not tested yet. NUL not a number, needs to warn
+dnl Macro name of define:
+define(`--', `odd name: $1')dnl
+dnl Definition of define: not tested yet
+dnl Macro name in defn:
+`defn:' defn(`--')
+dnl Macro contents in defn: not tested yet
+dnl divert: not tested yet. NUL not a number, needs to warn
+dnl divnum: Takes no arguments, and never produces NUL.
+dnl Discarded by dnl: --
+dnl dumpdef: not tested yet. Needs to quote properly.
+dnl Passed through errprint:
+errprint(`errprint:' --, `--
+')dnl
+dnl Passed to esyscmd: not tested yet. NUL truncates string, needs to warn
+dnl Generated from esyscmd:
+`esyscmd:' esyscmd(`printf "[\\0]"')
+dnl eval: not tested yet. NUL not a number, needs to warn
+dnl format: not tested yet. Should %s string truncation warn?
+dnl Macro name in ifdef, passed through ifdef:
+`ifdef:' ifdef(`--', `yes: --', `oops: --')dnl
+ ifdef(, `oops: --', `no: --')
+dnl Compared in ifelse, passed through ifelse:
+`ifelse:' ifelse(`-', `--', `oops', `--', --, `yes: --')
+dnl include: not tested yet. No file name includes NUL, needs to warn
+dnl incr: not tested yet. NUL not a number, needs to warn
+dnl Passed through index:
+`index:' index(`ab', `b') index(`-', `') index(`', `-')dnl
+ index(`-', `-')
+dnl Defined first argument of indir:
+`indir:' indir(`--', 11)dnl
+dnl Undefined first argument of indir: not tested yet. Needs to warn
+dnl Other arguments of indir:
+ indir(`len', `--')
+dnl Passed through len:
+`len:' len() len(--)
+dnl m4exit: not tested yet. NUL not a number, needs to warn.
+dnl m4symbols: not tested yet
+dnl Passed through m4wrap: not working yet
+m4wrap(``m4wrap:' -
+-
+')dnl
+dnl maketemp: not tested yet. No file name includes NUL, needs to warn
+dnl mkdtemp: not tested yet. No file name includes NUL, needs to warn
+dnl mkstemp: not tested yet. No file name includes NUL, needs to warn
+dnl patsubst: not tested yet
+dnl Defined argument of popdef:
+`popdef:' popdef(`--')ifdef(`--', `oops', `ok')
+dnl Undefined argument of popdef: not tested yet. Should it warn?
+dnl Macro name of pushdef:
+`pushdef:' pushdef(`--', `strange: $1')ifdef(`--', `ok', `oops')
+dnl Definition of pushdef: not tested yet
+dnl regexp: not tested yet
+dnl renamesyms: not tested yet
+dnl Passed through shift:
+`shift:' shift(`hi', `--', --)
+dnl sinclude: not tested yet. No file name includes NUL, needs to warn
+dnl First argument of substr:
+`substr:' substr(`----', `1', `3')
+dnl Other arguments of substr: not tested yet. NUL not a number, needs to warn.
+dnl syncoutput: not tested yet. No mode contains NUL, needs to warn
+dnl syscmd: not tested yet. NUL truncates string, needs to warn
+dnl sysval: Takes no arguments, and never produces NUL.
+dnl Passed to traceoff:
+traceoff(`--', `')dnl
+dnl traceon: not tested yet. Trace output needs quoting
+`traceon:' indir(`--', `--')
+dnl translit: not tested yet
+dnl Defined argument of undefine:
+`undefine:' undefine(`--')ifdef(`--', `oops', `ok')
+dnl Undefined argument of undefine: not tested yet. Should it warn?
+dnl undivert: not tested yet. No file name or number includes NUL, needs to 
warn
+dnl other modules need to be tested independently
diff --git a/tests/null.out b/tests/null.out
new file mode 100644
index 0000000..6e8a114
--- /dev/null
+++ b/tests/null.out
@@ -0,0 +1,20 @@
+# This file tests m4 behavior on NUL bytes
+raw: --
+quoted: --
+commented: #--
+user: .--.--,11.--,11.
+builtin: 3
+defn: odd name: $1
+esyscmd: []
+ifdef: yes: -- oops: --
+ifelse: yes: --
+index: 2 -1 -1 8
+indir: odd name: 11 3
+len: 1 3
+popdef: ok
+pushdef: ok
+shift: --,--
+substr: --
+traceon: strange: --
+undefine: ok
+m4wrap: -
diff --git a/tests/others.at b/tests/others.at
index 76a97eb..bb6254a 100644
--- a/tests/others.at
+++ b/tests/others.at
@@ -281,14 +281,9 @@ AT_CLEANUP
 AT_SETUP([iso8859])
 
 # Eh eh eh...
-# We can't embed iso8859.m4 in here since precisly this file demonstrates
-# an M4 ``bug'': its inability to handle the NUL character.  So there
-# is no use in trying to handle it here...  Well, until autom4te provides
-# us with means to.
-
-# However, since progress is being made to let M4 handle NUL, this test
-# is xfailed for now.
-AT_XFAIL_IF([:])
+# We can't embed iso8859.m4 in here since it includes a NUL character,
+# and we can't yet rely on autom4te being NUL-clean (even though this
+# test shows that M4 is trying to be NUL-clean).
 
 AT_DATA([[expout]],
 [[# Testing quotes
@@ -306,11 +301,27 @@ CHANGEQUOTE(«««,»»»)      # eol
 # Test use of all iso8859 characters except ^Z (win32 EOF) and NUL  ` '
 Length of string is: 254
 Comparing strings: MATCH
-# NUL does not pass through
-This will be seen.
+# NUL passes through now!
+42
 ]])
 
-AT_CHECK_M4([$abs_srcdir/iso8859.m4], 0, expout)
+AT_CHECK_M4(["$abs_srcdir/iso8859.m4"], 0, expout)
+
+AT_CLEANUP
+
+
+## ------------- ##
+## nul character ##
+## ------------- ##
+
+AT_SETUP([nul character])
+
+# We don't embed null.* in here, since it is harder to guarantee the
+# behavior of NUL through autom4te.
+cp "$abs_srcdir/null.out" expout
+cp "$abs_srcdir/null.err" experr
+
+AT_CHECK_M4(["$abs_srcdir/null.m4"], [0], [expout], [experr])
 
 AT_CLEANUP
 
-- 
1.5.3.5

From e2b843f0aca9e296495f592e39d6a8cec69c6e8a Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 26 Oct 2007 09:24:06 -0600
Subject: [PATCH] Stage 9: share rather than copy single-arg refs.

* m4/gnulib-cache.m4: Import quote and memmem modules.
* src/m4.h (arg_scratch): New prototype.
* src/input.c (INPUT_INLINE_THRESHOLD): New define.
(push_token): Use it to inline short text, and save references to
longer text.
(input_init, next_token): Simplify obstack handling.
* src/macro.c (expand_argument): Likewise.
(expand_macro): Track scratch space.
(arg_scratch): New function.
(make_argv_ref): Use it.
(push_args): Likewise, and simplify comma handling, since most
separators are short enough to be inlined.
* src/builtin.c (builtin_init): Avoid cast.
(m4_errprint, m4_index): Transparently support NUL.
(m4_translit): Use scratch space, rather than leaking memory on
expansion stack.
* doc/m4.texinfo (Syntax): Add new test of embedded NUL.
* checks/get-them: Extract test that uses external files.
* checks/check-them: Run the new test.  Use unified diff if
possible, and force text mode when debugging NUL handling.
* examples/null.m4: New file.
* examples/null.out: Likewise.
* examples/null.err: Likewise.
* examples/Makefile.am (EXTRA_FILES): Distribute these files.
* .gitattributes: Treat new files as text, in spite of embedded
NUL.

(cherry picked from commit a6c94a314afa34958330b719d66c2d4e403a94af)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog               |   30 ++++++++++++++++
 checks/check-them       |   30 +++++++++++++---
 checks/get-them         |   20 +++++++++++
 doc/m4.texinfo          |   12 ++++++
 examples/.gitattributes |    1 +
 examples/Makefile.am    |    3 ++
 examples/null.err       |    1 +
 examples/null.m4        |   88 +++++++++++++++++++++++++++++++++++++++++++++++
 examples/null.out       |   20 +++++++++++
 m4/gnulib-cache.m4      |    4 +-
 src/builtin.c           |   37 ++++++++-----------
 src/input.c             |   47 +++++++++++++++---------
 src/m4.h                |    1 +
 src/macro.c             |   72 +++++++++++++++++++++++---------------
 14 files changed, 292 insertions(+), 74 deletions(-)
 create mode 100644 examples/.gitattributes
 create mode 100644 examples/null.err
 create mode 100644 examples/null.m4
 create mode 100644 examples/null.out

diff --git a/ChangeLog b/ChangeLog
index 2b2a0d0..e6fee41 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,33 @@
+2007-12-21  Eric Blake  <address@hidden>
+
+       Stage 9: share rather than copy single-arg refs.
+       * m4/gnulib-cache.m4: Import quote and memmem modules.
+       * src/m4.h (arg_scratch): New prototype.
+       * src/input.c (INPUT_INLINE_THRESHOLD): New define.
+       (push_token): Use it to inline short text, and save references to
+       longer text.
+       (input_init, next_token): Simplify obstack handling.
+       * src/macro.c (expand_argument): Likewise.
+       (expand_macro): Track scratch space.
+       (arg_scratch): New function.
+       (make_argv_ref): Use it.
+       (push_args): Likewise, and simplify comma handling, since most
+       separators are short enough to be inlined.
+       * src/builtin.c (builtin_init): Avoid cast.
+       (m4_errprint, m4_index): Transparently support NUL.
+       (m4_translit): Use scratch space, rather than leaking memory on
+       expansion stack.
+       * doc/m4.texinfo (Syntax): Add new test of embedded NUL.
+       * checks/get-them: Extract test that uses external files.
+       * checks/check-them: Run the new test.  Use unified diff if
+       possible, and force text mode when debugging NUL handling.
+       * examples/null.m4: New file.
+       * examples/null.out: Likewise.
+       * examples/null.err: Likewise.
+       * examples/Makefile.am (EXTRA_FILES): Distribute these files.
+       * .gitattributes: Treat new files as text, in spite of embedded
+       NUL.
+
 2007-12-18  Eric Blake  <address@hidden>
 
        Stage 8: extend life of references into argv.
diff --git a/checks/check-them b/checks/check-them
index b446f47..daa1b00 100755
--- a/checks/check-them
+++ b/checks/check-them
@@ -27,6 +27,7 @@ xerr=$tmp/m4-xerr
 failed=
 skipped=
 strip_needed=false
+diffopts=-c
 
 # Find out how the executable prints argv[0]
 m4=`m4 --help | sed -e 's/Usage: \(.*\) \[OPTION.*/\1/' \
@@ -49,6 +50,14 @@ if test "x$1" = x-I ; then
   shift; shift
 fi
 
+# Find out if diff supports useful options.
+if diff -u /dev/null /dev/null 2>/dev/null ; then
+  diffopts="-u"
+fi
+if diff -a /dev/null /dev/null 2>/dev/null ; then
+  diffopts="$diffopts -a"
+fi
+
 # Run the tests.
 for file
 do
@@ -77,9 +86,20 @@ do
       ;;
   esac
 
-  sed -e '/^dnl @result{}/!d' -e 's///' -e "s|\.\./examples|$examples|" \
-    "$file" > $xout
-  sed -e '/^dnl @error{}/!d' -e 's///' -e "s|^m4:|$m4:|" "$file" > $xerr
+  xoutfile=`sed -n 's/^dnl @ expected output: //p' "$file"`
+  if test -z "$xoutfile" ; then
+    sed -e '/^dnl @result{}/!d' -e 's///' -e "s|\.\./examples|$examples|" \
+      "$file" > $xout
+  else
+    cp "$examples/$xoutfile" $xout
+  fi
+
+  xerrfile=`sed -n 's/^dnl @ expected error: //p' "$file"`
+  if test -z "$xerrfile" ; then
+    sed -e '/^dnl @error{}/!d' -e 's///' -e "s|^m4:|$m4:|" "$file" > $xerr
+  else
+    cp "$examples/$xerrfile" $xerr
+  fi
 
   # For the benefit of mingw, normalize \r\n line endings
   if $strip_needed ; then
@@ -99,7 +119,7 @@ do
     failed="$failed $file:out"
     echo `sed -e 's/^dnl //' -e 1q $file`
     echo "$file: stdout mismatch"
-    diff $xout $out
+    diff $diffopts $xout $out
   fi
 
   if cmp -s $err $xerr; then
@@ -108,7 +128,7 @@ do
     failed="$failed $file:err"
     echo `sed -e 's/^dnl //' -e 1q $file`
     echo "$file: stderr mismatch"
-    diff $xerr $err
+    diff $diffopts $xerr $err
   fi
 
 done
diff --git a/checks/get-them b/checks/get-them
index 0d3e37b..e034962 100755
--- a/checks/get-them
+++ b/checks/get-them
@@ -16,6 +16,8 @@ BEGIN {
   file = "NONE";
   status = 0;
   options = "";
+  xout = "";
+  xerr = "";
 }
 
 /address@hidden / {
@@ -39,6 +41,8 @@ BEGIN {
   getline;
   status = 0;
   options = "";
+  xout = "";
+  xout = "";
   next;
 }
 
@@ -51,6 +55,16 @@ BEGIN {
   gsub ("@comment options:", "", options);
 }
 
+/address@hidden xout: / {
+  xout = $0;
+  gsub ("@comment xout: ", "", xout);
+}
+
+/address@hidden xerr: / {
+  xerr = $0;
+  gsub ("@comment xerr: ", "", xerr);
+}
+
 /address@hidden/, /address@hidden example$/ {
   if (seq < 0)
     next;
@@ -68,8 +82,14 @@ BEGIN {
           "dnl @ gives unlimited permission to copy and/or distribute it\n"\
           "dnl @ with or without modifications, as long as this notice\n"\
           "dnl @ is preserved.\n", FILENAME, NR, status, options) > file;
+    if (xout)
+       printf("dnl @ expected output: %s\n", xout) > file;
+    if (xerr)
+       printf("dnl @ expected error: %s\n", xerr) > file;
     status = 0;
     options = "";
+    xout = "";
+    xerr = "";
     next;
   }
   if ($0 ~ /address@hidden example$/) {
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index eaec266..bfdb6f2 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -923,6 +923,18 @@ use address@hidden characters in quoted strings 
(@pxref{Changequote}),
 comments (@pxref{Changecom}), and macro names (@pxref{Indir}), with the
 exception of the @sc{nul} character (the zero byte @samp{'\0'}).
 
address@hidden
address@hidden FIXME - each builtin needs to document how it handles NUL, then
address@hidden update the above paragraph to mention that NUL is now handled
address@hidden transparently.  Meanwhile, test that we don't regress.
+
address@hidden xout: null.out
address@hidden xerr: null.err
address@hidden
+include(`null.m4')dnl
address@hidden example
address@hidden ignore
+
 @menu
 * Names::                       Macro names
 * Quoted strings::              Quoting input to @code{m4}
diff --git a/examples/.gitattributes b/examples/.gitattributes
new file mode 100644
index 0000000..b35e74c
--- /dev/null
+++ b/examples/.gitattributes
@@ -0,0 +1 @@
+null.* diff merge
diff --git a/examples/Makefile.am b/examples/Makefile.am
index c1dc522..3450eac 100644
--- a/examples/Makefile.am
+++ b/examples/Makefile.am
@@ -45,6 +45,9 @@ indir.m4 \
 loop.m4 \
 misc.m4 \
 multiquotes.m4 \
+null.err \
+null.m4 \
+null.out \
 patsubst.m4 \
 pushpop.m4 \
 quote.m4 \
diff --git a/examples/null.err b/examples/null.err
new file mode 100644
index 0000000..61518e8
--- /dev/null
+++ b/examples/null.err
@@ -0,0 +1 @@
+errprint: -- --
diff --git a/examples/null.m4 b/examples/null.m4
new file mode 100644
index 0000000..904a6ef
--- /dev/null
+++ b/examples/null.m4
@@ -0,0 +1,88 @@
+# This file tests m4 behavior on NUL bytes
+dnl Raw pass-through:
+raw: --
+dnl Embedded in quotes:
+quoted: `--'
+dnl Embedded in comments:
+commented: #--
+dnl Passed through $1, $*, $@:
+define(`echo', address@hidden')define(`', `empty')dnl
+user: echo(--,`11')
+dnl All macros matching __*__ take no arguments, and never produce NUL
+dnl First argument of builtin: not tested yet. No builtin includes NUL, so
+dnl   this needs to warn, but warning output needs quoting.
+dnl Remaining arguments of builtin:
+`builtin:' builtin(`len', --)
+dnl changecom: not tested yet
+dnl changequote: not tested yet
+dnl changeword: not tested yet
+dnl debugfile: not tested yet. No file name includes NUL, needs to warn
+dnl debugmode: not tested yet. NUL not a valid mode, needs to warn
+dnl decr: not tested yet. NUL not a number, needs to warn
+dnl Macro name of define:
+define(`--', `odd name: $1')dnl
+dnl Definition of define: not tested yet
+dnl Macro name in defn:
+`defn:' defn(`--')
+dnl Macro contents in defn: not tested yet
+dnl divert: not tested yet. NUL not a number, needs to warn
+dnl divnum: Takes no arguments, and never produces NUL.
+dnl Discarded by dnl: --
+dnl dumpdef: not tested yet. Needs to quote properly.
+dnl Passed through errprint:
+errprint(`errprint:' --, `--
+')dnl
+dnl Passed to esyscmd: not tested yet. NUL truncates string, needs to warn
+dnl Generated from esyscmd:
+`esyscmd:' esyscmd(`printf "[\\0]"')
+dnl eval: not tested yet. NUL not a number, needs to warn
+dnl format: not tested yet. Should %s string truncation warn?
+dnl Macro name in ifdef, passed through ifdef:
+`ifdef:' ifdef(`--', `yes: --', `oops: --')dnl
+ ifdef(, `oops: --', `no: --')
+dnl Compared in ifelse, passed through ifelse:
+`ifelse:' ifelse(`-', `--', `oops', `--', --, `yes: --')
+dnl include: not tested yet. No file name includes NUL, needs to warn
+dnl incr: not tested yet. NUL not a number, needs to warn
+dnl Passed through index:
+`index:' index(`ab', `b') index(`-', `') index(`', `-')dnl
+ index(`-', `-')
+dnl Defined first argument of indir:
+`indir:' indir(`--', 11)dnl
+dnl Undefined first argument of indir: not tested yet. Needs to warn
+dnl Other arguments of indir:
+ indir(`len', `--')
+dnl Passed through len:
+`len:' len() len(--)
+dnl m4exit: not tested yet. NUL not a number, needs to warn.
+dnl Passed through m4wrap: not working yet
+m4wrap(``m4wrap:' -
+-
+')dnl
+dnl maketemp: not tested yet. No file name includes NUL, needs to warn
+dnl mkstemp: not tested yet. No file name includes NUL, needs to warn
+dnl patsubst: not tested yet
+dnl Defined argument of popdef:
+`popdef:' popdef(`--')ifdef(`--', `oops', `ok')
+dnl Undefined argument of popdef: not tested yet. Should it warn?
+dnl Macro name of pushdef:
+`pushdef:' pushdef(`--', `strange: $1')ifdef(`--', `ok', `oops')
+dnl Definition of pushdef: not tested yet
+dnl regexp: not tested yet
+dnl Passed through shift:
+`shift:' shift(`hi', `--', --)
+dnl sinclude: not tested yet. No file name includes NUL, needs to warn
+dnl First argument of substr:
+`substr:' substr(`----', `1', `3')
+dnl Other arguments of substr: not tested yet. NUL not a number, needs to warn.
+dnl syscmd: not tested yet. NUL truncates string, needs to warn
+dnl sysval: Takes no arguments, and never produces NUL.
+dnl Passed to traceoff:
+traceoff(`--', `')dnl
+dnl traceon: not tested yet. Trace output needs quoting
+`traceon:' indir(`--', `--')
+dnl translit: not tested yet
+dnl Defined argument of undefine:
+`undefine:' undefine(`--')ifdef(`--', `oops', `ok')
+dnl Undefined argument of undefine: not tested yet. Should it warn?
+dnl undivert: not tested yet. No file name or number includes NUL, needs to 
warn
diff --git a/examples/null.out b/examples/null.out
new file mode 100644
index 0000000..6e8a114
--- /dev/null
+++ b/examples/null.out
@@ -0,0 +1,20 @@
+# This file tests m4 behavior on NUL bytes
+raw: --
+quoted: --
+commented: #--
+user: .--.--,11.--,11.
+builtin: 3
+defn: odd name: $1
+esyscmd: []
+ifdef: yes: -- oops: --
+ifelse: yes: --
+index: 2 -1 -1 8
+indir: odd name: 11 3
+len: 1 3
+popdef: ok
+pushdef: ok
+shift: --,--
+substr: --
+traceon: strange: --
+undefine: ok
+m4wrap: -
diff --git a/m4/gnulib-cache.m4 b/m4/gnulib-cache.m4
index a89650c..3112f91 100644
--- a/m4/gnulib-cache.m4
+++ b/m4/gnulib-cache.m4
@@ -15,11 +15,11 @@
 
 
 # Specification in the form of a command-line invocation:
-#   gnulib-tool --import --dir=. --local-dir=local --lib=libm4 
--source-base=lib --m4-base=m4 --doc-base=doc --aux-dir=build-aux --with-tests 
--no-libtool --macro-prefix=M4 assert avltree-oset binary-io clean-temp cloexec 
close-stream closein config-h error fdl fflush flexmember fopen-safer free 
fseeko gendocs getopt gnupload gpl-3.0 mkstemp obstack regex stdbool stdint 
stdlib-safer strtol unlocked-io verror version-etc version-etc-fsf xalloc 
xprintf xvasprintf-posix
+#   gnulib-tool --import --dir=. --local-dir=local --lib=libm4 
--source-base=lib --m4-base=m4 --doc-base=doc --aux-dir=build-aux --with-tests 
--no-libtool --macro-prefix=M4 assert avltree-oset binary-io clean-temp cloexec 
close-stream closein config-h error fdl fflush flexmember fopen-safer free 
fseeko gendocs getopt gnupload gpl-3.0 memmem mkstemp obstack quote regex 
stdbool stdint stdlib-safer strtol unlocked-io verror version-etc 
version-etc-fsf xalloc xprintf xvasprintf-posix
 
 # Specification in the form of a few gnulib-tool.m4 macro invocations:
 gl_LOCAL_DIR([local])
-gl_MODULES([assert avltree-oset binary-io clean-temp cloexec close-stream 
closein config-h error fdl fflush flexmember fopen-safer free fseeko gendocs 
getopt gnupload gpl-3.0 mkstemp obstack regex stdbool stdint stdlib-safer 
strtol unlocked-io verror version-etc version-etc-fsf xalloc xprintf 
xvasprintf-posix])
+gl_MODULES([assert avltree-oset binary-io clean-temp cloexec close-stream 
closein config-h error fdl fflush flexmember fopen-safer free fseeko gendocs 
getopt gnupload gpl-3.0 memmem mkstemp obstack quote regex stdbool stdint 
stdlib-safer strtol unlocked-io verror version-etc version-etc-fsf xalloc 
xprintf xvasprintf-posix])
 gl_AVOID([])
 gl_SOURCE_BASE([lib])
 gl_M4_BASE([m4])
diff --git a/src/builtin.c b/src/builtin.c
index e8edc4b..cb5f274 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -463,9 +463,10 @@ builtin_init (void)
   for (bp = &builtin_tab[0]; bp->name != NULL; bp++)
     if (!no_gnu_extensions || !bp->gnu_extension)
       {
+       size_t len = strlen (bp->name);
        if (prefix_all_builtins)
          {
-           string = (char *) xmalloc (strlen (bp->name) + 4);
+           string = xcharalloc (len + 4);
            strcpy (string, "m4_");
            strcat (string, bp->name);
            define_builtin (string, bp, SYMBOL_INSERT);
@@ -1514,12 +1515,16 @@ m4_mkstemp (struct obstack *obs, int argc, 
macro_arguments *argv)
 static void
 m4_errprint (struct obstack *obs, int argc, macro_arguments *argv)
 {
+  size_t len;
+
   if (bad_argc (ARG (0), argc, 1, -1))
     return;
   dump_args (obs, 1, argv, " ", false);
-  obstack_1grow (obs, '\0');
   debug_flush_files ();
-  xfprintf (stderr, "%s", (char *) obstack_finish (obs));
+  len = obstack_object_size (obs);
+  /* The close_stdin module makes it safe to skip checking the return
+     value here.  */
+  fwrite (obstack_finish (obs), 1, len, stderr);
   fflush (stderr);
 }
 
@@ -1770,14 +1775,10 @@ m4_index (struct obstack *obs, int argc, 
macro_arguments *argv)
   haystack = ARG (1);
   needle = ARG (2);
 
-  /* Optimize searching for the empty string (always 0) and one byte
-     (strchr tends to be more efficient than strstr).  */
-  if (!needle[0])
-    retval = 0;
-  else if (!needle[1])
-    result = strchr (haystack, *needle);
-  else
-    result = strstr (haystack, needle);
+  /* Rely on the optimizations guaranteed by gnulib's memmem
+     module.  */
+  result = (char *) memmem (haystack, arg_len (argv, 1),
+                           needle, arg_len (argv, 2));
   if (result)
     retval = result - haystack;
 
@@ -1895,19 +1896,13 @@ m4_translit (struct obstack *obs, int argc, 
macro_arguments *argv)
 
   from = ARG (2);
   if (strchr (from, '-') != NULL)
-    {
-      from = expand_ranges (from, obs);
-      if (from == NULL)
-       return;
-    }
+    from = expand_ranges (from, arg_scratch ());
 
   to = ARG (3);
   if (strchr (to, '-') != NULL)
-    {
-      to = expand_ranges (to, obs);
-      if (to == NULL)
-       return;
-    }
+    to = expand_ranges (to, arg_scratch ());
+
+  assert (from && to);
 
   /* Calling strchr(from) for each character in data is quadratic,
      since both strings can be arbitrarily long.  Instead, create a
diff --git a/src/input.c b/src/input.c
index 8386061..1753be2 100644
--- a/src/input.c
+++ b/src/input.c
@@ -64,6 +64,11 @@
 # include "regex.h"
 #endif /* ENABLE_CHANGEWORD */
 
+/* Number of bytes where it is more efficient to inline the reference
+   as a string than it is to track reference bookkeeping for those
+   bytes.  */
+#define INPUT_INLINE_THRESHOLD 16
+
 /* Type of an input block.  */
 enum input_type
 {
@@ -319,18 +324,29 @@ push_string_init (void)
 | gathering input from multiple locations, rather than copying       |
 | everything consecutively onto the input stack.  Must be called     |
 | between push_string_init and push_string_finish.  Return true only |
-| if LEVEL is non-negative, and a reference was created to TOKEN.    |
+| if LEVEL is non-negative, and a reference was created to TOKEN, in |
+| which case, the lifetime of TOKEN and its contents must last as    |
+| long as the input engine can parse references to it.               |
 `-------------------------------------------------------------------*/
 bool
 push_token (token_data *token, int level)
 {
   token_chain *chain;
+  bool result = false;
 
   assert (next);
   /* TODO - also accept TOKEN_COMP chains.  */
   assert (TOKEN_DATA_TYPE (token) == TOKEN_TEXT);
-  if (TOKEN_DATA_LEN (token) == 0)
-    return false;
+
+  /* Speed consideration - for short enough tokens, the speed and
+     memory overhead of parsing another INPUT_CHAIN link outweighs the
+     time to inline the token text.  */
+  if (TOKEN_DATA_LEN (token) <= INPUT_INLINE_THRESHOLD)
+    {
+      obstack_grow (current_input, TOKEN_DATA_TEXT (token),
+                   TOKEN_DATA_LEN (token));
+      return false;
+    }
 
   if (next->type == INPUT_STRING)
     {
@@ -345,21 +361,18 @@ push_token (token_data *token, int level)
     next->u.u_c.chain = chain;
   next->u.u_c.end = chain;
   chain->next = NULL;
-  if (level >= 0)
-    /* TODO - use token as-is, rather than copying data.  This implies
-       lengthening lifetime of $@ arguments until the rescan is
-       complete, rather than the current approach of freeing them
-       during expand_macro.  */
-    chain->str = (char *) obstack_copy (current_input, TOKEN_DATA_TEXT (token),
-                                       TOKEN_DATA_LEN (token));
-  else
-    chain->str = TOKEN_DATA_TEXT (token);
+  chain->str = TOKEN_DATA_TEXT (token);
   chain->len = TOKEN_DATA_LEN (token);
-  chain->level = -1;
+  chain->level = level;
   chain->argv = NULL;
   chain->index = 0;
   chain->flatten = false;
-  return false; /* No reference exists when text is copied.  */
+  if (level >= 0)
+    {
+      adjust_refcount (level, true);
+      result = true;
+    }
+  return result;
 }
 
 /*-------------------------------------------------------------------.
@@ -897,8 +910,7 @@ input_init (void)
      will always work even if the first token parsed spills to a new
      chunk.  */
   obstack_init (&token_stack);
-  obstack_alloc (&token_stack, 1);
-  token_bottom = obstack_base (&token_stack);
+  token_bottom = obstack_finish (&token_stack);
 
   isp = NULL;
   wsp = NULL;
@@ -1230,8 +1242,7 @@ next_token (token_data *td, int *line, const char *caller)
          if (startpos != 0 ||
              regs.end [0] != obstack_object_size (&token_stack))
            {
-             *(((char *) obstack_base (&token_stack)
-                + obstack_object_size (&token_stack)) - 1) = '\0';
+             obstack_blank (&token_stack, -1);
              break;
            }
          next_char ();
diff --git a/src/m4.h b/src/m4.h
index 854f28d..ce78455 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -449,6 +449,7 @@ bool arg_equal (macro_arguments *, unsigned int, unsigned 
int);
 bool arg_empty (macro_arguments *, unsigned int);
 size_t arg_len (macro_arguments *, unsigned int);
 builtin_func *arg_func (macro_arguments *, unsigned int);
+struct obstack *arg_scratch (void);
 macro_arguments *make_argv_ref (macro_arguments *, const char *, size_t,
                                bool, bool);
 void push_arg (struct obstack *, macro_arguments *, unsigned int);
diff --git a/src/macro.c b/src/macro.c
index 63d957d..964be5b 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -29,7 +29,10 @@
 #endif /* DEBUG_MACRO */
 
 /* Opaque structure describing all arguments to a macro, including the
-   macro name at index 0.  */
+   macro name at index 0.  The lifetime of argv0 is only guaranteed
+   within a call to expand_macro, whereas the lifetime of the array
+   members is guaranteed as long as the input engine can parse text
+   with a reference to address@hidden  */
 struct macro_arguments
 {
   /* Number of arguments owned by this object, may be larger than
@@ -368,16 +371,17 @@ expand_argument (struct obstack *obs, token_data *argp, 
const char *caller)
        case TOKEN_CLOSE:
          if (paren_level == 0)
            {
+             size_t len = obstack_object_size (obs);
              if (TOKEN_DATA_TYPE (argp) == TOKEN_FUNC)
                {
-                 if (obstack_object_size (obs) == 0)
+                 if (!len)
                    return t == TOKEN_COMMA;
                  warn_builtin_concat (caller, TOKEN_DATA_FUNC (argp));
                }
-             TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
-             TOKEN_DATA_LEN (argp) = obstack_object_size (obs);
              obstack_1grow (obs, '\0');
+             TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
              TOKEN_DATA_TEXT (argp) = (char *) obstack_finish (obs);
+             TOKEN_DATA_LEN (argp) = len;
              TOKEN_DATA_QUOTE_AGE (argp) = age;
              return t == TOKEN_COMMA;
            }
@@ -533,6 +537,7 @@ static void
 expand_macro (symbol *sym)
 {
   void *args_base;             /* Base of stacks[i].args on entry.  */
+  void *args_scratch;          /* Base of scratch space for call_macro.  */
   void *argv_base;             /* Base of stacks[i].argv on entry.  */
   macro_arguments *argv;       /* Arguments to the called macro.  */
   struct obstack *expansion;   /* Collects the macro's expansion.  */
@@ -594,6 +599,7 @@ expand_macro (symbol *sym)
     trace_prepre (SYMBOL_NAME (sym), my_call_id);
 
   argv = collect_arguments (sym, stacks[level].args, stacks[level].argv);
+  args_scratch = obstack_finish (stacks[level].args);
 
   /* The actual macro call.  */
   loc_close_file = current_file;
@@ -633,6 +639,7 @@ expand_macro (symbol *sym)
     {
       if (argv->inuse)
        {
+         obstack_free (stacks[level].args, args_scratch);
          if (debug_macro_level & PRINT_ARGCOUNT_CHANGES)
            xfprintf (debug, "m4debug: -%d- `%s' in use, level=%d, "
                      "refcount=%zu, argcount=%zu\n", my_call_id, argv->argv0,
@@ -850,6 +857,16 @@ arg_func (macro_arguments *argv, unsigned int index)
   return TOKEN_DATA_FUNC (token);
 }
 
+/* Return an obstack useful for scratch calculations that will not
+   interfere with macro expansion.  The obstack will be reset when
+   expand_macro completes.  */
+struct obstack *
+arg_scratch (void)
+{
+  assert (obstack_object_size (stacks[expansion_level - 1].args) == 0);
+  return stacks[expansion_level - 1].args;
+}
+
 /* Create a new argument object using the same obstack as ARGV; thus,
    the new object will automatically be freed when the original is
    freed.  Explicitly set the macro name (argv[0]) from ARGV0 with
@@ -865,9 +882,8 @@ make_argv_ref (macro_arguments *argv, const char *argv0, 
size_t argv0_len,
   token_data *token;
   token_chain *chain;
   unsigned int index = skip ? 2 : 1;
-  struct obstack *obs = stacks[expansion_level - 1].argv;
+  struct obstack *obs = arg_scratch ();
 
-  assert (obstack_object_size (obs) == 0);
   /* When making a reference through a reference, point to the
      original if possible.  */
   if (argv->has_ref)
@@ -924,6 +940,8 @@ push_arg (struct obstack *obs, macro_arguments *argv, 
unsigned int index)
 
   if (index == 0)
     {
+      /* Always push copy of arg 0, since its lifetime is not
+        guaranteed beyond expand_macro.  */
       obstack_grow (obs, argv->argv0, argv->argv0_len);
       return;
     }
@@ -944,44 +962,42 @@ void
 push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
 {
   token_data *token;
-  token_data sep;
   unsigned int i = skip ? 2 : 1;
+  const char *sep = ",";
+  size_t sep_len = 1;
   bool use_sep = false;
   bool inuse = false;
-  static char comma[2] = ",";
+  struct obstack *scratch = arg_scratch ();
 
   if (i >= argv->argc)
     return;
 
-  TOKEN_DATA_TYPE (&sep) = TOKEN_TEXT;
-  TOKEN_DATA_QUOTE_AGE (&sep) = 0;
-  if (quote)
+  if (i + 1 == argv->argc)
     {
-      char *str;
-      obstack_grow (obs, lquote.string, lquote.length);
-      TOKEN_DATA_LEN (&sep) = obstack_object_size (obs);
-      obstack_1grow (obs, '\0');
-      str = (char *) obstack_finish (obs);
-      TOKEN_DATA_TEXT (&sep) = str;
-      push_token (&sep, -1);
-      obstack_grow (obs, rquote.string, rquote.length);
-      obstack_1grow (obs, ',');
-      obstack_grow0 (obs, lquote.string, lquote.length);
-      str = (char *) obstack_finish (obs);
-      TOKEN_DATA_TEXT (&sep) = str;
-      TOKEN_DATA_LEN (&sep) = rquote.length + 1 + lquote.length;
+      if (quote)
+       obstack_grow (obs, lquote.string, lquote.length);
+      push_arg (obs, argv, i);
+      if (quote)
+       obstack_grow (obs, rquote.string, rquote.length);
+      return;
     }
-  else
+
+  /* Compute the separator in the scratch space.  */
+  if (quote)
     {
-      TOKEN_DATA_TEXT (&sep) = comma;
-      TOKEN_DATA_LEN (&sep) = 1;
+      obstack_grow (obs, lquote.string, lquote.length);
+      obstack_grow (scratch, rquote.string, rquote.length);
+      obstack_1grow (scratch, ',');
+      obstack_grow0 (scratch, lquote.string, lquote.length);
+      sep = (char *) obstack_finish (scratch);
+      sep_len += lquote.length + rquote.length;
     }
   /* TODO push entire $@ reference, rather than pushing each arg.  */
   for ( ; i < argv->argc; i++)
     {
       token = arg_token (argv, i);
       if (use_sep)
-       push_token (&sep, -1);
+       obstack_grow (obs, sep, sep_len);
       else
        use_sep = true;
       /* TODO handle func tokens?  */
-- 
1.5.3.5


reply via email to

[Prev in Thread] Current Thread [Next in Thread]