m4-patches

[3/18] argv_ref speedup: avoid length recomputations


From: Eric Blake
Subject: [3/18] argv_ref speedup: avoid length recomputations
Date: Thu, 29 Nov 2007 07:09:03 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've ported the next patch in the series to head, splitting it into two
patches there because it touched so much code.

This patch makes the token_data/m4_symbol_value struct slightly larger,
a small memory cost, but gains speed by remembering string lengths rather
than calling strlen repeatedly.  It also gets about 80% of the way toward
transparent support of embedded NUL bytes, because the input engine is
now length-based instead of relying on NUL termination; I intend to
finish that work after the argv_ref branch is fully merged.  While
writing this patch, I discovered that mkstemp has a slight chance of
producing a random file name that matches a defined macro, so I'm
considering a follow-up patch that changes mkstemp to produce quoted
output, pending discussion with the Austin group.
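To illustrate the length-caching idea, here is a minimal sketch, assuming a simplified stand-in for the text member of `m4_symbol_value`. The names `text_value`, `text_value_set`, `text_value_len` and `memdup0` are hypothetical and are not the m4 API. The length is recorded once when the value is created, so later queries are O(1) instead of an O(n) `strlen`. Because a byte count is authoritative, the value can also carry embedded NUL bytes that `strlen` or `strdup` would stop at. The copy helper mirrors the patch's `xmemdup (text, len + 1)` pattern.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the text member of
   m4_symbol_value: TEXT keeps a trailing NUL for C consumers, but LEN
   is the authoritative size and may count embedded NUL bytes.  */
typedef struct
{
  const char *text;
  size_t len;
} text_value;

/* Record the length once, when the value is created.  */
static void
text_value_set (text_value *v, const char *text, size_t len)
{
  assert (v && text);
  /* Same sanity check as the patch: a NUL must appear at or before
     TEXT[LEN]; strlen stops early at an embedded NUL.  */
  assert (strlen (text) <= len);
  v->text = text;
  v->len = len;
}

/* O(1) length query: no rescan of the string.  */
static size_t
text_value_len (const text_value *v)
{
  return v->len;
}

/* Duplicate LEN bytes plus the terminator, in the spirit of the
   patch's xmemdup (text, len + 1); strdup would stop at the first
   embedded NUL.  Aborts on allocation failure, like the x* wrappers.  */
static char *
memdup0 (const char *src, size_t len)
{
  char *copy = malloc (len + 1);

  if (!copy)
    abort ();
  memcpy (copy, src, len + 1);
  return copy;
}
```

For instance, `text_value_set (&v, "ab\0cd", 5)` makes `text_value_len` return 5, and a `memdup0` copy preserves all five bytes, whereas `strlen` on either string reports 2.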

2007-11-29  Eric Blake  <address@hidden>

        Stage 3: cache length, rather than computing it.
        * src/input.c (next_token): Grab length from obstack rather than
        calling strlen.
        * src/m4.h (token_data, macro_arguments): Add length field.
        (TOKEN_DATA_LEN): New accessor.
        (define_user_macro): Add parameter.
        * src/builtin.c (define_user_macro, mkstemp_helper): Use
        pre-computed length.
        (builtin_init, define_macro, m4_maketemp, m4_mkstemp): Adjust
        callers.
        (dump_args, m4_ifdef, m4_ifelse, m4_builtin, m4_indir, m4_eval)
        (m4_len, m4_substr, m4_translit, m4_regexp, m4_patsubst)
        (expand_user_macro): Use cached lengths.
        * src/freeze.c (reload_frozen_state): Adjust callers.
        * src/m4.c (main): Likewise.
        * src/macro.c (expand_token, expand_argument, collect_arguments)
        (arg_len): Use cached length.
        * doc/m4.texinfo (Mkstemp): Ensure mkstemp does not produce NUL.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHTsf+84KuGfSFAYARAvlnAKDE2HjIqFXwtM60jBbJz76IzaGJAACeOQIM
OfvOTF2CSKGC/d6Y4F+s2Pw=
=LQ+l
-----END PGP SIGNATURE-----
From 8e1fad3c14b6308edd5f61c1f8f0d5a0e6591547 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Wed, 28 Nov 2007 06:45:20 -0700
Subject: [PATCH] Stage 3a: cache length, rather than computing it, in libm4.

* m4/m4module.h (struct m4_macro_args): Cache length.
(m4_get_symbol_len, m4_get_symbol_value_len): New accessors.
(m4_set_symbol_value_text): Change signature.
* m4/m4private.h (struct m4_symbol_value): Store string length.
(m4_get_symbol_value_text, m4_get_symbol_value_placeholder)
(m4_set_symbol_value_placeholder): Update accordingly.
(m4_set_symbol_value_text): Change signature.
(m4_get_symbol_value_len): New accessor.
* m4/input.c (struct m4_input_block, string_peek, string_read)
(string_unget, string_print, m4_push_string_finish)
(m4_push_wrapup): Track length of string input.
(m4__next_token): Adjust all users of symbol text to track length,
too.
* m4/macro.c (expand_argument, collect_arguments): Likewise.
* m4/module.c (install_macro_table): Likewise.
* modules/gnu.c (builtin, indir): Likewise.
* modules/m4.c (define, pushdef): Likewise.
* src/main.c (main): Likewise.
* src/freeze.c (reload_frozen_state): Likewise.
* m4/symtab.c (m4_symbol_value_copy): Likewise.
(m4_get_symbol_value_len): New function.
(m4_get_symbol_value_text, m4_get_symbol_value_placeholder)
(m4_set_symbol_value_text, m4_set_symbol_value_placeholder):
Adjust implementation.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   28 ++++++++++++++++++++++++++++
 m4/input.c     |   40 +++++++++++++++++++++-------------------
 m4/m4module.h  |   10 +++++++---
 m4/m4private.h |   22 ++++++++++++++--------
 m4/macro.c     |    6 ++++--
 m4/module.c    |    3 ++-
 m4/symtab.c    |   39 +++++++++++++++++++++++++++++----------
 modules/gnu.c  |    5 +++--
 modules/m4.c   |    4 ++--
 src/freeze.c   |    4 +++-
 src/main.c     |    9 +++++++--
 11 files changed, 120 insertions(+), 50 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 7cdf53b..eb29c87 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,31 @@
+2007-11-28  Eric Blake  <address@hidden>
+
+       Stage 3a: cache length, rather than computing it, in libm4.
+       * m4/m4module.h (struct m4_macro_args): Cache length.
+       (m4_get_symbol_len, m4_get_symbol_value_len): New accessors.
+       (m4_set_symbol_value_text): Change signature.
+       * m4/m4private.h (struct m4_symbol_value): Store string length.
+       (m4_get_symbol_value_text, m4_get_symbol_value_placeholder)
+       (m4_set_symbol_value_placeholder): Update accordingly.
+       (m4_set_symbol_value_text): Change signature.
+       (m4_get_symbol_value_len): New accessor.
+       * m4/input.c (struct m4_input_block, string_peek, string_read)
+       (string_unget, string_print, m4_push_string_finish)
+       (m4_push_wrapup): Track length of string input.
+       (m4__next_token): Adjust all users of symbol text to track length,
+       too.
+       * m4/macro.c (expand_argument, collect_arguments): Likewise.
+       * m4/module.c (install_macro_table): Likewise.
+       * modules/gnu.c (builtin, indir): Likewise.
+       * modules/m4.c (define, pushdef): Likewise.
+       * src/main.c (main): Likewise.
+       * src/freeze.c (reload_frozen_state): Likewise.
+       * m4/symtab.c (m4_symbol_value_copy): Likewise.
+       (m4_get_symbol_value_len): New function.
+       (m4_get_symbol_value_text, m4_get_symbol_value_placeholder)
+       (m4_set_symbol_value_text, m4_set_symbol_value_placeholder):
+       Adjust implementation.
+
 2007-11-27  Eric Blake  <address@hidden>
 
        Stage 2: use accessors, not direct reference, into argv.
diff --git a/m4/input.c b/m4/input.c
index 9fbbe08..37cdcce 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -148,8 +148,8 @@ struct m4_input_block
     {
       struct
        {
-         char *start;          /* string value */
-         char *current;        /* current value */
+         char *str;            /* string value */
+         size_t len;           /* remaining length */
        }
       u_s;
       struct
@@ -449,27 +449,25 @@ static struct input_funcs string_funcs = {
 static int
 string_peek (m4_input_block *me)
 {
-  int ch = to_uchar (*me->u.u_s.current);
-
-  return (ch == '\0') ? CHAR_RETRY : ch;
+  return me->u.u_s.len ? to_uchar (*me->u.u_s.str) : CHAR_RETRY;
 }
 
 static int
 string_read (m4_input_block *me, m4 *context M4_GNUC_UNUSED,
             bool retry M4_GNUC_UNUSED)
 {
-  int ch = to_uchar (*me->u.u_s.current);
-  if (ch == '\0')
+  if (!me->u.u_s.len)
     return CHAR_RETRY;
-  me->u.u_s.current++;
-  return ch;
+  me->u.u_s.len--;
+  return to_uchar (*me->u.u_s.str++);
 }
 
 static void
 string_unget (m4_input_block *me, int ch)
 {
-  assert (me->u.u_s.current > me->u.u_s.start && ch < CHAR_EOF);
-  *--me->u.u_s.current = ch;
+  assert (ch < CHAR_EOF && to_uchar (me->u.u_s.str[-1]) == ch);
+  me->u.u_s.str--;
+  me->u.u_s.len++;
 }
 
 static void
@@ -479,13 +477,15 @@ string_print (m4_input_block *me, m4 *context, m4_obstack *obs)
   const char *lquote = m4_get_syntax_lquote (M4SYNTAX);
   const char *rquote = m4_get_syntax_rquote (M4SYNTAX);
   size_t arg_length = m4_get_max_debug_arg_length_opt (context);
-  const char *text = me->u.u_s.start;
-  size_t len = arg_length ? strnlen (text, arg_length) : strlen (text);
+  const char *text = me->u.u_s.str;
+  size_t len = me->u.u_s.len;
 
+  if (arg_length && arg_length < len)
+    len = arg_length;
   if (quote)
     obstack_grow (obs, lquote, strlen (lquote));
   obstack_grow (obs, text, len);
-  if (len == arg_length && text[len] != '\0')
+  if (len != me->u.u_s.len)
     obstack_grow (obs, "...", 3);
   if (quote)
     obstack_grow (obs, rquote, strlen (rquote));
@@ -529,9 +529,9 @@ m4_push_string_finish (void)
 
   if (obstack_object_size (current_input) > 0)
     {
+      next->u.u_s.len = obstack_object_size (current_input);
       obstack_1grow (current_input, '\0');
-      next->u.u_s.start = obstack_finish (current_input);
-      next->u.u_s.current = next->u.u_s.start;
+      next->u.u_s.str = obstack_finish (current_input);
       next->prev = isp;
       ret = isp = next;
       input_change = true;
@@ -665,8 +665,8 @@ m4_push_wrapup (m4 *context, const char *s)
   i->file = m4_get_current_file (context);
   i->line = m4_get_current_line (context);
 
-  i->u.u_s.start = obstack_copy0 (wrapup_stack, s, strlen (s));
-  i->u.u_s.current = i->u.u_s.start;
+  i->u.u_s.len = strlen (s);
+  i->u.u_s.str = obstack_copy0 (wrapup_stack, s, i->u.u_s.len);
 
   wsp = i;
 }
@@ -1019,6 +1019,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
   m4__token_type type;
   const char *file;
   int dummy;
+  size_t len;
 
   assert (next == NULL);
   if (!line)
@@ -1221,11 +1222,12 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
       }
   } while (type == M4_TOKEN_NONE);
 
+  len = obstack_object_size (&token_stack);
   obstack_1grow (&token_stack, '\0');
 
   memset (token, '\0', sizeof (m4_symbol_value));
 
-  m4_set_symbol_value_text (token, obstack_finish (&token_stack));
+  m4_set_symbol_value_text (token, obstack_finish (&token_stack), len);
   VALUE_MAX_ARGS (token)       = -1;
 
 #ifdef DEBUG_INPUT
diff --git a/m4/m4module.h b/m4/m4module.h
index b0e9405..7ffaffd 100644
--- a/m4/m4module.h
+++ b/m4/m4module.h
@@ -89,6 +89,7 @@ struct m4_macro_args
      until all references have been rescanned.  */
   bool inuse;
   const char *argv0; /* The macro name being expanded.  */
+  size_t argv0_len; /* Length of argv0.  */
   size_t arraylen; /* True length of allocated elements in array.  */
   /* Used as a variable-length array, storing information about each
      argument.  */
@@ -267,6 +268,8 @@ extern bool m4_symbol_value_groks_macro     (m4_symbol_value *);
        (m4_is_symbol_value_placeholder (m4_get_symbol_value (symbol)))
 #define m4_get_symbol_text(symbol)                                     \
        (m4_get_symbol_value_text (m4_get_symbol_value (symbol)))
+#define m4_get_symbol_len(symbol)                                      \
+       (m4_get_symbol_value_len (m4_get_symbol_value (symbol)))
 #define m4_get_symbol_func(symbol)                                     \
        (m4_get_symbol_value_func (m4_get_symbol_value (symbol)))
 #define m4_get_symbol_builtin(symbol)                                  \
@@ -284,12 +287,13 @@ extern bool               m4_is_symbol_value_text   (m4_symbol_value *);
 extern bool            m4_is_symbol_value_func   (m4_symbol_value *);
 extern bool            m4_is_symbol_value_placeholder  (m4_symbol_value *);
 extern bool            m4_is_symbol_value_void   (m4_symbol_value *);
-extern const char      *m4_get_symbol_value_text  (m4_symbol_value *);
+extern const char *    m4_get_symbol_value_text  (m4_symbol_value *);
+extern size_t          m4_get_symbol_value_len   (m4_symbol_value *);
 extern m4_builtin_func *m4_get_symbol_value_func  (m4_symbol_value *);
 extern const m4_builtin *m4_get_symbol_value_builtin   (m4_symbol_value *);
-extern const char      *m4_get_symbol_value_placeholder        (m4_symbol_value *);
+extern const char *    m4_get_symbol_value_placeholder (m4_symbol_value *);
 extern void            m4_set_symbol_value_text  (m4_symbol_value *,
-                                                  const char *);
+                                                  const char *, size_t);
 extern void            m4_set_symbol_value_builtin     (m4_symbol_value *,
                                                         const m4_builtin *);
 extern void            m4_set_symbol_value_placeholder (m4_symbol_value *,
diff --git a/m4/m4private.h b/m4/m4private.h
index 10d82c9..84e7157 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -203,8 +203,13 @@ struct m4_symbol_value
   size_t               pending_expansions;
 
   m4__symbol_type      type;
-  union {
-    const char *       text;   /* Valid when type is TEXT, PLACEHOLDER.  */
+  union
+  {
+    struct
+    {
+      size_t           len;    /* Length of string.  */
+      const char *     text;   /* String contents.  */
+    } u_t;                     /* Valid when type is TEXT, PLACEHOLDER.  */
     const m4_builtin * builtin;/* Valid when type is FUNC.  */
     m4_symbol_chain *  chain;  /* Valid when type is COMP.  */
   } u;
@@ -241,20 +246,21 @@ struct m4_symbol_value
 #  define m4_is_symbol_value_void(V)   ((V)->type == M4_SYMBOL_VOID)
 #  define m4_is_symbol_value_placeholder(V)                            \
                                        ((V)->type == M4_SYMBOL_PLACEHOLDER)
-#  define m4_get_symbol_value_text(V)  ((V)->u.text)
+#  define m4_get_symbol_value_text(V)  ((V)->u.u_t.text)
+#  define m4_get_symbol_value_len(V)   ((V)->u.u_t.len)
 #  define m4_get_symbol_value_func(V)  ((V)->u.builtin->func)
 #  define m4_get_symbol_value_builtin(V) ((V)->u.builtin)
 #  define m4_get_symbol_value_placeholder(V)                           \
-                                       ((V)->u.text)
+                                       ((V)->u.u_t.text)
#  define m4_symbol_value_groks_macro(V) (BIT_TEST ((V)->flags,                \
                                                    VALUE_MACRO_ARGS_BIT))
 
-#  define m4_set_symbol_value_text(V, T)                               \
-       ((V)->type = M4_SYMBOL_TEXT, (V)->u.text = (T))
+#  define m4_set_symbol_value_text(V, T, L)                            \
+  ((V)->type = M4_SYMBOL_TEXT, (V)->u.u_t.text = (T), (V)->u.u_t.len = (L))
 #  define m4_set_symbol_value_builtin(V, B)                            \
-       ((V)->type = M4_SYMBOL_FUNC, (V)->u.builtin = (B))
+  ((V)->type = M4_SYMBOL_FUNC, (V)->u.builtin = (B))
#  define m4_set_symbol_value_placeholder(V, T)                                \
-       ((V)->type = M4_SYMBOL_PLACEHOLDER, (V)->u.text = (T))
+  ((V)->type = M4_SYMBOL_PLACEHOLDER, (V)->u.u_t.text = (T))
 #endif
 
 
diff --git a/m4/macro.c b/m4/macro.c
index 449f160..d953853 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -163,6 +163,7 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
   int paren_level = 0;
   const char *file = m4_get_current_file (context);
   int line = m4_get_current_line (context);
+  size_t len;
 
   argp->type = M4_SYMBOL_VOID;
 
@@ -188,9 +189,10 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
              if (argp->type == M4_SYMBOL_FUNC
                  && obstack_object_size (obs) == 0)
                return type == M4_TOKEN_COMMA;
+             len = obstack_object_size (obs);
              obstack_1grow (obs, '\0');
              VALUE_MODULE (argp) = NULL;
-             m4_set_symbol_value_text (argp, obstack_finish (obs));
+             m4_set_symbol_value_text (argp, obstack_finish (obs), len);
              return type == M4_TOKEN_COMMA;
            }
          /* fallthru */
@@ -369,7 +371,7 @@ collect_arguments (m4 *context, const char *name, m4_symbol *symbol,
          if (!groks_macro_args && m4_is_symbol_value_func (&token))
            {
              VALUE_MODULE (&token) = NULL;
-             m4_set_symbol_value_text (&token, "");
+             m4_set_symbol_value_text (&token, "", 0);
            }
          tokenp = (m4_symbol_value *) obstack_copy (arguments, &token,
                                                     sizeof token);
diff --git a/m4/module.c b/m4/module.c
index 4a65dbd..afeece4 100644
--- a/m4/module.c
+++ b/m4/module.c
@@ -193,8 +193,9 @@ install_macro_table (m4 *context, m4_module *module)
       for (; mp->name != NULL; mp++)
        {
          m4_symbol_value *value = m4_symbol_value_create ();
+         size_t len = strlen (mp->value);
 
-         m4_set_symbol_value_text (value, xstrdup (mp->value));
+         m4_set_symbol_value_text (value, xmemdup (mp->value, len + 1), len);
          VALUE_MODULE (value) = module;
 
          m4_symbol_pushdef (M4SYMTAB, mp->name, value);
diff --git a/m4/symtab.c b/m4/symtab.c
index 97f247b..7c253a0 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -412,7 +412,12 @@ m4_symbol_value_copy (m4_symbol_value *dest, m4_symbol_value *src)
   /* Caller is supposed to free text token strings, so we have to
      copy the string not just its address in that case.  */
   if (m4_is_symbol_value_text (src))
-    m4_set_symbol_value_text (dest, xstrdup (m4_get_symbol_value_text (src)));
+    {
+      size_t len = m4_get_symbol_value_len (src);
+      m4_set_symbol_value_text (dest,
+                               xmemdup (m4_get_symbol_value_text (src),
+                                        len + 1), len);
+    }
   else if (m4_is_symbol_value_placeholder (src))
     m4_set_symbol_value_placeholder (dest,
                                     xstrdup (m4_get_symbol_value_placeholder
@@ -638,7 +643,15 @@ const char *
 m4_get_symbol_value_text (m4_symbol_value *value)
 {
   assert (value && value->type == M4_SYMBOL_TEXT);
-  return value->u.text;
+  return value->u.u_t.text;
+}
+
+#undef m4_get_symbol_value_len
+size_t
+m4_get_symbol_value_len (m4_symbol_value *value)
+{
+  assert (value && value->type == M4_SYMBOL_TEXT);
+  return value->u.u_t.len;
 }
 
 #undef m4_get_symbol_value_func
@@ -662,18 +675,23 @@ const char *
 m4_get_symbol_value_placeholder (m4_symbol_value *value)
 {
   assert (value && value->type == M4_SYMBOL_PLACEHOLDER);
-  return value->u.text;
+  return value->u.u_t.text;
 }
 
 #undef m4_set_symbol_value_text
 void
-m4_set_symbol_value_text (m4_symbol_value *value, const char *text)
+m4_set_symbol_value_text (m4_symbol_value *value, const char *text, size_t len)
 {
-  assert (value);
-  assert (text);
+  assert (value && text);
+  /* TODO - this assertion requires NUL-terminated text.  Do we want
+     to optimize memory usage and use purely length-based
+     manipulation, for one less byte per string?  Perhaps only without
+     NDEBUG?  */
+  assert (strlen (text) <= len);
 
-  value->type   = M4_SYMBOL_TEXT;
-  value->u.text = text;
+  value->type = M4_SYMBOL_TEXT;
+  value->u.u_t.text = text;
+  value->u.u_t.len = len;
 }
 
 #undef m4_set_symbol_value_builtin
@@ -694,8 +712,9 @@ m4_set_symbol_value_placeholder (m4_symbol_value *value, const char *text)
   assert (value);
   assert (text);
 
-  value->type   = M4_SYMBOL_PLACEHOLDER;
-  value->u.text = text;
+  value->type = M4_SYMBOL_PLACEHOLDER;
+  value->u.u_t.text = text;
+  value->u.u_t.len = SIZE_MAX; /* len is not tracked for placeholders.  */
 }
 
 
diff --git a/modules/gnu.c b/modules/gnu.c
index 70c7cf6..560e0d5 100644
--- a/modules/gnu.c
+++ b/modules/gnu.c
@@ -463,7 +463,7 @@ M4BUILTIN_HANDLER (builtin)
                for (i = 2; i < argc; i++)
                  if (!m4_is_arg_text (argv, i))
                    m4_set_symbol_value_text (m4_arg_symbol (new_argv, i - 1),
-                                             "");
+                                             "", 0);
              bp->func (context, obs, argc - 1, new_argv);
              free (new_argv);
            }
@@ -707,7 +707,8 @@ M4BUILTIN_HANDLER (indir)
          if (!m4_symbol_groks_macro (symbol))
            for (i = 2; i < argc; i++)
              if (!m4_is_arg_text (argv, i))
-               m4_set_symbol_value_text (m4_arg_symbol (new_argv, i - 1), "");
+               m4_set_symbol_value_text (m4_arg_symbol (new_argv, i - 1),
+                                         "", 0);
          m4_macro_call (context, m4_get_symbol_value (symbol), obs,
                         argc - 1, new_argv);
          free (new_argv);
diff --git a/modules/m4.c b/modules/m4.c
index 37497e6..87584a2 100644
--- a/modules/m4.c
+++ b/modules/m4.c
@@ -166,7 +166,7 @@ M4BUILTIN_HANDLER (define)
       m4_symbol_value *value = m4_symbol_value_create ();
 
       if (argc == 2)
-       m4_set_symbol_value_text (value, xstrdup (""));
+       m4_set_symbol_value_text (value, xstrdup (""), 0);
       else
        m4_symbol_value_copy (value, m4_arg_symbol (argv, 2));
 
@@ -197,7 +197,7 @@ M4BUILTIN_HANDLER (pushdef)
       m4_symbol_value *value = m4_symbol_value_create ();
 
       if (argc == 2)
-       m4_set_symbol_value_text (value, xstrdup (""));
+       m4_set_symbol_value_text (value, xstrdup (""), 0);
       else
        m4_symbol_value_copy (value, m4_arg_symbol (argv, 2));
 
diff --git a/src/freeze.c b/src/freeze.c
index 6e467bb..c17f4f3 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -755,7 +755,9 @@ ill-formed frozen file, version 2 directive `%c' encountered"), 'T');
            if (number[2] > 0)
              module = m4__module_find (string[2]);
 
-           m4_set_symbol_value_text (token, xstrdup (string[1]));
+           m4_set_symbol_value_text (token, xmemdup (string[1],
+                                                     number[1] + 1),
+                                     number[1]);
            VALUE_MODULE (token) = module;
            VALUE_MAX_ARGS (token) = -1;
 
diff --git a/src/main.c b/src/main.c
index a54a533..ef9cb7b 100644
--- a/src/main.c
+++ b/src/main.c
@@ -667,11 +667,16 @@ main (int argc, char *const *argv, char *const *envp)
            /* defn->value is read-only, so we need a copy.  */
            char *macro_name = xstrdup (arg);
            char *macro_value = strchr (macro_name, '=');
+           size_t len = 0;
 
            if (macro_value != NULL)
-             *macro_value++ = '\0';
+             {
+               *macro_value++ = '\0';
+               len = strlen (macro_value);
+             }
            m4_set_symbol_value_text (value, xstrdup (macro_value
-                                                     ? macro_value : ""));
+                                                     ? macro_value : ""),
+                                     len);
 
            if (defn->code == 'D')
              m4_symbol_define (M4SYMTAB, macro_name, value);
-- 
1.5.3.5
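The string input block in the patch above replaces the start/current pointer pair with a pointer plus remaining length. The sketch below shows how that changes the read loop, under stated assumptions: `string_block`, the `sb_*` functions and `SKETCH_CHAR_RETRY` are hypothetical stand-ins for `m4_input_block`, `string_peek`, `string_read`, `string_unget`, `string_print` and m4's `CHAR_RETRY`. End of input is now "length is zero" rather than "current byte is NUL", so an embedded NUL comes back as an ordinary byte 0. Unget only rewinds over the byte just read instead of writing into the buffer. The debug-length clamp follows the new `string_print` logic, which appends "..." exactly when the printed length differs from the stored length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for m4's CHAR_RETRY.  */
#define SKETCH_CHAR_RETRY (-1)

/* Hypothetical stand-in for the u_s member of m4_input_block.  */
typedef struct
{
  const char *str;   /* next unread byte */
  size_t len;        /* bytes remaining */
} string_block;

/* Look at the next byte without consuming it; an embedded NUL is
   returned as 0, not treated as end of input.  */
static int
sb_peek (const string_block *b)
{
  return b->len ? (unsigned char) *b->str : SKETCH_CHAR_RETRY;
}

/* Consume one byte, or report that this block is exhausted.  */
static int
sb_read (string_block *b)
{
  if (!b->len)
    return SKETCH_CHAR_RETRY;
  b->len--;
  return (unsigned char) *b->str++;
}

/* Push back the byte just read: the buffer is never modified, so the
   pushed-back character must match what is already there.  */
static void
sb_unget (string_block *b, int ch)
{
  assert ((unsigned char) b->str[-1] == ch);
  b->str--;
  b->len++;
}

/* Number of bytes to print under a debug length LIMIT (0 means no
   limit); *TRUNCATED is set when "..." should follow.  */
static size_t
sb_print_len (const string_block *b, size_t limit, int *truncated)
{
  size_t len = b->len;

  if (limit && limit < len)
    len = limit;
  *truncated = len != b->len;
  return len;
}
```

With the 3-byte input `"a\0b"`, successive reads yield `'a'`, `0` and `'b'` before `SKETCH_CHAR_RETRY`; the old NUL-based loop would have stopped after `'a'`.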


From 17a806c25764db643660d374bc6263a7e42d93ab Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Wed, 28 Nov 2007 14:03:48 -0700
Subject: [PATCH] Stage 3b: cache length, rather than computing it, in modules.

* m4/hash.c (m4_hash_remove): Avoid double free on remove
failure.
* m4/output.c (m4_shipout_string): Change semantics of len param.
(m4_shipout_int): Use cached length.
* m4/input.c (m4_push_string_finish): Likewise.
* modules/m4.h (m4_make_temp_func): Add parameter.
* m4/macro.c (expand_token, m4_arg_len): Use cached length.
(collect_arguments, expand_macro): Alter signature.
(trace_format): Don't use out-of-scope buffer.
(process_macro): All callers changed.
* m4/utility.c (m4_dump_args): Likewise.
* m4/symtab.c (m4_symbol_value_print): Likewise.
* modules/gnu.c (__file__, __program__, builtin, indir)
(m4symbols, mkdtemp, regexp_compile, regexp_substitute,
renamesyms, patsubst, regexp, regexp_compile): Likewise.
* modules/load.c (m4modules): Likewise.
* modules/m4.c (defn, m4wrap, maketemp, m4_make_temp)
(numb_obstack, ifdef, ifelse, divert, len, substr): Likewise.
* modules/perl.c (perleval): Likewise.
* modules/stdlib.c (getcwd, getenv, getlogin, getpwnam, getpwuid)
(hostname, uname, setenv): Likewise.
* modules/mpeval.c (numb_obstack): Likewise.
* src/freeze.c (dump_symbol_CB): Likewise.
* doc/m4.texinfo (Renamesyms, Dumpdef, Changesyntax): Adjust test.
* tests/builtins.at (mkstemp): Likewise.
* tests/others.at (iso8859): XFAIL this test, now that
length-based handling allows NUL through part but not all of M4.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog         |   31 +++++++++++++++++
 doc/m4.texinfo    |   30 ++++++++++++++---
 m4/hash.c         |   52 +++++++++++++---------------
 m4/input.c        |   10 ++++--
 m4/macro.c        |   73 ++++++++++++++++++++-------------------
 m4/output.c       |   13 ++++---
 m4/symtab.c       |   23 +++++++++----
 m4/utility.c      |    3 +-
 modules/gnu.c     |   85 +++++++++++++++++++++++-----------------------
 modules/load.c    |    3 +-
 modules/m4.c      |   97 ++++++++++++++++++++++++-----------------------------
 modules/m4.h      |    3 +-
 modules/mpeval.c  |    6 ++-
 modules/perl.c    |    2 +-
 modules/stdlib.c  |   44 ++++++++++++------------
 src/freeze.c      |    2 +-
 tests/builtins.at |   12 +++++-
 tests/others.at   |    4 ++
 18 files changed, 282 insertions(+), 211 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index eb29c87..695720d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,34 @@
+2007-11-29  Eric Blake  <address@hidden>
+
+       Stage 3b: cache length, rather than computing it, in modules.
+       * m4/hash.c (m4_hash_remove): Avoid double free on remove
+       failure.
+       * m4/output.c (m4_shipout_string): Change semantics of len param.
+       (m4_shipout_int): Use cached length.
+       * m4/input.c (m4_push_string_finish): Likewise.
+       * modules/m4.h (m4_make_temp_func): Add parameter.
+       * m4/macro.c (expand_token, m4_arg_len): Use cached length.
+       (collect_arguments, expand_macro): Alter signature.
+       (trace_format): Don't use out-of-scope buffer.
+       (process_macro): All callers changed.
+       * m4/utility.c (m4_dump_args): Likewise.
+       * m4/symtab.c (m4_symbol_value_print): Likewise.
+       * modules/gnu.c (__file__, __program__, builtin, indir)
+       (m4symbols, mkdtemp, regexp_compile, regexp_substitute,
+       renamesyms, patsubst, regexp, regexp_compile): Likewise.
+       * modules/load.c (m4modules): Likewise.
+       * modules/m4.c (defn, m4wrap, maketemp, m4_make_temp)
+       (numb_obstack, ifdef, ifelse, divert, len, substr): Likewise.
+       * modules/perl.c (perleval): Likewise.
+       * modules/stdlib.c (getcwd, getenv, getlogin, getpwnam, getpwuid)
+       (hostname, uname, setenv): Likewise.
+       * modules/mpeval.c (numb_obstack): Likewise.
+       * src/freeze.c (dump_symbol_CB): Likewise.
+       * doc/m4.texinfo (Renamesyms, Dumpdef, Changesyntax): Adjust test.
+       * tests/builtins.at (mkstemp): Likewise.
+       * tests/others.at (iso8859): XFAIL this test, now that
+       length-based handling allows NUL through part but not all of M4.
+
 2007-11-28  Eric Blake  <address@hidden>
 
        Stage 3a: cache length, rather than computing it, in libm4.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 0dc3d48..f298973 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -2451,15 +2451,28 @@ The macro @code{renamesyms} is recognized only with parameters.
 This macro was added in M4 2.0.
 @end deffn
 
-Here is an example that performs the same renaming as the
+Here is an example that starts by performing a similar renaming to the
 @option{--prefix-builtins} option (or @option{-P}).  Where
 @option{--prefix-builtins} only renames M4 builtin macros,
 @code{renamesyms} will rename any macros that match when it runs,
-including text macros.
+including text macros.  The rest of the example demonstrates the
+behavior of unanchored regular expressions in symbol renaming.
 
 @example
+define(`foo', `bar')
address@hidden
 renamesyms(`^.*$', `m4_\&')
 @result{}
+foo
address@hidden
+m4_foo
address@hidden
+m4_defn(`m4_foo')
address@hidden
+m4_renamesyms(`f', `g')
address@hidden
+m4_igdeg(`m4_goo', `m4_goo')
address@hidden
 @end example
 
 If @var{resyntax} is given, @var{regexp} must be given according to
@@ -2474,6 +2487,11 @@ renamesyms(`^[^_]\w*$', `m4_\&')
 @result{}
 m4_renamesyms(`^m4_m4(\w*)$', `m4_\1', `POSIX_EXTENDED')
 @result{}
+m4_wrap(__line__
+)
address@hidden
+^D
address@hidden
 @end example
 
 When a symbol has multiple definitions, thanks to @code{pushdef}, the
@@ -3413,7 +3431,7 @@ f(popdef(`f')dumpdef(`f'))
 @samp{q} flag is implied when the @option{--debug} option (@option{-d},
 @pxref{Debugging options, , Invoking m4}) is used in the command line
 without arguments. Also, @option{--debuglen} (@pxref{Debuglen}) can affect
-output, by truncating longer strings.
+output, by truncating longer strings (but not builtin and module names).
 
 @comment options: -ds -l3
 @example
@@ -3429,8 +3447,8 @@ debugmode(`+m')
 dumpdef(`foo', `dnl', `indir', `__gnu__')
 @error{}__gnu__:@address@hidden@}
 @error{}dnl:@tabchar{}<dnl>@address@hidden
address@hidden:@tabchar{}3, <div...>@address@hidden, 1 l...
address@hidden:@tabchar{}<ind...>@address@hidden
address@hidden:@tabchar{}3, <divnum>@address@hidden, 1 l...
address@hidden:@tabchar{}<indir>@address@hidden
 @result{}
 debugmode(`-m')
 @result{}
@@ -4793,6 +4811,8 @@ foo
 @result{}foo
 @@foo
 @result{}bar
+@@bar
address@hidden@@bar
 @@changesyntax(`@@\', `O@@')
 @result{}
 foo
diff --git a/m4/hash.c b/m4/hash.c
index 4297fe4..c47a7e4 100644
--- a/m4/hash.c
+++ b/m4/hash.c
@@ -284,13 +284,14 @@ node_insert (m4_hash *hash, hash_node *node)
 /* Remove from HASH, the first node with key KEY; comparing keys with
    HASH's cmp_func.  Any nodes with the same KEY previously hidden by
    the removed node will become visible again.  The key field of the
-   removed node is returned, or the original KEY If there was no
-   match.  This is unsafe if multiple iterators are visiting HASH, or
-   when a lone iterator is visiting on a different key.  */
+   removed node is returned, or NULL if there was no match.  This is
+   unsafe if multiple iterators are visiting HASH, or when a lone
+   iterator is visiting on a different key.  */
 void *
 m4_hash_remove (m4_hash *hash, const void *key)
 {
   size_t n;
+  hash_node *node = NULL;
 
 #ifndef NDEBUG
   m4_hash_iterator *iter = HASH_ITER (hash);
@@ -304,36 +305,31 @@ m4_hash_remove (m4_hash *hash, const void *key)
 #endif
 
   n = BUCKET_COUNT (hash, key);
+  do
+    {
+      hash_node *next = node ? NODE_NEXT (node) : BUCKET_NTH (hash, n);
 
-  {
-    hash_node *node = NULL;
-
-    do
-      {
-       hash_node *next = node ? NODE_NEXT (node) : BUCKET_NTH (hash, n);
-
-       if (next && ((*HASH_CMP_FUNC (hash)) (NODE_KEY (next), key) == 0))
-         {
-           if (node)
-             NODE_NEXT (node)      = NODE_NEXT (next);
-           else
-             BUCKET_NTH (hash, n)  = NODE_NEXT (next);
+      if (next && ((*HASH_CMP_FUNC (hash)) (NODE_KEY (next), key) == 0))
+       {
+         if (node)
+           NODE_NEXT (node) = NODE_NEXT (next);
+         else
+           BUCKET_NTH (hash, n) = NODE_NEXT (next);
 
-           key = NODE_KEY (next);
+         key = NODE_KEY (next);
 #ifndef NDEBUG
-           if (iter)
-             assert (ITERATOR_PLACE (iter) == next);
-           NODE_KEY (next) = NULL;
+         if (iter)
+           assert (ITERATOR_PLACE (iter) == next);
+         NODE_KEY (next) = NULL;
 #endif
-           node_delete (hash, next);
-           break;
-         }
-       node = next;
-      }
-    while (node);
-  }
+         node_delete (hash, next);
+         return (void *) key; /* Cast away const.  */
+       }
+      node = next;
+    }
+  while (node);
 
-  return (void *) key;
+  return NULL;
 }
 
 /* Return the address of the value field of the first node in HASH
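Per the ChangeLog, the `m4_hash_remove` change above avoids a double free on remove failure. The old version returned the caller's own KEY when nothing matched, so a caller that freed both its key and the returned pointer would free the same memory twice. The sketch below shows the corrected contract on a single bucket. It assumes a plain singly linked list with `strcmp`-compared string keys, and the names `node` and `bucket_remove` are hypothetical. It walks the list with a pointer-to-pointer instead of the patch's `node`/`next` pair. It returns the removed key on a hit and NULL on a miss, matching the updated comment in hash.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket node: KEY is heap-allocated and owned by the
   table until the node is removed.  */
typedef struct node
{
  char *key;
  struct node *next;
} node;

/* Unlink the first node in the bucket at *HEAD whose key equals KEY.
   On a match, free the node and return its key, which the caller now
   owns.  On a miss, return NULL rather than echoing KEY back, so a
   caller that frees the result never frees memory it did not get
   from the table.  */
static char *
bucket_remove (node **head, const char *key)
{
  node **link;

  assert (head && key);
  for (link = head; *link; link = &(*link)->next)
    if (strcmp ((*link)->key, key) == 0)
      {
        node *victim = *link;
        char *found = victim->key;

        *link = victim->next;  /* bypass the matching node */
        free (victim);
        return found;
      }
  return NULL;
}
```

Returning NULL on a miss also lets callers distinguish "removed" from "not present" without comparing the result against their own argument.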
diff --git a/m4/input.c b/m4/input.c
index 37cdcce..08c5f64 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -523,13 +523,17 @@ m4_input_block *
 m4_push_string_finish (void)
 {
   m4_input_block *ret = NULL;
+  size_t len = obstack_object_size (current_input);
 
   if (next == NULL)
-    return isp;
+    {
+      assert (!len);
+      return isp;
+    }
 
-  if (obstack_object_size (current_input) > 0)
+  if (len)
     {
-      next->u.u_s.len = obstack_object_size (current_input);
+      next->u.u_s.len = len;
       obstack_1grow (current_input, '\0');
       next->u.u_s.str = obstack_finish (current_input);
       next->prev = isp;
diff --git a/m4/macro.c b/m4/macro.c
index d953853..5769f99 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -29,10 +29,10 @@
 
 #include "intprops.h"
 
-static m4_macro_args *collect_arguments (m4 *, const char *, m4_symbol *,
-                                        m4_obstack *, unsigned int,
-                                        m4_obstack *);
-static void    expand_macro      (m4 *, const char *, m4_symbol *);
+static m4_macro_args *collect_arguments (m4 *, const char *, size_t,
+                                        m4_symbol *, m4_obstack *,
+                                        unsigned int, m4_obstack *);
+static void    expand_macro      (m4 *, const char *, size_t, m4_symbol *);
 static void    expand_token      (m4 *, m4_obstack *, m4__token_type,
                                  m4_symbol_value *, int);
 static bool    expand_argument   (m4 *, m4_obstack *, m4_symbol_value *,
@@ -115,15 +115,21 @@ expand_token (m4 *context, m4_obstack *obs,
     case M4_TOKEN_SIMPLE:
     case M4_TOKEN_STRING:
     case M4_TOKEN_SPACE:
-      m4_shipout_text (context, obs, text, strlen (text), line);
+      m4_shipout_text (context, obs, text, m4_get_symbol_value_len (token),
+                      line);
       break;
 
     case M4_TOKEN_WORD:
       {
        const char *textp = text;
+       size_t len = m4_get_symbol_value_len (token);
+       size_t len2 = len;
 
        if (m4_has_syntax (M4SYNTAX, to_uchar (*textp), M4_SYNTAX_ESCAPE))
-         ++textp;
+         {
+           textp++;
+           len2--;
+         }
 
        symbol = m4_symbol_lookup (M4SYMTAB, textp);
        assert (!symbol || !m4_is_symbol_void (symbol));
@@ -131,11 +137,9 @@ expand_token (m4 *context, m4_obstack *obs,
            || (symbol->value->type == M4_SYMBOL_FUNC
                && BIT_TEST (SYMBOL_FLAGS (symbol), VALUE_BLIND_ARGS_BIT)
                && !m4__next_token_is_open (context)))
-         {
-           m4_shipout_text (context, obs, text, strlen (text), line);
-         }
+         m4_shipout_text (context, obs, text, len, line);
        else
-         expand_macro (context, textp, symbol);
+         expand_macro (context, textp, len2, symbol);
       }
       break;
 
@@ -245,7 +249,7 @@ expand_argument (m4 *context, m4_obstack *obs, m4_symbol_value *argp,
    until a call to collect_arguments parses more tokens.  SYMBOL is
    the result of the symbol table lookup on NAME.  */
 static void
-expand_macro (m4 *context, const char *name, m4_symbol *symbol)
+expand_macro (m4 *context, const char *name, size_t len, m4_symbol *symbol)
 {
   char *argc_base = NULL;      /* Base of argc_stack on entry.  */
   unsigned int argc_size;      /* Size of argc_stack on entry.  */
@@ -298,8 +302,8 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
   if (traced && m4_is_debug_bit (context, M4_DEBUG_TRACE_CALL))
     trace_prepre (context, name, my_call_id, value);
 
-  argv = collect_arguments (context, name, symbol, &argv_stack, argv_size,
-                           &argc_stack);
+  argv = collect_arguments (context, name, len, symbol, &argv_stack,
+                           argv_size, &argc_stack);
   /* Calling collect_arguments invalidated name, but we copied it as
      argv[0].  */
   name = argv->argv0;
@@ -336,14 +340,15 @@ recursion limit of %zu exceeded, use -L<N> to change it"),
 }
 
 /* Collect all the arguments to a call of the macro SYMBOL (called
-   NAME).  The arguments are stored on the obstack ARGUMENTS and a
-   table of pointers to the arguments on the obstack ARGPTR.  ARGPTR
-   is an incomplete object, currently occupying ARGV_BASE bytes.
-   Return the object describing all of the macro arguments.  */
+   NAME, with length LEN).  The arguments are stored on the obstack
+   ARGUMENTS and a table of pointers to the arguments on the obstack
+   ARGPTR.  ARGPTR is an incomplete object, currently occupying
+   ARGV_BASE bytes.  Return the object describing all of the macro
+   arguments.  */
 static m4_macro_args *
-collect_arguments (m4 *context, const char *name, m4_symbol *symbol,
-                  m4_obstack *argptr, unsigned int argv_base,
-                  m4_obstack *arguments)
+collect_arguments (m4 *context, const char *name, size_t len,
+                  m4_symbol *symbol, m4_obstack *argptr,
+                  unsigned int argv_base, m4_obstack *arguments)
 {
   m4_symbol_value token;
   m4_symbol_value *tokenp;
@@ -356,7 +361,8 @@ collect_arguments (m4 *context, const char *name, m4_symbol *symbol,
 
   args.argc = 1;
   args.inuse = false;
-  args.argv0 = (char *) obstack_copy0 (arguments, name, strlen (name));
+  args.argv0 = (char *) obstack_copy0 (arguments, name, len);
+  args.argv0_len = len;
   args.arraylen = 0;
   obstack_grow (argptr, &args, offsetof (m4_macro_args, array));
   name = args.argv0;
@@ -460,7 +466,8 @@ process_macro (m4 *context, m4_symbol_value *value, m4_obstack *obs,
              text = endp;
            }
          if (i < argc)
-           m4_shipout_string (context, obs, M4ARG (i), 0, false);
+           m4_shipout_string (context, obs, M4ARG (i), m4_arg_len (argv, i),
+                              false);
          break;
 
        case '#':               /* number of arguments */
@@ -505,14 +512,9 @@ process_macro (m4 *context, m4_symbol_value *value, m4_obstack *obs,
                  if (arg)
                    {
                      i = SYMBOL_ARG_INDEX (*arg);
-
-                     if (i < argc)
-                       m4_shipout_string (context, obs, M4ARG (i), 0, false);
-                     else
-                       {
-                         assert (!"INTERNAL ERROR: out of range reference");
-                         abort ();
-                       }
+                     assert (i < argc);
+                     m4_shipout_string (context, obs, M4ARG (i),
+                                        m4_arg_len (argv, i), false);
                    }
                }
              else
@@ -549,6 +551,8 @@ trace_format (m4 *context, const char *fmt, ...)
   va_list args;
   char ch;
   const char *s;
+  char nbuf[INT_BUFSIZE_BOUND (sizeof (int) > sizeof (size_t)
+                              ? sizeof (int) : sizeof (size_t))];
 
   va_start (args, fmt);
 
@@ -569,7 +573,6 @@ trace_format (m4 *context, const char *fmt, ...)
        case 'd':
          {
            int d = va_arg (args, int);
-           char nbuf[INT_BUFSIZE_BOUND (int)];
 
            sprintf (nbuf, "%d", d);
            s = nbuf;
@@ -581,7 +584,6 @@ trace_format (m4 *context, const char *fmt, ...)
          assert (ch == 'u');
          {
            size_t z = va_arg (args, size_t);
-           char nbuf[INT_BUFSIZE_BOUND (size_t)];
 
            sprintf (nbuf, "%zu", z);
            s = nbuf;
@@ -630,7 +632,7 @@ trace_flush (m4 *context)
   obstack_free (&context->trace_messages, line);
 }
 
-/* Do pre-argument-collction tracing for macro NAME.  Used from
+/* Do pre-argument-collection tracing for macro NAME.  Used from
    expand_macro ().  */
 static void
 trace_prepre (m4 *context, const char *name, size_t id, m4_symbol_value *value)
@@ -749,14 +751,13 @@ m4_arg_text (m4_macro_args *argv, unsigned int index)
 size_t
 m4_arg_len (m4_macro_args *argv, unsigned int index)
 {
-  /* TODO - update m4_macro_args to cache this.  */
   if (index == 0)
-    return strlen (argv->argv0);
+    return argv->argv0_len;
   if (argv->argc <= index)
     return 0;
   if (!m4_is_symbol_value_text (argv->array[index - 1]))
     return SIZE_MAX;
-  return strlen (m4_get_symbol_value_text (argv->array[index - 1]));
+  return m4_get_symbol_value_len (argv->array[index - 1]);
 }
 
 /* Given ARGV, return the builtin function referenced by argument
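m4_arg_len no longer calls strlen. m4_macro_args now stores argv0_len, and each text argument carries its own length. The function behaves as follows:

- Index 0 returns the cached macro-name length.
- An index at or beyond argc returns 0, so an absent argument contributes zero bytes.
- A non-text argument yields SIZE_MAX.

Below is a self-contained sketch of that lookup. It uses hypothetical stand-in types (arg_value, macro_args) rather than the real m4_macro_args and m4_symbol_value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a symbol value: text with a cached length,
   or a non-text value such as a builtin reference.  */
typedef struct
{
  bool is_text;
  const char *text;
  size_t len;
} arg_value;

/* Hypothetical stand-in for m4_macro_args: ARRAY holds arguments
   1 .. ARGC-1, and the macro name's length is cached alongside it.  */
typedef struct
{
  unsigned int argc;
  const char *argv0;
  size_t argv0_len;
  const arg_value *array;
} macro_args;

/* Return the length of argument INDEX without scanning any text.  */
static size_t
arg_len (const macro_args *argv, unsigned int index)
{
  if (index == 0)
    return argv->argv0_len;           /* cached when the name was copied */
  if (argv->argc <= index)
    return 0;                         /* missing argument: treated as empty */
  if (!argv->array[index - 1].is_text)
    return SIZE_MAX;                  /* not text: no meaningful length */
  return argv->array[index - 1].len;  /* cached when the token was read */
}
```

Returning 0 for missing arguments is what makes a one-line ifdef body such as `obstack_grow (obs, M4ARG (index), m4_arg_len (argv, index))` safe when the optional third argument is absent.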
diff --git a/m4/output.c b/m4/output.c
index 3eb7758..ed2a451 100644
--- a/m4/output.c
+++ b/m4/output.c
@@ -579,20 +579,23 @@ void
 m4_shipout_int (m4_obstack *obs, int val)
 {
   char buf[INT_BUFSIZE_BOUND (int)];
-
-  sprintf(buf, "%d", val);
-  obstack_grow (obs, buf, strlen (buf));
+  int len = sprintf(buf, "%d", val);
+  obstack_grow (obs, buf, len);
 }
 
+/* Output the text S, of length LEN, to OBS.  If QUOTED, also output
+   current quote characters around S.  If LEN is SIZE_MAX, use the
+   string length of S instead.  */
 void
 m4_shipout_string (m4 *context, m4_obstack *obs, const char *s, size_t len,
                   bool quoted)
 {
+  assert (obs);
   if (s == NULL)
     s = "";
 
-  if (len == 0)
-    len = strlen(s);
+  if (len == SIZE_MAX)
+    len = strlen (s);
 
   if (quoted)
     obstack_grow (obs, context->syntax->lquote.string,
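This patch changes m4_shipout_string's "measure it yourself" sentinel from 0 to SIZE_MAX. Because 0 is a legitimate length for an empty argument, it cannot also mean "unknown". Callers holding a cached length now pass it directly. Callers with only a C string, such as getenv or getpwnam results, pass SIZE_MAX.

Below is a minimal sketch of that contract. It makes three simplifications:

- It appends to a hypothetical fixed-size buffer instead of an obstack.
- Quoting uses fixed backquote/apostrophe delimiters rather than the current quote strings.
- The names outbuf, grow and shipout_string are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity output buffer standing in for an obstack.  */
typedef struct
{
  char data[256];
  size_t used;
} outbuf;

/* Append LEN bytes of S to OUT.  */
static void
grow (outbuf *out, const char *s, size_t len)
{
  assert (out->used + len <= sizeof out->data);
  memcpy (out->data + out->used, s, len);
  out->used += len;
}

/* Append S (LEN bytes) to OUT, optionally wrapped in quotes.  A LEN of
   SIZE_MAX means "unknown": fall back to strlen.  A LEN of 0 really
   means an empty string.  */
static void
shipout_string (outbuf *out, const char *s, size_t len, bool quoted)
{
  if (s == NULL)
    s = "";
  if (len == SIZE_MAX)
    len = strlen (s);
  if (quoted)
    grow (out, "`", 1);
  grow (out, s, len);
  if (quoted)
    grow (out, "'", 1);
}
```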
diff --git a/m4/symtab.c b/m4/symtab.c
index 7c253a0..2f83f7b 100644
--- a/m4/symtab.c
+++ b/m4/symtab.c
@@ -485,15 +485,23 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs, bool quote,
 {
   const char *text;
   size_t len;
+  bool truncated = false;
 
   if (m4_is_symbol_value_text (value))
     {
       text = m4_get_symbol_value_text (value);
+      len = m4_get_symbol_value_len (value);
+      if (arg_length && arg_length < len)
+       {
+         len = arg_length;
+         truncated = true;
+       }
     }
   else if (m4_is_symbol_value_func (value))
     {
       const m4_builtin *bp = m4_get_symbol_value_builtin (value);
       text = bp->name;
+      len = strlen (text);
       lquote = "<";
       rquote = ">";
       quote = true;
@@ -502,6 +510,7 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs, bool quote,
     {
       text = m4_get_symbol_value_placeholder (value);
       /* FIXME - is it worth translating "placeholder for "?  */
+      len = strlen (text);
       lquote = "<placeholder for ";
       rquote = ">";
       quote = true;
@@ -512,11 +521,10 @@ m4_symbol_value_print (m4_symbol_value *value, m4_obstack *obs, bool quote,
       abort ();
     }
 
-  len = arg_length ? strnlen (text, arg_length) : strlen (text);
   if (quote)
     obstack_grow (obs, lquote, strlen (lquote));
   obstack_grow (obs, text, len);
-  if (len == arg_length && text[len] != '\0')
+  if (truncated)
     obstack_grow (obs, "...", 3);
   if (quote)
     obstack_grow (obs, rquote, strlen (rquote));
@@ -683,11 +691,12 @@ void
 m4_set_symbol_value_text (m4_symbol_value *value, const char *text, size_t len)
 {
   assert (value && text);
-  /* TODO - this assertion requires NUL-terminated text.  Do we want
-     to optimize memory usage and use purely length-based
-     manipulation, for one less byte per string?  Perhaps only without
-     NDEBUG?  */
-  assert (strlen (text) <= len);
+  /* TODO - this assertion enforces NUL-terminated text with no
+     intermediate NULs.  Do we want to optimize memory usage and use
+     purely length-based manipulation, for one less byte per string?
+     Perhaps only without NDEBUG?  Also, do we want to support
+     embedded NUL?  */
+  assert (strlen (text) == len);
 
   value->type = M4_SYMBOL_TEXT;
   value->u.u_t.text = text;
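m4_symbol_value_print now decides whether to append "..." from the cached length. It sets a `truncated` flag when arg_length is nonzero and shorter than the text. Previously it called strnlen and then inspected `text[len]`. Below is the same decision in isolation, as a hypothetical helper (print_truncated) that formats into a caller-supplied buffer instead of an obstack.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copy TEXT (LEN bytes) into DST (capacity CAP).  If LIMIT is nonzero
   and shorter than LEN, keep only LIMIT bytes and append "...".
   Returns the number of bytes written, excluding the trailing NUL.  */
static size_t
print_truncated (char *dst, size_t cap, const char *text, size_t len,
                 size_t limit)
{
  bool truncated = false;
  size_t n;

  if (limit && limit < len)
    {
      len = limit;
      truncated = true;
    }
  n = len + (truncated ? 3 : 0);
  assert (n < cap);
  memcpy (dst, text, len);
  if (truncated)
    memcpy (dst + len, "...", 3);   /* mark that text was cut */
  dst[n] = '\0';
  return n;
}
```

A limit exactly equal to the length does not count as truncation. The flag makes that explicit, without re-reading the byte just past the copied text.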
diff --git a/m4/utility.c b/m4/utility.c
index 53d2a18..72205a8 100644
--- a/m4/utility.c
+++ b/m4/utility.c
@@ -114,7 +114,8 @@ m4_dump_args (m4 *context, m4_obstack *obs, unsigned int start,
       else
        need_sep = true;
 
-      m4_shipout_string (context, obs, M4ARG (i), 0, quoted);
+      m4_shipout_string (context, obs, M4ARG (i), m4_arg_len (argv, i),
+                         quoted);
     }
 }
 
diff --git a/modules/gnu.c b/modules/gnu.c
index 560e0d5..bc34692 100644
--- a/modules/gnu.c
+++ b/modules/gnu.c
@@ -127,13 +127,13 @@ typedef struct {
 /* Storage for the cache of regular expressions.  */
 static m4_pattern_buffer regex_cache[REGEX_CACHE_SIZE];
 
-/* Compile a REGEXP using the RESYNTAX flavor, and return the buffer.
-   On error, report the problem on behalf of CALLER, and return
-   NULL.  */
+/* Compile a REGEXP of length LEN using the RESYNTAX flavor, and
+   return the buffer.  On error, report the problem on behalf of
+   CALLER, and return NULL.  */
 
 static m4_pattern_buffer *
 regexp_compile (m4 *context, const char *caller, const char *regexp,
-               int resyntax)
+               size_t len, int resyntax)
 {
   /* regex_cache is guaranteed to start life 0-initialized, which
      works in the algorithm below.
@@ -150,7 +150,6 @@ regexp_compile (m4 *context, const char *caller, const char *regexp,
   m4_pattern_buffer *victim;   /* cache slot to replace */
   unsigned victim_count;       /* track which victim to replace */
   struct re_pattern_buffer *pat;/* newly compiled regex */
-  size_t len = strlen (regexp);        /* regex length */
 
   /* First, check if REGEXP is already cached with the given RESYNTAX.
      If so, increase its use count and return it.  */
@@ -214,7 +213,7 @@ regexp_compile (m4 *context, const char *caller, const char *regexp,
 /* Wrap up GNU Regex re_search call to work with an m4_pattern_buffer.
    If NO_SUB, then storing matches in buf->regs is not necessary.  */
 
-static int
+static regoff_t
 regexp_search (m4_pattern_buffer *buf, const char *string, const int size,
               const int start, const int range, bool no_sub)
 {
@@ -282,23 +281,22 @@ substitute (m4 *context, m4_obstack *obs, const char *caller,
    by regexp_compile) in VICTIM, substitute REPLACE.  Non-matching
    characters are copied verbatim, and the result copied to the
    obstack.  Errors are reported on behalf of CALLER.  Return true if
-   a substitution was made.  If IGNORE_DUPLICATES is set, don't worry
-   about completing the obstack when returning false.  */
+   a substitution was made.  If OPTIMIZE is set, don't worry about
+   copying the input if no changes are made.  */
 
 static bool
 regexp_substitute (m4 *context, m4_obstack *obs, const char *caller,
-                  const char *victim, const char *regexp,
+                  const char *victim, size_t len, const char *regexp,
                   m4_pattern_buffer *buf, const char *replace,
-                  bool ignore_duplicates)
+                  bool optimize)
 {
-  int matchpos = 0;            /* start position of match */
-  int offset   = 0;            /* current match offset */
-  int length   = strlen (victim);
-  bool subst   = false;        /* if a substitution has been made */
+  regoff_t matchpos = 0;       /* start position of match */
+  size_t offset = 0;           /* current match offset */
+  bool subst = !optimize;      /* if a substitution has been made */
 
-  while (offset <= length)
+  while (offset <= len)
     {
-      matchpos = regexp_search (buf, victim, length, offset, length - offset,
+      matchpos = regexp_search (buf, victim, len, offset, len - offset,
                                false);
 
       if (matchpos < 0)
@@ -311,8 +309,8 @@ regexp_substitute (m4 *context, m4_obstack *obs, const char *caller,
          if (matchpos == -2)
            m4_error (context, 0, 0, caller,
                      _("error matching regular expression `%s'"), regexp);
-         else if (!ignore_duplicates && (offset < length))
-           obstack_grow (obs, victim + offset, length - offset);
+         else if (offset < len && subst)
+           obstack_grow (obs, victim + offset, len - offset);
          break;
        }
 
@@ -333,14 +331,12 @@ regexp_substitute (m4 *context, m4_obstack *obs, const char *caller,
       offset = buf->regs.end[0];
       if (buf->regs.start[0] == buf->regs.end[0])
        {
-         obstack_1grow (obs, victim[offset]);
+         if (offset < len)
+           obstack_1grow (obs, victim[offset]);
          offset++;
        }
     }
 
-  if (!ignore_duplicates || subst)
-    obstack_1grow (obs, '\0');
-
   return subst;
 }
 
@@ -370,7 +366,8 @@ M4FINISH_HANDLER(gnu)
  **/
 M4BUILTIN_HANDLER (__file__)
 {
-  m4_shipout_string (context, obs, m4_get_current_file (context), 0, true);
+  m4_shipout_string (context, obs, m4_get_current_file (context), SIZE_MAX,
+                    true);
 }
 
 
@@ -388,7 +385,7 @@ M4BUILTIN_HANDLER (__line__)
  **/
 M4BUILTIN_HANDLER (__program__)
 {
-  m4_shipout_string (context, obs, m4_get_program_name (), 0, true);
+  m4_shipout_string (context, obs, m4_get_program_name (), SIZE_MAX, true);
 }
 
 
@@ -456,6 +453,7 @@ M4BUILTIN_HANDLER (builtin)
              new_argv->argc = argc - 1;
              new_argv->inuse = false;
              new_argv->argv0 = name;
+             new_argv->argv0_len = m4_arg_len (argv, 1);
              new_argv->arraylen = argc - 2;
              memcpy (&new_argv->array[0], &argv->array[1],
                      (argc - 2) * sizeof (m4_symbol_value *));
@@ -701,6 +699,7 @@ M4BUILTIN_HANDLER (indir)
          new_argv->argc = argc - 1;
          new_argv->inuse = false;
          new_argv->argv0 = name;
+         new_argv->argv0_len = m4_arg_len (argv, 1);
          new_argv->arraylen = argc - 2;
          memcpy (&new_argv->array[0], &argv->array[1],
                  (argc - 2) * sizeof (m4_symbol_value *));
@@ -727,7 +726,8 @@ M4BUILTIN_HANDLER (mkdtemp)
   M4_MODULE_IMPORT (m4, m4_make_temp);
 
   if (m4_make_temp)
-    m4_make_temp (context, obs, M4ARG (0), M4ARG (1), true);
+    m4_make_temp (context, obs, M4ARG (0), M4ARG (1), m4_arg_len (argv, 1),
+                 true);
   else
     assert (!"Unable to import from m4 module");
 }
@@ -767,17 +767,16 @@ M4BUILTIN_HANDLER (patsubst)
      replacement, we need not waste time with it.  */
   if (!*pattern && !*replace)
     {
-      const char *str = M4ARG (1);
-      obstack_grow (obs, str, strlen (str));
+      obstack_grow (obs, M4ARG (1), m4_arg_len (argv, 1));
       return;
     }
 
-  buf = regexp_compile (context, me, pattern, resyntax);
+  buf = regexp_compile (context, me, pattern, m4_arg_len (argv, 2), resyntax);
   if (!buf)
     return;
 
-  regexp_substitute (context, obs, me, M4ARG (1), pattern, buf,
-                    replace, false);
+  regexp_substitute (context, obs, me, M4ARG (1), m4_arg_len (argv, 1),
+                    pattern, buf, replace, false);
 }
 
 
@@ -797,8 +796,8 @@ M4BUILTIN_HANDLER (regexp)
   const char *pattern;         /* regular expression */
   const char *replace;         /* optional replacement string */
   m4_pattern_buffer *buf;      /* compiled regular expression */
-  int startpos;                        /* start position of match */
-  int length;                  /* length of first argument */
+  regoff_t startpos;           /* start position of match */
+  size_t len;                  /* length of first argument */
   int resyntax;
 
   me = M4ARG (0);
@@ -842,13 +841,12 @@ M4BUILTIN_HANDLER (regexp)
       return;
     }
 
-  buf = regexp_compile (context, me, pattern, resyntax);
+  buf = regexp_compile (context, me, pattern, m4_arg_len (argv, 2), resyntax);
   if (!buf)
     return;
 
-  length = strlen (M4ARG (1));
-  startpos = regexp_search (buf, M4ARG (1), length, 0, length,
-                           replace == NULL);
+  len = m4_arg_len (argv, 1);
+  startpos = regexp_search (buf, M4ARG (1), len, 0, len, replace == NULL);
 
   if (startpos == -2)
     {
@@ -899,7 +897,8 @@ M4BUILTIN_HANDLER (renamesyms)
            return;
        }
 
-      buf = regexp_compile (context, me, regexp, resyntax);
+      buf = regexp_compile (context, me, regexp, m4_arg_len (argv, 1),
+                           resyntax);
       if (!buf)
        return;
 
@@ -912,12 +911,12 @@ M4BUILTIN_HANDLER (renamesyms)
        {
          const char *name = data.base[0];
 
-         if (regexp_substitute (context, &rename_obs, me, name, regexp,
-                                buf, replace, true))
+         if (regexp_substitute (context, &rename_obs, me, name, strlen (name),
+                                regexp, buf, replace, true))
            {
-             const char *renamed = obstack_finish (&rename_obs);
-
-             m4_symbol_rename (M4SYMTAB, name, renamed);
+             obstack_1grow (&rename_obs, '\0');
+             m4_symbol_rename (M4SYMTAB, name,
+                               (char *) obstack_finish (&rename_obs));
            }
        }
 
@@ -949,7 +948,7 @@ M4BUILTIN_HANDLER (m4symbols)
 
       for (; data.size > 0; --data.size, data.base++)
        {
-         m4_shipout_string (context, obs, data.base[0], 0, true);
+         m4_shipout_string (context, obs, data.base[0], SIZE_MAX, true);
          if (data.size > 1)
            obstack_1grow (obs, ',');
        }
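The reworked regexp_substitute walks the subject by an explicit length. After an empty match it copies one character only while `offset < len`, so it never copies the terminating NUL. With OPTIMIZE set, as renamesyms does, it copies nothing and returns false when no match occurs.

Below is a sketch of that control flow using POSIX regcomp/regexec. This is a swapped-in API: the patch uses GNU re_search. Because regexec needs NUL-terminated input, this sketch cannot handle embedded NUL. REPLACE is inserted literally; the `\N` expansion performed by substitute() is omitted. strbuf, sb_grow and substitute_all are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer standing in for an obstack.  */
typedef struct
{
  char *data;
  size_t used, cap;
} strbuf;

/* Append N bytes of S, keeping DATA NUL-terminated (NUL not counted).  */
static void
sb_grow (strbuf *sb, const char *s, size_t n)
{
  if (sb->used + n + 1 > sb->cap)
    {
      sb->cap = 2 * (sb->used + n + 1);
      sb->data = realloc (sb->data, sb->cap);
      assert (sb->data);
    }
  memcpy (sb->data + sb->used, s, n);
  sb->used += n;
  sb->data[sb->used] = '\0';
}

/* Replace each match of RE in VICTIM (LEN bytes, NUL-terminated) with
   the literal REPLACE, appending to OUT.  If OPTIMIZE and nothing
   matched, OUT is untouched and false is returned.  */
static bool
substitute_all (strbuf *out, const char *victim, size_t len,
                const regex_t *re, const char *replace, bool optimize)
{
  size_t offset = 0;
  bool subst = !optimize;     /* without OPTIMIZE, always copy input */

  while (offset <= len)
    {
      regmatch_t m;
      int flags = offset ? REG_NOTBOL : 0;  /* '^' only at true start */

      if (regexec (re, victim + offset, 1, &m, flags) != 0)
        {
          if (offset < len && subst)
            sb_grow (out, victim + offset, len - offset); /* tail */
          break;
        }
      sb_grow (out, victim + offset, m.rm_so);  /* text before match */
      sb_grow (out, replace, strlen (replace));
      subst = true;
      offset += m.rm_eo;
      if (m.rm_so == m.rm_eo)
        {
          /* Empty match: copy one character to make progress, but
             never the terminating NUL.  */
          if (offset < len)
            sb_grow (out, victim + offset, 1);
          offset++;
        }
    }
  return subst;
}
```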
diff --git a/modules/load.c b/modules/load.c
index 11f9ecf..4ee1cc6 100644
--- a/modules/load.c
+++ b/modules/load.c
@@ -100,7 +100,8 @@ M4BUILTIN_HANDLER (m4modules)
   if (module)
     do
       {
-       m4_shipout_string (context, obs, m4_get_module_name (module), 0, true);
+       m4_shipout_string (context, obs, m4_get_module_name (module), SIZE_MAX,
+                          true);
 
        if ((module = m4__module_next (module)))
          obstack_1grow (obs, ',');
diff --git a/modules/m4.c b/modules/m4.c
index 87584a2..827fabb 100644
--- a/modules/m4.c
+++ b/modules/m4.c
@@ -54,7 +54,7 @@ extern void m4_dump_symbols  (m4 *context, m4_dump_symbol_data *data,
                              bool complain);
 extern const char *m4_expand_ranges (const char *s, m4_obstack *obs);
 extern void m4_make_temp     (m4 *context, m4_obstack *obs, const char *macro,
-                             const char *name, bool dir);
+                             const char *name, size_t len, bool dir);
 
 /* stdlib--.h defines mkstemp to a safer replacement, but this
    interferes with our preprocessor table of builtin definitions.  */
@@ -229,26 +229,13 @@ M4BUILTIN_HANDLER (popdef)
 
 M4BUILTIN_HANDLER (ifdef)
 {
-  m4_symbol *symbol;
-  const char *result;
-
-  symbol = m4_symbol_lookup (M4SYMTAB, M4ARG (1));
-
-  if (symbol)
-    result = M4ARG (2);
-  else if (argc >= 4)
-    result = M4ARG (3);
-  else
-    result = NULL;
-
-  if (result)
-    obstack_grow (obs, result, strlen (result));
+  unsigned int index = m4_symbol_lookup (M4SYMTAB, M4ARG (1)) ? 2 : 3;
+  obstack_grow (obs, M4ARG (index), m4_arg_len (argv, index));
 }
 
 M4BUILTIN_HANDLER (ifelse)
 {
   const char *me = M4ARG (0);
-  const char *result;
   unsigned int index;
 
   /* The valid ranges of argc for ifelse is discontinuous, we cannot
@@ -265,13 +252,13 @@ M4BUILTIN_HANDLER (ifelse)
   index = 1;
   argc--;
 
-  result = NULL;
-  while (result == NULL)
-
-    if (strcmp (M4ARG (index), M4ARG (index + 1)) == 0)
-      result = M4ARG (index + 2);
-
-    else
+  while (1)
+    {
+      if (strcmp (M4ARG (index), M4ARG (index + 1)) == 0)
+       {
+         obstack_grow (obs, M4ARG (index + 2), m4_arg_len (argv, index + 2));
+         return;
+       }
       switch (argc)
        {
        case 3:
@@ -279,15 +266,14 @@ M4BUILTIN_HANDLER (ifelse)
 
        case 4:
        case 5:
-         result = M4ARG (index + 3);
-         break;
+         obstack_grow (obs, M4ARG (index + 3), m4_arg_len (argv, index + 3));
+         return;
 
        default:
          argc -= 3;
          index += 3;
        }
-
-  obstack_grow (obs, result, strlen (result));
+    }
 }
 
 
@@ -407,7 +393,8 @@ M4BUILTIN_HANDLER (defn)
       if (!symbol)
        m4_warn (context, 0, me, _("undefined macro `%s'"), name);
       else if (m4_is_symbol_text (symbol))
-       m4_shipout_string (context, obs, m4_get_symbol_text (symbol), 0, true);
+       m4_shipout_string (context, obs, m4_get_symbol_text (symbol),
+                          m4_get_symbol_len (symbol), true);
       else if (m4_is_symbol_func (symbol))
        m4_push_builtin (context, m4_get_symbol_value (symbol));
       else if (m4_is_symbol_placeholder (symbol))
@@ -578,15 +565,11 @@ M4BUILTIN_HANDLER (decr)
 M4BUILTIN_HANDLER (divert)
 {
   int i = 0;
-  const char *text;
 
   if (argc >= 2 && !m4_numeric_arg (context, M4ARG (0), M4ARG (1), &i))
     return;
-
   m4_make_diversion (context, i);
-
-  text = M4ARG (2);
-  m4_shipout_text (context, NULL, text, strlen (text),
+  m4_shipout_text (context, NULL, M4ARG (2), m4_arg_len (argv, 2),
                   m4_get_current_line (context));
 }
 
@@ -717,10 +700,9 @@ M4BUILTIN_HANDLER (sinclude)
    export this function as a helper to that?  */
 void
 m4_make_temp (m4 *context, m4_obstack *obs, const char *macro,
-             const char *name, bool dir)
+             const char *name, size_t len, bool dir)
 {
   int fd;
-  int len;
   int i;
 
   if (m4_get_safer_opt (context))
@@ -732,14 +714,11 @@ m4_make_temp (m4 *context, m4_obstack *obs, const char *macro,
   /* Guarantee that there are six trailing 'X' characters, even if the
      user forgot to supply them.  */
   assert (obstack_object_size (obs) == 0);
-  len = strlen (name);
   obstack_grow (obs, name, len);
   for (i = 0; len > 0 && i < 6; i++)
     if (name[--len] != 'X')
       break;
-  for (; i < 6; i++)
-    obstack_1grow (obs, 'X');
-  obstack_1grow (obs, '\0');
+  obstack_grow0 (obs, "XXXXXX", 6 - i);
 
   /* Make the temporary object.  */
   errno = 0;
@@ -755,8 +734,16 @@ m4_make_temp (m4 *context, m4_obstack *obs, const char *macro,
                name);
       obstack_free (obs, obstack_finish (obs));
     }
-  else if (! dir)
-    close (fd);
+  else
+    {
+      if (! dir)
+       close (fd);
+      /* Undo the trailing NUL.  */
+      /* FIXME - shouldn't this return a quoted string, on the rather
+        small chance that the user has a macro matching the random
+        file name chosen?  */
+      obstack_blank (obs, -1);
+    }
 }
 
 /* Use the first argument as at template for a temporary file name.  */
@@ -776,7 +763,7 @@ M4BUILTIN_HANDLER (maketemp)
           maketemp(XXXXXXXX) -> `X00nnnnn', where nnnnn is 16-bit pid
       */
       const char *str = M4ARG (1);
-      int len = strlen (str);
+      size_t len = m4_arg_len (argv, 1);
       int i;
       int len2;
 
@@ -787,22 +774,24 @@ M4BUILTIN_HANDLER (maketemp)
       str = ntoa ((number) getpid (), 10);
       len2 = strlen (str);
       if (len2 > len - i)
-       obstack_grow0 (obs, str + len2 - (len - i), len - i);
+       obstack_grow (obs, str + len2 - (len - i), len - i);
       else
        {
          while (i++ < len - len2)
            obstack_1grow (obs, '0');
-         obstack_grow0 (obs, str, len2);
+         obstack_grow (obs, str, len2);
        }
     }
   else
-    m4_make_temp (context, obs, M4ARG (0), M4ARG (1), false);
+    m4_make_temp (context, obs, M4ARG (0), M4ARG (1), m4_arg_len (argv, 1),
+                 false);
 }
 
 /* Use the first argument as a template for a temporary file name.  */
 M4BUILTIN_HANDLER (mkstemp)
 {
-  m4_make_temp (context, obs, M4ARG (0), M4ARG (1), false);
+  m4_make_temp (context, obs, M4ARG (0), M4ARG (1), m4_arg_len (argv, 1),
+               false);
 }
 
 /* Print all arguments on standard error.  */
@@ -862,7 +851,7 @@ M4BUILTIN_HANDLER (m4wrap)
 {
   assert (obstack_object_size (obs) == 0);
   if (m4_get_posixly_correct_opt (context))
-    m4_shipout_string (context, obs, M4ARG (1), 0, false);
+    m4_shipout_string (context, obs, M4ARG (1), m4_arg_len (argv, 1), false);
   else
     m4_dump_args (context, obs, 1, argv, " ", false);
   obstack_1grow (obs, '\0');
@@ -906,7 +895,7 @@ M4BUILTIN_HANDLER (traceoff)
 /* Expand to the length of the first argument.  */
 M4BUILTIN_HANDLER (len)
 {
-  m4_shipout_int (obs, strlen (M4ARG (1)));
+  m4_shipout_int (obs, m4_arg_len (argv, 1));
 }
 
 /* The macro expands to the first index of the second argument in the first
@@ -946,11 +935,11 @@ M4BUILTIN_HANDLER (substr)
 
   if (argc <= 2)
     {
-      obstack_grow (obs, str, strlen (str));
+      obstack_grow (obs, str, m4_arg_len (argv, 1));
       return;
     }
 
-  length = avail = strlen (str);
+  length = avail = m4_arg_len (argv, 1);
   if (!m4_numeric_arg (context, me, M4ARG (2), &start))
     return;
 
@@ -1174,6 +1163,8 @@ static void
 numb_obstack(m4_obstack *obs, number value, int radix, int min)
 {
   const char *s;
+  size_t len;
+
   if (radix == 1)
     {
       /* FIXME - this code currently depends on undefined behavior.  */
@@ -1186,7 +1177,6 @@ numb_obstack(m4_obstack *obs, number value, int radix, int min)
        obstack_1grow (obs, '0');
       while (value-- != 0)
        obstack_1grow (obs, '1');
-      obstack_1grow (obs, '\0');
       return;
     }
 
@@ -1197,10 +1187,11 @@ numb_obstack(m4_obstack *obs, number value, int radix, int min)
       obstack_1grow (obs, '-');
       s++;
     }
-  for (min -= strlen (s); --min >= 0;)
+  len = strlen (s);
+  for (min -= len; --min >= 0;)
     obstack_1grow (obs, '0');
 
-  obstack_grow (obs, s, strlen (s));
+  obstack_grow (obs, s, len);
 }
 
 
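m4_make_temp now works from the caller-supplied length of NAME. It proceeds in three steps:

1. It counts how many of the last (up to six) characters are already 'X'.
2. It pads the remainder with a single `obstack_grow0 (obs, "XXXXXX", 6 - i)` instead of a per-character loop.
3. On success it drops the trailing NUL with `obstack_blank (obs, -1)`.

POSIX mkstemp requires the template to end in six 'X' characters. Below is the counting and padding logic extracted into a hypothetical helper, make_template, that writes into a plain buffer rather than an obstack.

```c
#include <assert.h>
#include <string.h>

/* Copy NAME (LEN bytes) into DST (capacity CAP) and pad with 'X' so
   the result ends in at least six 'X' characters.  Returns the length
   of the template, excluding the trailing NUL.  */
static size_t
make_template (char *dst, size_t cap, const char *name, size_t len)
{
  size_t total = len;
  int i;

  /* Count trailing 'X's, examining no more than six of them.  */
  for (i = 0; len > 0 && i < 6; i++)
    if (name[--len] != 'X')
      break;

  assert (total + (size_t) (6 - i) < cap);
  memcpy (dst, name, total);
  memcpy (dst + total, "XXXXXX", (size_t) (6 - i));  /* missing 'X's */
  dst[total + 6 - i] = '\0';
  return total + 6 - i;
}
```

A name that already ends in six or more 'X's is copied unchanged. A name ending in "XX" gains exactly four more.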
diff --git a/modules/m4.h b/modules/m4.h
index 4783d0a..81dfef4 100644
--- a/modules/m4.h
+++ b/modules/m4.h
@@ -43,7 +43,8 @@ typedef void m4_dump_symbols_func (m4 *context, m4_dump_symbol_data *data,
                                   bool complain);
 typedef const char *m4_expand_ranges_func (const char *s, m4_obstack *obs);
 typedef void m4_make_temp_func (m4 *context, m4_obstack *obs,
-                               const char *macro, const char *name, bool dir);
+                               const char *macro, const char *name,
+                                size_t len, bool dir);
 
 END_C_DECLS
 
diff --git a/modules/mpeval.c b/modules/mpeval.c
index a702752..e70eb09 100644
--- a/modules/mpeval.c
+++ b/modules/mpeval.c
@@ -174,6 +174,7 @@ numb_obstack (m4_obstack *obs, const number value, const int radix,
              int min)
 {
   const char *s;
+  size_t len;
 
   mpz_t i;
   mpz_init (i);
@@ -186,10 +187,11 @@ numb_obstack (m4_obstack *obs, const number value, const int radix,
       obstack_1grow (obs, '-');
       s++;
     }
-  for (min -= strlen (s); --min >= 0;)
+  len = strlen (s);
+  for (min -= len; --min >= 0;)
     obstack_1grow (obs, '0');
 
-  obstack_grow (obs, s, strlen (s));
+  obstack_grow (obs, s, len);
 
   mpq_get_den (i, value);
   if (mpz_cmp_si (i, (long) 1) != 0)
diff --git a/modules/perl.c b/modules/perl.c
index 58161b1..f129af0 100644
--- a/modules/perl.c
+++ b/modules/perl.c
@@ -123,6 +123,6 @@ M4BUILTIN_HANDLER (perleval)
 
       val = perl_eval_pv (M4ARG (i), true);
 
-      m4_shipout_string (context, obs, SvPV (val, PL_na), 0, false);
+      m4_shipout_string (context, obs, SvPV (val, PL_na), SIZE_MAX, false);
     }
 }
diff --git a/modules/stdlib.c b/modules/stdlib.c
index 62fdae7..0063ed8 100644
--- a/modules/stdlib.c
+++ b/modules/stdlib.c
@@ -80,13 +80,14 @@ m4_builtin m4_builtin_table[] =
  **/
 M4BUILTIN_HANDLER (getcwd)
 {
+  /* FIXME - Use gnulib module for arbitrary-length cwd.  */
   char buf[1024];
   char *bp;
 
   bp = getcwd (buf, sizeof buf);
 
   if (bp != NULL)              /* in case of error return null string */
-    m4_shipout_string (context, obs, buf, 0, false);
+    m4_shipout_string (context, obs, buf, SIZE_MAX, false);
 }
 
 /**
@@ -99,7 +100,7 @@ M4BUILTIN_HANDLER (getenv)
   env = getenv (M4ARG (1));
 
   if (env != NULL)
-    m4_shipout_string (context, obs, env, 0, false);
+    m4_shipout_string (context, obs, env, SIZE_MAX, false);
 }
 
 /**
@@ -121,10 +122,9 @@ M4BUILTIN_HANDLER (setenv)
     return;
 
   assert (obstack_object_size (obs) == 0);
-  obstack_grow (obs, M4ARG (1), strlen (M4ARG (1)));
+  obstack_grow (obs, M4ARG (1), m4_arg_len (argv, 1));
   obstack_1grow (obs, '=');
-  obstack_grow (obs, M4ARG (2), strlen (M4ARG (2)));
-  obstack_1grow (obs, '\0');
+  obstack_grow0 (obs, M4ARG (2), m4_arg_len (argv, 2));
 
   {
     char *env = obstack_finish (obs);
@@ -155,7 +155,7 @@ M4BUILTIN_HANDLER (getlogin)
   login = getlogin ();
 
   if (login != NULL)
-    m4_shipout_string (context, obs, login, 0, false);
+    m4_shipout_string (context, obs, login, SIZE_MAX, false);
 }
 
 /**
@@ -185,19 +185,19 @@ M4BUILTIN_HANDLER (getpwnam)
 
   if (pw != NULL)
     {
-      m4_shipout_string (context, obs, pw->pw_name, 0, true);
+      m4_shipout_string (context, obs, pw->pw_name, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_passwd, 0, true);
+      m4_shipout_string (context, obs, pw->pw_passwd, SIZE_MAX, true);
       obstack_1grow (obs, ',');
       m4_shipout_int (obs, pw->pw_uid);
       obstack_1grow (obs, ',');
       m4_shipout_int (obs, pw->pw_gid);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_gecos, 0, true);
+      m4_shipout_string (context, obs, pw->pw_gecos, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_dir, 0, true);
+      m4_shipout_string (context, obs, pw->pw_dir, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_shell, 0, true);
+      m4_shipout_string (context, obs, pw->pw_shell, SIZE_MAX, true);
     }
 }
 
@@ -216,19 +216,19 @@ M4BUILTIN_HANDLER (getpwuid)
 
   if (pw != NULL)
     {
-      m4_shipout_string (context, obs, pw->pw_name, 0, true);
+      m4_shipout_string (context, obs, pw->pw_name, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_passwd, 0, true);
+      m4_shipout_string (context, obs, pw->pw_passwd, SIZE_MAX, true);
       obstack_1grow (obs, ',');
       m4_shipout_int (obs, pw->pw_uid);
       obstack_1grow (obs, ',');
       m4_shipout_int (obs, pw->pw_gid);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_gecos, 0, true);
+      m4_shipout_string (context, obs, pw->pw_gecos, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_dir, 0, true);
+      m4_shipout_string (context, obs, pw->pw_dir, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, pw->pw_shell, 0, true);
+      m4_shipout_string (context, obs, pw->pw_shell, SIZE_MAX, true);
     }
 }
 
@@ -242,7 +242,7 @@ M4BUILTIN_HANDLER (hostname)
   if (gethostname (buf, sizeof buf) < 0)
     return;
 
-  m4_shipout_string (context, obs, buf, 0, false);
+  m4_shipout_string (context, obs, buf, SIZE_MAX, false);
 }
 
 /**
@@ -280,15 +280,15 @@ M4BUILTIN_HANDLER (uname)
 
   if (uname (&ut) == 0)
     {
-      m4_shipout_string (context, obs, ut.sysname, 0, true);
+      m4_shipout_string (context, obs, ut.sysname, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, ut.nodename, 0, true);
+      m4_shipout_string (context, obs, ut.nodename, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, ut.release, 0, true);
+      m4_shipout_string (context, obs, ut.release, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, ut.version, 0, true);
+      m4_shipout_string (context, obs, ut.version, SIZE_MAX, true);
       obstack_1grow (obs, ',');
-      m4_shipout_string (context, obs, ut.machine, 0, true);
+      m4_shipout_string (context, obs, ut.machine, SIZE_MAX, true);
     }
 }
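The modules above now pass SIZE_MAX instead of 0 as the length argument to m4_shipout_string whenever they only hold a NUL-terminated C string. A plausible reading, not confirmed by the hunks shown, is that SIZE_MAX acts as a "compute it with strlen" sentinel, which frees 0 to mean a genuinely empty string once lengths are cached. The sketch below illustrates that convention with hypothetical helpers resolve_len and append_text, writing into a plain char array rather than an obstack.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <string.h>

/* Hypothetical helper: a caller that already knows the length passes
   it; a caller holding only a NUL-terminated string passes SIZE_MAX so
   that strlen runs once, here.  An explicit 0 stays unambiguous and
   means "empty".  */
static size_t
resolve_len (const char *s, size_t len)
{
  assert (s != NULL);
  return len == SIZE_MAX ? strlen (s) : len;
}

/* Hypothetical helper: append S (LEN bytes, or NUL-terminated if LEN is
   SIZE_MAX) to DST, which holds *USED bytes out of CAP.  Truncates if
   the buffer is too small and returns the number of bytes copied.  */
static size_t
append_text (char *dst, size_t cap, size_t *used, const char *s, size_t len)
{
  size_t n = resolve_len (s, len);

  assert (dst != NULL && used != NULL && *used <= cap);
  if (n > cap - *used)
    n = cap - *used;
  memcpy (dst + *used, s, n);
  *used += n;
  return n;
}
```

With an explicit length, append_text copies embedded NUL bytes verbatim; only the SIZE_MAX path stops at the first NUL.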
 
diff --git a/src/freeze.c b/src/freeze.c
index c17f4f3..8430dda 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -168,7 +168,7 @@ dump_symbol_CB (m4_symbol_table *symtab, const char *symbol_name,
   if (m4_is_symbol_text (symbol))
     {
       const char *text = m4_get_symbol_text (symbol);
-      size_t text_len = strlen (text);
+      size_t text_len = m4_get_symbol_len (symbol);
       xfprintf (file, "T%zu,%zu", symbol_len, text_len);
       if (module)
        xfprintf (file, ",%zu", module_len);
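With the cached length, dump_symbol_CB no longer rescans the definition text before writing the frozen-file header. The following sketch, a hypothetical format_text_record that fills a caller-supplied buffer instead of a FILE, shows why explicit lengths pair well with this format: the header states both sizes, and the bytes are copied with memcpy, so an embedded NUL would survive. The layout after the header (name, then text, then a newline) is an assumption based on the "T%zu,%zu" line, not a copy of freeze.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a frozen-state "T" record: a header giving the
   name and text lengths, then the raw bytes, then a newline.  Writes
   into OUT (capacity CAP); returns the record size, or 0 if it does
   not fit.  No strlen is needed anywhere.  */
static size_t
format_text_record (char *out, size_t cap,
                    const char *name, size_t name_len,
                    const char *text, size_t text_len)
{
  int hdr;
  size_t total;

  assert (out != NULL && name != NULL && text != NULL);
  hdr = snprintf (out, cap, "T%zu,%zu\n", name_len, text_len);
  if (hdr < 0 || (size_t) hdr >= cap)
    return 0;                                      /* header truncated */
  total = (size_t) hdr + name_len + text_len + 1;  /* +1 for final newline */
  if (total > cap)
    return 0;
  memcpy (out + hdr, name, name_len);
  memcpy (out + hdr + name_len, text, text_len);
  out[total - 1] = '\n';
  return total;
}
```

A reader of such a record would likewise fread exactly name_len and text_len bytes instead of scanning for a terminator.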
diff --git a/tests/builtins.at b/tests/builtins.at
index a23aaa9..eeaf0d3 100644
--- a/tests/builtins.at
+++ b/tests/builtins.at
@@ -526,6 +526,8 @@ AT_CLEANUP
 
 AT_SETUP([mkstemp])
 
+AT_KEYWORDS([maketemp])
+
 dnl Check that on error, the expansion is void
 AT_DATA([[in]],
 [[mkstemp(`no_such_dir/m4-fooXXXXXX')
@@ -534,6 +536,12 @@ AT_CHECK_M4([in], [1], [[
 ]], [[m4:in:1: mkstemp: cannot create file from template `no_such_dir/m4-fooXXXXXX': No such file or directory
 ]])
 
+dnl Check that extra X are appended, but not trailing NUL
+AT_DATA([[in]], [[len(mkstemp(`m4-fooXXXXX'))
+]])
+AT_CHECK_M4([in], [0], [[12
+]])
+
 dnl Check that umask has an effect
 AT_DATA([[in]],
 [[substr(esyscmd(`ls -ld 'mkstemp(`m4-fooXXXXXX')), `0', `10')
@@ -546,7 +554,7 @@ AT_CHECK([umask 700; m4 < in], [0], [[----------
 dnl Check for Solaris compatibility of maketemp.  Hopefully the pid is
 dnl less than 20 decimal digits.  Also check that --safer does not affect
 dnl traditional behavior of maketemp, which is textual only.
-AT_DATA([[stdin]],
+AT_DATA([[in]],
 [[maketemp()
 maketemp(X)
 maketemp(XX)
@@ -555,7 +563,7 @@ maketemp(no_such_dir/XXXXXX)
 ]])
 dnl Abuse our knowledge of AT_CHECK_M4 so that we can get stderr filtering...
 AT_CHECK_M4([-G -Q --safer], [0], [stdout], [],
-[stdin& echo $! > pid; wait $!])
+[in& echo $! > pid; wait $!])
 pid=`cat pid`
 cat >expout <<EOF
 
diff --git a/tests/others.at b/tests/others.at
index b50e338..dd1e381 100644
--- a/tests/others.at
+++ b/tests/others.at
@@ -260,6 +260,10 @@ AT_SETUP([iso8859])
 # is no use in trying to handle it here...  Well, until autom4te provides
 # us with means to.
 
+# However, since progress is being made to let M4 handle NUL, this test
+# is xfailed for now.
+AT_XFAIL_IF([:])
+
 AT_DATA([[expout]],
 [[# Testing quotes
 DEFINE                 # eol
-- 
1.5.3.5

From 07108982f9559379e136462c08509777ddeaaec0 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 19 Oct 2007 10:13:06 -0600
Subject: [PATCH] Stage 3: cache length, rather than computing it

* src/input.c (next_token): Grab length from obstack rather than
calling strlen.
* src/m4.h (token_data, macro_arguments): Add length field.
(TOKEN_DATA_LEN): New accessor.
(define_user_macro): Add parameter.
* src/builtin.c (define_user_macro, mkstemp_helper): Use
pre-computed length.
(builtin_init, define_macro, m4_maketemp, m4_mkstemp): Adjust
callers.
(dump_args, m4_ifdef, m4_ifelse, m4_builtin, m4_indir, m4_eval)
(m4_len, m4_substr, m4_translit, m4_regexp, m4_patsubst)
(expand_user_macro): Use cached lengths.
* src/freeze.c (reload_frozen_state): Adjust callers.
* src/m4.c (main): Likewise.
* src/macro.c (expand_token, expand_argument, collect_arguments)
(arg_len): Use cached length.
* doc/m4.texinfo (Mkstemp): Ensure mkstemp does not produce NUL.

(cherry picked from commit cd50e094b5f49104f66ff807c0a01d2f20c61c7f)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   21 +++++++++++
 doc/m4.texinfo |   18 ++++++++-
 src/builtin.c  |  105 ++++++++++++++++++++++++++++++++++---------------------
 src/freeze.c   |    3 +-
 src/input.c    |   37 +++++++++++--------
 src/m4.c       |    3 +-
 src/m4.h       |    8 ++++-
 src/macro.c    |   18 +++++-----
 8 files changed, 143 insertions(+), 70 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 3301ac0..7cd6fc8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,24 @@
+2007-11-29  Eric Blake  <address@hidden>
+
+       Stage 3: cache length, rather than computing it.
+       * src/input.c (next_token): Grab length from obstack rather than
+       calling strlen.
+       * src/m4.h (token_data, macro_arguments): Add length field.
+       (TOKEN_DATA_LEN): New accessor.
+       (define_user_macro): Add parameter.
+       * src/builtin.c (define_user_macro, mkstemp_helper): Use
+       pre-computed length.
+       (builtin_init, define_macro, m4_maketemp, m4_mkstemp): Adjust
+       callers.
+       (dump_args, m4_ifdef, m4_ifelse, m4_builtin, m4_indir, m4_eval)
+       (m4_len, m4_substr, m4_translit, m4_regexp, m4_patsubst)
+       (expand_user_macro): Use cached lengths.
+       * src/freeze.c (reload_frozen_state): Adjust callers.
+       * src/m4.c (main): Likewise.
+       * src/macro.c (expand_token, expand_argument, collect_arguments)
+       (arg_len): Use cached length.
+       * doc/m4.texinfo (Mkstemp): Ensure mkstemp does not produce NUL.
+
 2007-11-27  Eric Blake  <address@hidden>
 
        Stage 2: use accessors, not direct reference, into argv.
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 3cc3539..3da16fc 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -5880,8 +5880,8 @@ recommend that you use the new @code{mkstemp} macro, introduced in
 
 @example
 $ @kbd{m4}
-syscmd(`echo foo??????')dnl
-@result{}foo??????
+syscmd(`rm -f foo??????')sysval
+@result{}0
 define(`file1', maketemp(`fooXXXXXX'))dnl
 ifelse(esyscmd(`echo foo??????'), `foo??????', `no file', `created')
 @result{}created
@@ -5901,6 +5901,20 @@ sysval
 @result{}0
 @end example
 
+@ignore
address@hidden Not worth documenting, but make sure we don't leave trailing NUL in
address@hidden the expansion.
+
+@example
+syscmd(`rm -f foo??????')sysval
+@result{}0
+len(mkstemp(`fooXXXXX'))
+@result{}9
+syscmd(`rm foo??????')sysval
+@result{}0
+@end example
+@end ignore
+
 @node Miscellaneous
 @chapter Miscellaneous builtin macros
 
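The new ignored test checks that mkstemp pads the template to six trailing X characters and that the trailing NUL added for mkstemp(3) is not counted in the expansion, so len(mkstemp(`fooXXXXX')) should be 9, just as `m4-fooXXXXX' gives 12 in tests/builtins.at. A minimal sketch of the padding step, assuming a plain char buffer in place of the obstack and using the hypothetical name pad_template:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the padding in mkstemp_helper: copy the
   LEN-byte template NAME into OUT and append 'X' until it ends in six
   'X' characters.  OUT needs room for LEN + 7 bytes.  Returns the new
   length, which excludes the NUL terminator.  */
static size_t
pad_template (char *out, const char *name, size_t len)
{
  size_t i;
  size_t scan = len;

  assert (out != NULL && name != NULL);
  memcpy (out, name, len);
  /* Count the trailing 'X' characters already present, at most six.  */
  for (i = 0; scan > 0 && i < 6; i++)
    if (name[--scan] != 'X')
      break;
  /* Append whatever is missing.  */
  for (; i < 6; i++)
    out[len++] = 'X';
  out[len] = '\0';  /* needed by mkstemp(3), but not part of the length */
  return len;
}
```

Returning a length that excludes the terminator plays the same role as the obstack_blank (obs, -1) call that undoes the trailing NUL in the patch.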
diff --git a/src/builtin.c b/src/builtin.c
index fbfc2fe..e719cdd 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -407,7 +407,8 @@ free_regex (void)
 `-------------------------------------------------------------------------*/
 
 void
-define_user_macro (const char *name, const char *text, symbol_lookup mode)
+define_user_macro (const char *name, size_t len, const char *text,
+                  symbol_lookup mode)
 {
   symbol *s;
   char *defn = xstrdup (text ? text : "");
@@ -423,7 +424,6 @@ define_user_macro (const char *name, const char *text, symbol_lookup mode)
   if (macro_sequence_inuse && text)
     {
       regoff_t offset = 0;
-      size_t len = strlen (defn);
 
       while ((offset = re_search (&macro_sequence_buf, defn, len, offset,
                                  len - offset, &macro_sequence_regs)) >= 0)
@@ -479,12 +479,14 @@ builtin_init (void)
     if (no_gnu_extensions)
       {
        if (pp->unix_name != NULL)
-         define_user_macro (pp->unix_name, pp->func, SYMBOL_INSERT);
+         define_user_macro (pp->unix_name, strlen (pp->unix_name),
+                            pp->func, SYMBOL_INSERT);
       }
     else
       {
        if (pp->gnu_name != NULL)
-         define_user_macro (pp->gnu_name, pp->func, SYMBOL_INSERT);
+         define_user_macro (pp->gnu_name, strlen (pp->gnu_name),
+                            pp->func, SYMBOL_INSERT);
       }
 }
 
@@ -621,7 +623,7 @@ dump_args (struct obstack *obs, int start, macro_arguments *argv,
        dump_sep = true;
       if (quoted)
        obstack_grow (obs, lquote.string, lquote.length);
-      obstack_grow (obs, ARG (i), strlen (ARG (i)));
+      obstack_grow (obs, ARG (i), arg_len (argv, i));
       if (quoted)
        obstack_grow (obs, rquote.string, rquote.length);
     }
@@ -665,14 +667,14 @@ define_macro (int argc, macro_arguments *argv, symbol_lookup mode)
 
   if (argc == 2)
     {
-      define_user_macro (ARG (1), "", mode);
+      define_user_macro (ARG (1), arg_len (argv, 1), "", mode);
       return;
     }
 
   switch (arg_type (argv, 2))
     {
     case TOKEN_TEXT:
-      define_user_macro (ARG (1), ARG (2), mode);
+      define_user_macro (ARG (1), arg_len (argv, 1), ARG (2), mode);
       break;
 
     case TOKEN_FUNC:
@@ -730,20 +732,27 @@ m4_ifdef (struct obstack *obs, int argc, macro_arguments *argv)
 {
   symbol *s;
   const char *result;
+  size_t len = 0;
 
   if (bad_argc (ARG (0), argc, 2, 3))
     return;
   s = lookup_symbol (ARG (1), SYMBOL_LOOKUP);
 
   if (s != NULL && SYMBOL_TYPE (s) != TOKEN_VOID)
-    result = ARG (2);
+    {
+      result = ARG (2);
+      len = arg_len (argv, 2);
+    }
   else if (argc >= 4)
-    result = ARG (3);
+    {
+      result = ARG (3);
+      len = arg_len (argv, 3);
+    }
   else
     result = NULL;
 
   if (result != NULL)
-    obstack_grow (obs, result, strlen (result));
+    obstack_grow (obs, result, len);
 }
 
 static void
@@ -752,6 +761,7 @@ m4_ifelse (struct obstack *obs, int argc, macro_arguments *argv)
   const char *result;
   const char *me;
   int index;
+  size_t len = 0;
 
   if (argc == 2)
     return;
@@ -769,8 +779,12 @@ m4_ifelse (struct obstack *obs, int argc, macro_arguments *argv)
   result = NULL;
   while (result == NULL)
 
-    if (strcmp (ARG (index), ARG (index + 1)) == 0)
-      result = ARG (index + 2);
+    if (arg_len (argv, index) == arg_len (argv, index + 1)
+       && strcmp (ARG (index), ARG (index + 1)) == 0)
+      {
+       result = ARG (index + 2);
+       len = arg_len (argv, index + 2);
+      }
 
     else
       switch (argc)
@@ -781,6 +795,7 @@ m4_ifelse (struct obstack *obs, int argc, macro_arguments *argv)
        case 4:
        case 5:
          result = ARG (index + 3);
+         len = arg_len (argv, index + 3);
          break;
 
        default:
@@ -788,7 +803,7 @@ m4_ifelse (struct obstack *obs, int argc, macro_arguments *argv)
          index += 3;
        }
 
-  obstack_grow (obs, result, strlen (result));
+  obstack_grow (obs, result, len);
 }
 
 /*---------------------------------------------------------------------.
@@ -939,6 +954,7 @@ m4_builtin (struct obstack *obs, int argc, macro_arguments *argv)
       new_argv->argc = argc - 1;
       new_argv->inuse = false;
       new_argv->argv0 = name;
+      new_argv->argv0_len = arg_len (argv, 1);
       new_argv->arraylen = argc - 2;
       memcpy (&new_argv->array[0], &argv->array[1],
              (argc - 2) * sizeof (token_data *));
@@ -992,6 +1008,7 @@ m4_indir (struct obstack *obs, int argc, macro_arguments *argv)
       new_argv->argc = argc - 1;
       new_argv->inuse = false;
       new_argv->argv0 = name;
+      new_argv->argv0_len = arg_len (argv, 1);
       new_argv->arraylen = argc - 2;
       memcpy (&new_argv->array[0], &argv->array[1],
              (argc - 2) * sizeof (token_data *));
@@ -1169,6 +1186,7 @@ m4_eval (struct obstack *obs, int argc, macro_arguments *argv)
   int radix = 10;
   int min = 1;
   const char *s;
+  size_t len;
 
   if (bad_argc (me, argc, 1, 3))
     return;
@@ -1176,7 +1194,7 @@ m4_eval (struct obstack *obs, int argc, macro_arguments *argv)
   if (*ARG (2) && !numeric_arg (me, ARG (2), &radix))
     return;
 
-  if (radix < 1 || radix > (int) strlen (digits))
+  if (radix < 1 || radix > 36)
     {
       m4_warn (0, me, _("radix %d out of range"), radix);
       return;
@@ -1218,10 +1236,11 @@ m4_eval (struct obstack *obs, int argc, macro_arguments *argv)
       obstack_1grow (obs, '-');
       s++;
     }
-  for (min -= strlen (s); --min >= 0;)
+  len = strlen (s);
+  for (min -= len; --min >= 0;)
     obstack_1grow (obs, '0');
 
-  obstack_grow (obs, s, strlen (s));
+  obstack_grow (obs, s, len);
 }
 
 static void
@@ -1466,18 +1485,18 @@ m4_sinclude (struct obstack *obs, int argc, macro_arguments *argv)
 | Use the first argument as a template for a temporary file name.  |
 `-----------------------------------------------------------------*/
 
-/* Add trailing 'X' to NAME if necessary, securely create the file,
-   and place the new file name on OBS.  Report errors on behalf of ME.  */
+/* Add trailing 'X' to NAME of length LEN as necessary, then securely
+   create the file, and place the new file name on OBS.  Report errors
+   on behalf of ME.  */
 static void
-mkstemp_helper (struct obstack *obs, const char *me, const char *name)
+mkstemp_helper (struct obstack *obs, const char *me, const char *name,
+               size_t len)
 {
   int fd;
-  int len;
   int i;
 
   /* Guarantee that there are six trailing 'X' characters, even if the
      user forgot to supply them.  */
-  len = strlen (name);
   obstack_grow (obs, name, len);
   for (i = 0; len > 0 && i < 6; i++)
     if (name[--len] != 'X')
@@ -1494,7 +1513,13 @@ mkstemp_helper (struct obstack *obs, const char *me, const char *name)
       obstack_free (obs, obstack_finish (obs));
     }
   else
-    close (fd);
+    {
+      close (fd);
+      /* Undo trailing NUL.  */
+      /* FIXME - should we be quoting this name, on the tiny chance
+        that the random name generated matches a user's macro?  */
+      obstack_blank (obs, -1);
+    }
 }
 
 static void
@@ -1518,9 +1543,9 @@ m4_maketemp (struct obstack *obs, int argc, macro_arguments *argv)
           maketemp(XXXXXXXX) -> `X00nnnnn', where nnnnn is 16-bit pid
       */
       const char *str = ARG (1);
-      int len = strlen (str);
-      int i;
-      int len2;
+      size_t len = arg_len (argv, 1);
+      size_t i;
+      size_t len2;
 
       m4_warn (0, me, _("recommend using mkstemp instead"));
       for (i = len; i > 1; i--)
@@ -1535,11 +1560,11 @@ m4_maketemp (struct obstack *obs, int argc, macro_arguments *argv)
        {
          while (i++ < len - len2)
            obstack_1grow (obs, '0');
-         obstack_grow0 (obs, str, len2);
+         obstack_grow (obs, str, len2);
        }
     }
   else
-    mkstemp_helper (obs, me, ARG (1));
+    mkstemp_helper (obs, me, ARG (1), arg_len (argv, 1));
 }
 
 static void
@@ -1549,7 +1574,7 @@ m4_mkstemp (struct obstack *obs, int argc, macro_arguments *argv)
 
   if (bad_argc (me, argc, 1, 1))
     return;
-  mkstemp_helper (obs, me, ARG (1));
+  mkstemp_helper (obs, me, ARG (1), arg_len (argv, 1));
 }
 
 /*----------------------------------------.
@@ -1641,7 +1666,7 @@ m4_m4wrap (struct obstack *obs, int argc, macro_arguments *argv)
   if (bad_argc (ARG (0), argc, 1, -1))
     return;
   if (no_gnu_extensions)
-    obstack_grow (obs, ARG (1), strlen (ARG (1)));
+    obstack_grow (obs, ARG (1), arg_len (argv, 1));
   else
     dump_args (obs, 1, argv, " ", false);
   obstack_1grow (obs, '\0');
@@ -1788,7 +1813,7 @@ m4_len (struct obstack *obs, int argc, macro_arguments *argv)
 {
   if (bad_argc (ARG (0), argc, 1, 1))
     return;
-  shipout_int (obs, strlen (ARG (1)));
+  shipout_int (obs, arg_len (argv, 1));
 }
 
 /*-------------------------------------------------------------------------.
@@ -1848,11 +1873,11 @@ m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
     {
       /* builtin(`substr') is blank, but substr(`abc') is abc.  */
       if (argc == 2)
-       obstack_grow (obs, ARG (1), strlen (ARG (1)));
+       obstack_grow (obs, ARG (1), arg_len (argv, 1));
       return;
     }
 
-  length = avail = strlen (ARG (1));
+  length = avail = arg_len (argv, 1);
   if (!numeric_arg (me, ARG (2), &start))
     return;
 
@@ -1934,7 +1959,7 @@ m4_translit (struct obstack *obs, int argc, macro_arguments *argv)
     {
       /* builtin(`translit') is blank, but translit(`abc') is abc.  */
       if (argc == 2)
-       obstack_grow (obs, ARG (1), strlen (ARG (1)));
+       obstack_grow (obs, ARG (1), arg_len (argv, 1));
       return;
     }
 
@@ -2124,14 +2149,14 @@ m4_regexp (struct obstack *obs, int argc, macro_arguments *argv)
              argc == 3 ? "" : "{", repl, argc == 3 ? "" : "}");
 #endif /* DEBUG_REGEX */
 
-  msg = compile_pattern (regexp, strlen (regexp), &buf, &regs);
+  msg = compile_pattern (regexp, arg_len (argv, 2), &buf, &regs);
   if (msg != NULL)
     {
       m4_warn (0, me, _("bad regular expression: `%s': %s"), regexp, msg);
       return;
     }
 
-  length = strlen (victim);
+  length = arg_len (argv, 1);
   /* Avoid overhead of allocating regs if we won't use it.  */
   startpos = re_search (buf, victim, length, 0, length,
                        argc == 3 ? NULL : regs);
@@ -2171,7 +2196,7 @@ m4_patsubst (struct obstack *obs, int argc, macro_arguments *argv)
     {
       /* builtin(`patsubst') is blank, but patsubst(`abc') is abc.  */
       if (argc == 2)
-       obstack_grow (obs, ARG (1), strlen (ARG (1)));
+       obstack_grow (obs, ARG (1), arg_len (argv, 1));
       return;
     }
 
@@ -2183,7 +2208,7 @@ m4_patsubst (struct obstack *obs, int argc, macro_arguments *argv)
      replacement, we need not waste time with it.  */
   if (!*regexp && !*repl)
     {
-      obstack_grow (obs, victim, strlen (victim));
+      obstack_grow (obs, victim, arg_len (argv, 1));
       return;
     }
 
@@ -2192,14 +2217,14 @@ m4_patsubst (struct obstack *obs, int argc, macro_arguments *argv)
     xfprintf (trace_file, "p:{%s}:{%s}\n", regexp, repl);
 #endif /* DEBUG_REGEX */
 
-  msg = compile_pattern (regexp, strlen (regexp), &buf, &regs);
+  msg = compile_pattern (regexp, arg_len (argv, 2), &buf, &regs);
   if (msg != NULL)
     {
       m4_warn (0, me, _("bad regular expression `%s': %s"), regexp, msg);
       return;
     }
 
-  length = strlen (victim);
+  length = arg_len (argv, 1);
 
   offset = 0;
   matchpos = 0;
@@ -2302,7 +2327,7 @@ expand_user_macro (struct obstack *obs, symbol *sym,
                i = i * 10 + (*text - '0');
            }
          if (i < argc)
-           obstack_grow (obs, ARG (i), strlen (ARG (i)));
+           obstack_grow (obs, ARG (i), arg_len (argv, i));
          break;
 
        case '#':               /* number of arguments */
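In m4_ifelse, the cached lengths let the comparison bail out early when two arguments differ in length, before strcmp ever runs. A small sketch of a fully length-based comparison, using a hypothetical texts_equal helper; note that it swaps strcmp for memcmp, which the patch itself does not do, to show how the check could behave once embedded NUL is supported:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: unequal cached lengths settle the question
   without touching the bytes; otherwise compare exactly LEN bytes.
   Unlike strcmp, memcmp does not stop at an embedded NUL.  */
static int
texts_equal (const char *a, size_t a_len, const char *b, size_t b_len)
{
  assert (a != NULL && b != NULL);
  return a_len == b_len && memcmp (a, b, a_len) == 0;
}
```

For example, strcmp would report "a\0b" and "a\0c" as equal, while texts_equal with length 3 tells them apart.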
diff --git a/src/freeze.c b/src/freeze.c
index 16a4ed2..52a69d1 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -349,7 +349,8 @@ reload_frozen_state (const char *name)
 
              /* Enter a macro having an expansion text as a definition.  */
 
-             define_user_macro (string[0], string[1], SYMBOL_PUSHDEF);
+             define_user_macro (string[0], number[0], string[1],
+                                SYMBOL_PUSHDEF);
              break;
 
            case 'Q':
diff --git a/src/input.c b/src/input.c
index 3d96ec7..0aa6036 100644
--- a/src/input.c
+++ b/src/input.c
@@ -59,8 +59,8 @@
    accordingly.  */
 
 #ifdef ENABLE_CHANGEWORD
-#include "regex.h"
-#endif
+# include "regex.h"
+#endif /* ENABLE_CHANGEWORD */
 
 enum input_type
 {
@@ -164,7 +164,7 @@ static bool pop_input (bool);
 
 #ifdef DEBUG_INPUT
 static const char *token_type_string (token_type);
-#endif
+#endif /* DEBUG_INPUT */
 
 
 /*-------------------------------------------------------------------.
@@ -688,7 +688,7 @@ input_init (void)
 
 #ifdef ENABLE_CHANGEWORD
   set_word_regexp (NULL, user_word_regexp);
-#endif
+#endif /* ENABLE_CHANGEWORD */
 }
 
 
@@ -827,7 +827,7 @@ next_token (token_data *td, int *line, const char *caller)
 #ifdef ENABLE_CHANGEWORD
   int startpos;
   char *orig_text = NULL;
-#endif
+#endif /* ENABLE_CHANGEWORD */
   const char *file;
   int dummy;
 
@@ -841,7 +841,7 @@ next_token (token_data *td, int *line, const char *caller)
     {
 #ifdef DEBUG_INPUT
       xfprintf (stderr, "next_token -> EOF\n");
-#endif
+#endif /* DEBUG_INPUT */
       next_char ();
       return TOKEN_EOF;
     }
@@ -852,7 +852,7 @@ next_token (token_data *td, int *line, const char *caller)
 #ifdef DEBUG_INPUT
       xfprintf (stderr, "next_token -> MACDEF (%s)\n",
                find_builtin_by_addr (TOKEN_DATA_FUNC (td))->name);
-#endif
+#endif /* DEBUG_INPUT */
       return TOKEN_MACDEF;
     }
 
@@ -973,19 +973,24 @@ next_token (token_data *td, int *line, const char *caller)
       type = TOKEN_STRING;
     }
 
-  obstack_1grow (&token_stack, '\0');
-
   TOKEN_DATA_TYPE (td) = TOKEN_TEXT;
+  TOKEN_DATA_LEN (td) = obstack_object_size (&token_stack);
+  obstack_1grow (&token_stack, '\0');
   TOKEN_DATA_TEXT (td) = (char *) obstack_finish (&token_stack);
 #ifdef ENABLE_CHANGEWORD
   if (orig_text == NULL)
-    orig_text = TOKEN_DATA_TEXT (td);
-  TOKEN_DATA_ORIG_TEXT (td) = orig_text;
-#endif
+    TOKEN_DATA_ORIG_TEXT (td) = TOKEN_DATA_TEXT (td);
+  else
+    {
+      TOKEN_DATA_ORIG_TEXT (td) = orig_text;
+      TOKEN_DATA_LEN (td) = strlen (orig_text);
+    }
+#endif /* ENABLE_CHANGEWORD */
 #ifdef DEBUG_INPUT
-  xfprintf (stderr, "next_token -> %s (%s)\n",
-           token_type_string (type), TOKEN_DATA_TEXT (td));
-#endif
+  xfprintf (stderr, "next_token -> %s (%s), len %zu\n",
+           token_type_string (type), TOKEN_DATA_TEXT (td),
+           TOKEN_DATA_LEN (td));
+#endif /* DEBUG_INPUT */
   return type;
 }
 
@@ -1115,4 +1120,4 @@ lex_debug (void)
   while ((t = next_token (&td, NULL, "<debug>")) != TOKEN_EOF)
     print_token ("lex", t, &td);
 }
-#endif
+#endif /* DEBUG_INPUT */
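The key change in next_token is ordering: the token length is read with obstack_object_size before the terminating NUL is appended, so TOKEN_DATA_LEN excludes the terminator and no later strlen is needed. Because obstacks are a GNU extension, the sketch below uses a hypothetical tokbuf type, with tokbuf_grow and tokbuf_finish standing in for obstack_grow, obstack_1grow, and obstack_object_size; it does not model the ENABLE_CHANGEWORD path, which still calls strlen on orig_text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer standing in for token_stack.  */
struct tokbuf
{
  char *data;
  size_t size;   /* bytes collected so far */
  size_t cap;    /* bytes allocated */
};

/* Append N bytes of S, growing geometrically.  Returns 0 or -1.  */
static int
tokbuf_grow (struct tokbuf *b, const char *s, size_t n)
{
  assert (b != NULL);
  if (n == 0)
    return 0;
  if (b->size + n > b->cap)
    {
      size_t cap = b->cap ? b->cap : 16;
      char *p;

      while (cap < b->size + n)
        cap *= 2;
      p = realloc (b->data, cap);
      if (p == NULL)
        return -1;
      b->data = p;
      b->cap = cap;
    }
  memcpy (b->data + b->size, s, n);
  b->size += n;
  return 0;
}

/* Finish a token: record the length first, then add the NUL, so the
   cached length never includes the terminator.  */
static int
tokbuf_finish (struct tokbuf *b, size_t *len_out)
{
  *len_out = b->size;             /* like obstack_object_size */
  return tokbuf_grow (b, "", 1);  /* like obstack_1grow (..., '\0') */
}
```

Swapping the two steps in tokbuf_finish would make every cached length one byte too long, which is why the patch moves obstack_1grow below the TOKEN_DATA_LEN assignment.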
diff --git a/src/m4.c b/src/m4.c
index 217d389..0c7f33f 100644
--- a/src/m4.c
+++ b/src/m4.c
@@ -626,7 +626,8 @@ main (int argc, char *const *argv, char *const *envp)
            char *macro_value = strchr (macro_name, '=');
            if (macro_value)
              *macro_value++ = '\0';
-           define_user_macro (macro_name, macro_value, SYMBOL_INSERT);
+           define_user_macro (macro_name, strlen (macro_name),
+                              macro_value, SYMBOL_INSERT);
            free (macro_name);
          }
          break;
diff --git a/src/m4.h b/src/m4.h
index 522bda2..3a6acc3 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -284,6 +284,10 @@ struct token_data
     {
       struct
        {
+         /* We don't support NUL in text, yet.  So len is just a
+            cache for now.  But it will be essential if we ever DO
+            support NUL.  */
+         size_t len;
          char *text;
 #ifdef ENABLE_CHANGEWORD
          char *original_text;
@@ -313,6 +317,7 @@ struct macro_arguments
      until next byte read from file.  */
   bool inuse;
   const char *argv0; /* The macro name being expanded.  */
+  size_t argv0_len; /* Length of argv0.  */
   size_t arraylen; /* True length of allocated elements in array.  */
   /* Used as a variable-length array, storing information about each
      argument.  */
@@ -320,6 +325,7 @@ struct macro_arguments
 };
 
 #define TOKEN_DATA_TYPE(Td)            ((Td)->type)
+#define TOKEN_DATA_LEN(Td)             ((Td)->u.u_t.len)
 #define TOKEN_DATA_TEXT(Td)            ((Td)->u.u_t.text)
 #ifdef ENABLE_CHANGEWORD
 # define TOKEN_DATA_ORIG_TEXT(Td)      ((Td)->u.u_t.original_text)
@@ -472,7 +478,7 @@ void builtin_init (void);
 void define_builtin (const char *, const builtin *, symbol_lookup);
 void set_macro_sequence (const char *);
 void free_regex (void);
-void define_user_macro (const char *, const char *, symbol_lookup);
+void define_user_macro (const char *, size_t, const char *, symbol_lookup);
 void undivert_all (void);
 void expand_user_macro (struct obstack *, symbol *, int, macro_arguments *);
 void m4_placeholder (struct obstack *, int, macro_arguments *);
diff --git a/src/macro.c b/src/macro.c
index 9997ce1..320727d 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -95,8 +95,7 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line)
     case TOKEN_CLOSE:
     case TOKEN_SIMPLE:
     case TOKEN_STRING:
-      shipout_text (obs, TOKEN_DATA_TEXT (td), strlen (TOKEN_DATA_TEXT (td)),
-                   line);
+      shipout_text (obs, TOKEN_DATA_TEXT (td), TOKEN_DATA_LEN (td), line);
       break;
 
     case TOKEN_WORD:
@@ -108,11 +107,10 @@ expand_token (struct obstack *obs, token_type t, token_data *td, int line)
        {
 #ifdef ENABLE_CHANGEWORD
          shipout_text (obs, TOKEN_DATA_ORIG_TEXT (td),
-                       strlen (TOKEN_DATA_ORIG_TEXT (td)), line);
+                       TOKEN_DATA_LEN (td), line);
 #else
-         shipout_text (obs, TOKEN_DATA_TEXT (td),
-                       strlen (TOKEN_DATA_TEXT (td)), line);
-#endif
+         shipout_text (obs, TOKEN_DATA_TEXT (td), TOKEN_DATA_LEN (td), line);
+#endif /* !ENABLE_CHANGEWORD */
        }
       else
        expand_macro (sym);
@@ -183,6 +181,7 @@ expand_argument (struct obstack *obs, token_data *argp, const char *caller)
                    return t == TOKEN_COMMA;
                  warn_builtin_concat (caller, TOKEN_DATA_FUNC (argp));
                }
+             TOKEN_DATA_LEN (argp) = obstack_object_size (obs);
              obstack_1grow (obs, '\0');
              TOKEN_DATA_TYPE (argp) = TOKEN_TEXT;
              TOKEN_DATA_TEXT (argp) = (char *) obstack_finish (obs);
@@ -255,6 +254,7 @@ collect_arguments (symbol *sym, struct obstack *argptr, unsigned int argv_base,
   args.argc = 1;
   args.inuse = false;
   args.argv0 = SYMBOL_NAME (sym);
+  args.argv0_len = strlen (args.argv0);
   args.arraylen = 0;
   obstack_grow (argptr, &args, offsetof (macro_arguments, array));
 
@@ -269,6 +269,7 @@ collect_arguments (symbol *sym, struct obstack *argptr, unsigned int argv_base,
            {
              TOKEN_DATA_TYPE (&td) = TOKEN_TEXT;
              TOKEN_DATA_TEXT (&td) = (char *) "";
+             TOKEN_DATA_LEN (&td) = 0;
            }
          tdp = (token_data *) obstack_copy (arguments, &td, sizeof td);
          obstack_ptr_grow (argptr, tdp);
@@ -448,14 +449,13 @@ arg_text (macro_arguments *argv, unsigned int index)
 size_t
 arg_len (macro_arguments *argv, unsigned int index)
 {
-  /* TODO - update macro_arguments to cache this.  */
   if (index == 0)
-    return strlen (argv->argv0);
+    return argv->argv0_len;
   if (index >= argv->argc)
     return 0;
   if (TOKEN_DATA_TYPE (argv->array[index - 1]) != TOKEN_TEXT)
     return SIZE_MAX;
-  return strlen (TOKEN_DATA_TEXT (argv->array[index - 1]));
+  return TOKEN_DATA_LEN (argv->array[index - 1]);
 }
 
 /* Given ARGV, return the builtin function referenced by argument
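After this patch, arg_len answers from cached fields: argv0_len for index 0, the new per-token len otherwise, 0 for a missing argument, and SIZE_MAX for a non-text (builtin function) argument. The sketch below mirrors that logic over simplified, hypothetical struct tok and struct margs types that keep only the fields arg_len reads; the real token_data stores the length in a union member reached through TOKEN_DATA_LEN.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <string.h>

/* Hypothetical, simplified token and argument types.  */
enum tok_type { TOK_TEXT, TOK_FUNC };

struct tok
{
  enum tok_type type;
  size_t len;          /* cached length, meaningful for TOK_TEXT */
  const char *text;
};

struct margs
{
  unsigned int argc;   /* counts argv0, like macro_arguments */
  const char *argv0;
  size_t argv0_len;
  struct tok *array[8];
};

/* Mirror of the patched arg_len: constant-time lookups, no strlen.  */
static size_t
arg_len_sketch (const struct margs *argv, unsigned int index)
{
  assert (argv != NULL);
  if (index == 0)
    return argv->argv0_len;
  if (index >= argv->argc)
    return 0;
  if (argv->array[index - 1]->type != TOK_TEXT)
    return SIZE_MAX;
  return argv->array[index - 1]->len;
}
```

The extra size_t per token is the memory cost mentioned in the cover letter; in exchange, each arg_len call no longer walks the argument text.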
-- 
1.5.3.5

