m4-patches

[13/18] argv_ref speedup: push composite text tokens


From: Eric Blake
Subject: [13/18] argv_ref speedup: push composite text tokens
Date: Sat, 26 Jan 2008 22:05:52 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Next in the series.  Up to now, when a back-reference was created, it was
copied and converted back to plain text as soon as it was concatenated
with anything else, so that there were never references to references.
This patch adds the ability to push composite tokens, allowing more
complex input reuse.  While testing, I noticed that if references are
always created, memory growth gets out of hand; so this patch includes a
heuristic: when pushing new text (i.e. an argument that contains no
back-references), creating a reference is okay; but when pushing a
composite argument (which means at least one of the links in the
composite chain is already a back-reference), avoid creating a new
back-reference.  This hurts boxed recursion for now (since boxed
recursion works by iterating over a progressively smaller substring of
the original argument), but improves memory usage in the common case.
And although it costs slightly more time, it improves the framework so
that later patches can pass the entire $@ in one go, rather than one
argument at a time.
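To make the heuristic above concrete, here is a simplified, hypothetical model of the per-segment decision.  The names (`decide`, `OWNED`, `enum action`) and the threshold value of 16 are illustrative assumptions, not m4's code.  The real logic lives in `push_token`/`m4__push_symbol` in the patches below and also handles positional details, such as inlining the final link, that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INLINE_THRESHOLD 16    /* assumed value, for illustration only */
#define OWNED SIZE_MAX         /* text owned by the current expansion */

enum action
{
  COPY_TEXT,    /* inline the bytes onto the input stack */
  REF_CURRENT,  /* new back-reference into the current argv */
  REUSE_REF     /* keep an existing back-reference to an older level */
};

/* Simplified decision for one text segment of length LEN.  LEVEL is
   OWNED for text belonging to the current expansion, otherwise the
   level of an existing back-reference.  COMPOSITE is true when the
   pushed argument is a chain, i.e. it already contains at least one
   back-reference.  INUSE is true once argv is already referenced.  */
static enum action
decide (size_t len, size_t level, bool composite, bool inuse)
{
  /* A plain text argument always belongs to the current expansion.  */
  assert (composite || level == OWNED);
  /* Short text: parsing another link costs more than copying it.  */
  if (len <= INLINE_THRESHOLD)
    return COPY_TEXT;
  /* New text with no back-references: a reference is okay.  */
  if (!composite)
    return REF_CURRENT;
  /* An existing back-reference: reuse it rather than copy the text.  */
  if (level != OWNED)
    return REUSE_REF;
  /* Own text inside a composite: only reference argv if it is already
     pinned; otherwise copy, so argv can be released sooner.  */
  return inuse ? REF_CURRENT : COPY_TEXT;
}
```

The key design point is the last line.  Pinning argv costs memory for the lifetime of the pushed input, so the sketch only adds a reference to the current expansion when some earlier link has already forced that cost.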

2008-01-27  Eric Blake  <address@hidden>

        Stage 13: push composite text tokens.
        Support pushing composite tokens, allowing back-references to be
        reused through multiple macro expansions.  Add a heuristic that
        avoids creating a new reference when pushing existing references.
        Memory impact: noticeable improvement due to better reference
        reuse, except for boxed recursion doing more copying.
        Speed impact: slight penalty, due to more bookkeeping.
        * src/m4.h (push_token): Adjust prototype.
        * src/input.c (push_token): Add parameter, and handle composite
        tokens.
        (append_quote_token): Inline short strings.
        * src/macro.c (push_arg, push_args): Adjust callers.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHnBEv84KuGfSFAYARAis8AJwIWlNPYdiMtP3segJgLbPfE5zB4wCfY3Zg
h+DKdhzi+rfOApEz0OKDyvk=
=UDEG
-----END PGP SIGNATURE-----
From ccc250d2238b57b313c54921b57fc078e5bb8220 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Sat, 26 Jan 2008 21:39:25 -0700
Subject: [PATCH] Stage 13: push composite text tokens.

* m4/m4private.h (m4__push_symbol): Adjust prototype.
* m4/input.c (m4__push_symbol): Add parameter, and support
composite tokens.
(append_quote_token): Add parameter, and support inlining of short
text.
(m4__next_token): Adjust caller.
* m4/macro.c (m4_push_arg, m4_push_args): Likewise.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |   17 ++++++
 m4/input.c     |  175 +++++++++++++++++++++++++++++++++++++++++++-------------
 m4/m4private.h |    3 +-
 m4/macro.c     |   38 ++-----------
 4 files changed, 160 insertions(+), 73 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 25c408e..34b9491 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,20 @@
+2008-01-27  Eric Blake  <address@hidden>
+
+       Stage 13: push composite text tokens.
+       Support pushing composite tokens, allowing back-references to be
+       reused through multiple macro expansions.  Add a heuristic that
+       avoids creating a new reference when pushing existing references.
+       Memory impact: noticeable improvement due to better reference
+       reuse, except for boxed recursion doing more copying.
+       Speed impact: slight penalty, due to more bookkeeping.
+       * m4/m4private.h (m4__push_symbol): Adjust prototype.
+       * m4/input.c (m4__push_symbol): Add parameter, and support
+       composite tokens.
+       (append_quote_token): Add parameter, and support inlining of short
+       text.
+       (m4__next_token): Adjust caller.
+       * m4/macro.c (m4_push_arg, m4_push_args): Likewise.
+
 2008-01-26  Eric Blake  <address@hidden>
 
        Stage 12c: add macro for m4_arg_len.
diff --git a/m4/input.c b/m4/input.c
index 4adce9d..9616d37 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -112,7 +112,8 @@ static      bool    composite_clean         (m4_input_block *, m4 *, bool);
 static void    composite_print         (m4_input_block *, m4 *, m4_obstack *);
 
 static void    init_builtin_token      (m4 *, m4_symbol_value *);
-static void    append_quote_token      (m4_obstack *, m4_symbol_value *);
+static void    append_quote_token      (m4 *, m4_obstack *,
+                                        m4_symbol_value *);
 static bool    match_input             (m4 *, const char *, bool);
 static int     next_char               (m4 *, bool, bool);
 static int     peek_char               (m4 *);
@@ -526,35 +527,72 @@ m4_push_string_init (m4 *context)
   return current_input;
 }
 
-/* If VALUE contains text, then convert the current string into a
+/* This function allows gathering input from multiple locations,
+   rather than copying everything consecutively onto the input stack.
+   Must be called between push_string_init and push_string_finish.
+
+   If VALUE contains text, then convert the current input block into a
    chain if it is not one already, and add the contents of VALUE as a
    new link in the chain.  LEVEL describes the current expansion
-   level, or SIZE_MAX if the contents of VALUE reside entirely on the
-   current_input stack and VALUE lives in temporary storage.  Allows
-   gathering input from multiple locations, rather than copying
-   everything consecutively onto the input stack.  Must be called
-   between push_string_init and push_string_finish.  Return true only
-   if LEVEL is less than SIZE_MAX and a reference was created to
-   VALUE, in which case, the lifetime of the contents of VALUE must
-   last as long as the input engine can parse references from it.  */
+   level, or SIZE_MAX if VALUE is composite, its contents reside
+   entirely on the current_input stack, and VALUE lives in temporary
+   storage.  If VALUE is a simple string, then it belongs to the
+   current macro expansion.  If VALUE is composite, then each text link
+   has a level of SIZE_MAX if it belongs to the current macro
+   expansion, otherwise it is a back-reference where level tracks
+   which stack it came from.  The resulting input block chain contains
+   links with a level of SIZE_MAX if the text belongs to the input
+   stack, otherwise the level where the back-reference comes from.
+
+   Return true only if a reference was created to the contents of
+   VALUE, in which case, LEVEL is less than SIZE_MAX and the lifetime
+   of VALUE and its contents must last as long as the input engine can
+   parse references from it.  INUSE determines whether composite
+   symbols should favor creating back-references or copying text.  */
 bool
-m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level)
+m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level, bool inuse)
 {
+  m4__symbol_chain *src_chain = NULL;
   m4__symbol_chain *chain;
-  bool result = false;
 
   assert (next);
-  /* TODO - also accept TOKEN_COMP chains.  */
-  assert (m4_is_symbol_value_text (value));
+  /* TODO - also accept composite chains with $@ refs.  */
 
   /* Speed consideration - for short enough symbols, the speed and
      memory overhead of parsing another INPUT_CHAIN link outweighs the
-     time to inline the symbol text.  */
-  if (m4_get_symbol_value_len (value) <= INPUT_INLINE_THRESHOLD)
+     time to inline the symbol text.  But don't copy text if it
+     already lives on the obstack.  */
+  if (m4_is_symbol_value_text (value))
     {
-      obstack_grow (current_input, m4_get_symbol_value_text (value),
-                   m4_get_symbol_value_len (value));
-      return false;
+      assert (level < SIZE_MAX);
+      if (m4_get_symbol_value_len (value) <= INPUT_INLINE_THRESHOLD)
+       {
+         obstack_grow (current_input, m4_get_symbol_value_text (value),
+                       m4_get_symbol_value_len (value));
+         return false;
+       }
+    }
+  else
+    {
+      /* For composite values, if argv is already in use, creating
+        additional references for long text segments is more
+        efficient in time.  But if argv is not yet in use, and we
+        have a composite value, then the value must already contain a
+        back-reference, and memory usage is more efficient if we can
+        avoid using the current expand_macro, even if it means larger
+        copies.  */
+      assert (value->type == M4_SYMBOL_COMP);
+      src_chain = value->u.u_c.chain;
+      while (level < SIZE_MAX && src_chain && src_chain->type == M4__CHAIN_STR
+            && (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD
+                || (!inuse && src_chain->u.u_s.level == SIZE_MAX)))
+       {
+         obstack_grow (current_input, src_chain->u.u_s.str,
+                       src_chain->u.u_s.len);
+         src_chain = src_chain->next;
+       }
+      if (!src_chain)
+       return false;
     }
 
   if (next->funcs == &string_funcs)
@@ -563,24 +601,72 @@ m4__push_symbol (m4 *context, m4_symbol_value *value, size_t level)
       next->u.u_c.chain = next->u.u_c.end = NULL;
     }
   m4__make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
-  chain = (m4__symbol_chain *) obstack_alloc (current_input, sizeof *chain);
-  if (next->u.u_c.end)
-    next->u.u_c.end->next = chain;
-  else
-    next->u.u_c.chain = chain;
-  next->u.u_c.end = chain;
-  chain->next = NULL;
-  chain->type = M4__CHAIN_STR;
-  chain->quote_age = m4_get_symbol_value_quote_age (value);
-  chain->u.u_s.str = m4_get_symbol_value_text (value);
-  chain->u.u_s.len = m4_get_symbol_value_len (value);
-  chain->u.u_s.level = level;
-  if (level < SIZE_MAX)
+  if (m4_is_symbol_value_text (value))
     {
+      chain = (m4__symbol_chain *) obstack_alloc (current_input,
+                                                 sizeof *chain);
+      if (next->u.u_c.end)
+       next->u.u_c.end->next = chain;
+      else
+       next->u.u_c.chain = chain;
+      next->u.u_c.end = chain;
+      chain->next = NULL;
+      chain->type = M4__CHAIN_STR;
+      chain->quote_age = m4_get_symbol_value_quote_age (value);
+      chain->u.u_s.str = m4_get_symbol_value_text (value);
+      chain->u.u_s.len = m4_get_symbol_value_len (value);
+      chain->u.u_s.level = level;
       m4__adjust_refcount (context, level, true);
-      result = true;
+      inuse = true;
     }
-  return result;
+  while (src_chain)
+    {
+      if (level == SIZE_MAX)
+       {
+         /* Nothing to copy, since link already lives on obstack.  */
+         assert (src_chain->type != M4__CHAIN_STR
+                 || src_chain->u.u_s.level == SIZE_MAX);
+         chain = src_chain;
+       }
+      else
+       {
+         /* Allow inlining the final link with subsequent text.  */
+         if (!src_chain->next && src_chain->type == M4__CHAIN_STR
+             && (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD
+                 || (!inuse && src_chain->u.u_s.level == SIZE_MAX)))
+           {
+             obstack_grow (current_input, src_chain->u.u_s.str,
+                           src_chain->u.u_s.len);
+             break;
+           }
+         /* We must clone each link in the chain, since next_char
+            destructively modifies the chain it is parsing.  */
+         chain = (m4__symbol_chain *) obstack_copy (current_input, src_chain,
+                                                    sizeof *chain);
+         if (chain->type == M4__CHAIN_STR && chain->u.u_s.level == SIZE_MAX)
+           {
+             if (chain->u.u_s.len <= INPUT_INLINE_THRESHOLD || !inuse)
+               chain->u.u_s.str = (char *) obstack_copy (current_input,
+                                                         chain->u.u_s.str,
+                                                         chain->u.u_s.len);
+             else
+               {
+                 chain->u.u_s.level = level;
+                 inuse = true;
+               }
+           }
+       }
+      if (next->u.u_c.end)
+       next->u.u_c.end->next = chain;
+      else
+       next->u.u_c.chain = chain;
+      next->u.u_c.end = chain;
+      assert (chain->type == M4__CHAIN_STR);
+      if (chain->u.u_s.level < SIZE_MAX)
+       m4__adjust_refcount (context, chain->u.u_s.level, true);
+      src_chain = src_chain->next;
+    }
+  return inuse;
 }
 
 /* Last half of m4_push_string ().  If next is now NULL, a call to
@@ -925,11 +1011,23 @@ init_builtin_token (m4 *context, m4_symbol_value *token)
    as the quoted token from the top of the input stack.  Use OBS for
    any additional allocations needed to store the token chain.  */
 static void
-append_quote_token (m4_obstack *obs, m4_symbol_value *value)
+append_quote_token (m4 *context, m4_obstack *obs, m4_symbol_value *value)
 {
   m4__symbol_chain *src_chain = isp->u.u_c.chain;
   m4__symbol_chain *chain;
-  assert (isp->funcs == &composite_funcs && obs);
+  assert (isp->funcs == &composite_funcs && obs && m4__quote_age (M4SYNTAX)
+         && src_chain->type == M4__CHAIN_STR
+         && src_chain->u.u_s.level < SIZE_MAX);
+  isp->u.u_c.chain = src_chain->next;
+  /* Speed consideration - for short enough symbols, the speed and
+     memory overhead of parsing another INPUT_CHAIN link outweighs the
+     time to inline the symbol text.  */
+  if (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD)
+    {
+      obstack_grow (obs, src_chain->u.u_s.str, src_chain->u.u_s.len);
+      m4__adjust_refcount (context, src_chain->u.u_s.level, false);
+      return;
+    }
 
   if (value->type == M4_SYMBOL_VOID)
     {
@@ -944,8 +1042,7 @@ append_quote_token (m4_obstack *obs, m4_symbol_value *value)
   else
     value->u.u_c.chain = chain;
   value->u.u_c.end = chain;
-  value->u.u_c.end->next = NULL;
-  isp->u.u_c.chain = src_chain->next;
+  chain->next = NULL;
 }
 
 
@@ -1293,7 +1390,7 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
              m4_error_at_line (context, EXIT_FAILURE, 0, file, *line, caller,
                                _("end of file in string"));
            if (ch == CHAR_QUOTE)
-             append_quote_token (obs, token);
+             append_quote_token (context, obs, token);
            else if (m4_has_syntax (M4SYNTAX, ch, M4_SYNTAX_RQUOTE))
              {
                if (--quote_level == 0)
diff --git a/m4/m4private.h b/m4/m4private.h
index 4261c4c..5304682 100644
--- a/m4/m4private.h
+++ b/m4/m4private.h
@@ -475,7 +475,8 @@ typedef enum {
 
 extern void            m4__make_text_link (m4_obstack *, m4__symbol_chain **,
                                            m4__symbol_chain **);
-extern bool            m4__push_symbol (m4 *, m4_symbol_value *, size_t);
+extern bool            m4__push_symbol (m4 *, m4_symbol_value *, size_t,
+                                        bool);
 extern m4__token_type  m4__next_token (m4 *, m4_symbol_value *, int *,
                                        m4_obstack *, const char *);
 extern bool            m4__next_token_is_open (m4 *);
diff --git a/m4/macro.c b/m4/macro.c
index 88ee391..f91923c 100644
--- a/m4/macro.c
+++ b/m4/macro.c
@@ -1319,23 +1319,9 @@ m4_push_arg (m4 *context, m4_obstack *obs, m4_macro_args *argv,
        return;
     }
   /* TODO handle builtin tokens?  */
-  if (value->type == M4_SYMBOL_TEXT)
-    {
-      if (m4__push_symbol (context, value, context->expansion_level - 1))
-       arg_mark (argv);
-    }
-  else if (value->type == M4_SYMBOL_COMP)
-    {
-      /* TODO - really handle composites; for now, just flatten the
-        composite and push its text.  */
-      m4__symbol_chain *chain = value->u.u_c.chain;
-      while (chain)
-       {
-         assert (chain->type == M4__CHAIN_STR);
-         obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
-         chain = chain->next;
-       }
-    }
+  if (m4__push_symbol (context, value, context->expansion_level - 1,
+                      argv->inuse))
+    arg_mark (argv);
 }
 
 /* Push series of comma-separated arguments from ARGV, which should
@@ -1347,7 +1333,6 @@ m4_push_args (m4 *context, m4_obstack *obs, m4_macro_args *argv, bool skip,
              bool quote)
 {
   m4_symbol_value *value;
-  m4__symbol_chain *chain;
   unsigned int i = skip ? 2 : 1;
   const char *sep = ",";
   size_t sep_len = 1;
@@ -1389,21 +1374,8 @@ m4_push_args (m4 *context, m4_obstack *obs, m4_macro_args *argv, bool skip,
       else
        use_sep = true;
       /* TODO handle builtin tokens?  */
-      if (value->type == M4_SYMBOL_TEXT)
-       inuse |= m4__push_symbol (context, value,
-                                 context->expansion_level - 1);
-      else
-       {
-         /* TODO handle composite text.  */
-         assert (value->type == M4_SYMBOL_COMP);
-         chain = value->u.u_c.chain;
-         while (chain)
-           {
-             assert (chain->type == M4__CHAIN_STR);
-             obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
-             chain = chain->next;
-           }
-       }
+      inuse |= m4__push_symbol (context, value,
+                               context->expansion_level - 1, inuse);
     }
   if (quote)
     obstack_grow (obs, quotes->str2, quotes->len2);
-- 
1.5.3.8

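A rough sketch of the composite branch of `m4__push_symbol`/`push_token` above, assuming plain `malloc` in place of the input obstack.  Each link of the source chain is cloned, because, as the patch comment notes, `next_char` destructively modifies the chain it parses.  Locally owned text is copied, and each back-reference that survives bumps the reference count of the level it points into.  `struct link`, `push_chain`, `free_chain`, `refcount[]`, `MAX_LEVELS` and the tiny threshold are hypothetical.  The sketch models only the case where argv is not yet in use and, as a simplification, flattens every short link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_THRESHOLD 4   /* assumed; tiny so the example is short */
#define OWNED SIZE_MAX       /* text owned by the pushed input itself */
#define MAX_LEVELS 8         /* assumed bound on expansion levels */

struct link
{
  const char *str;           /* text of this segment */
  size_t len;                /* its length in bytes */
  size_t level;              /* OWNED, or level of a back-reference */
  struct link *next;
};

/* Outstanding back-references per expansion level.  */
static size_t refcount[MAX_LEVELS];

/* Return a fresh chain equivalent to SRC.  Every link is cloned so a
   destructive reader never touches the caller's chain.  Short or OWNED
   text is copied into new storage; other links stay back-references
   and pin their level.  */
static struct link *
push_chain (const struct link *src)
{
  struct link *head = NULL;
  struct link **tail = &head;

  for (; src; src = src->next)
    {
      struct link *copy = malloc (sizeof *copy);
      assert (copy);
      *copy = *src;
      copy->next = NULL;
      if (src->level == OWNED || src->len <= INLINE_THRESHOLD)
        {
          char *text = malloc (src->len ? src->len : 1);
          assert (text);
          if (src->len)
            memcpy (text, src->str, src->len);
          copy->str = text;
          copy->level = OWNED;
        }
      else
        {
          assert (src->level < MAX_LEVELS);
          refcount[src->level]++;
        }
      *tail = copy;
      tail = &copy->next;
    }
  return head;
}

/* Release a chain built by push_chain, dropping its references.  */
static void
free_chain (struct link *l)
{
  while (l)
    {
      struct link *next = l->next;
      if (l->level == OWNED)
        free ((void *) l->str);
      else
        refcount[l->level]--;
      free (l);
      l = next;
    }
}
```

Pushing a chain of an owned 10-byte segment, a 2-byte back-reference to level 3, and a 14-byte back-reference to level 2 yields two owned copies followed by one shared link.  Only level 2 ends up with an extra reference.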
From 951a0fb343bc2dc1d20109fc532a7af320ff70e0 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 30 Oct 2007 11:17:51 -0600
Subject: [PATCH] Stage 13: push composite text tokens.

* src/m4.h (push_token): Adjust prototype.
* src/input.c (push_token): Add parameter, and handle composite
tokens.
(append_quote_token): Inline short strings.
* src/macro.c (push_arg, push_args): Adjust callers.

(cherry picked from commit 290301246eefb3f58fe29b8ccd9118b23c76c61c)

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog   |   15 +++++
 src/input.c |  176 +++++++++++++++++++++++++++++++++++++++++++++-------------
 src/m4.h    |    2 +-
 src/macro.c |   38 ++-----------
 4 files changed, 157 insertions(+), 74 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 7213833..3ffea53 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@
+2008-01-27  Eric Blake  <address@hidden>
+
+       Stage 13: push composite text tokens.
+       Support pushing composite tokens, allowing back-references to be
+       reused through multiple macro expansions.  Add a heuristic that
+       avoids creating a new reference when pushing existing references.
+       Memory impact: noticeable improvement due to better reference
+       reuse, except for boxed recursion doing more copying.
+       Speed impact: slight penalty, due to more bookkeeping.
+       * src/m4.h (push_token): Adjust prototype.
+       * src/input.c (push_token): Add parameter, and handle composite
+       tokens.
+       (append_quote_token): Inline short strings.
+       * src/macro.c (push_arg, push_args): Adjust callers.
+
 2008-01-26  Eric Blake  <address@hidden>
 
        Stage 12: make token_chain a union, add string_pair.
diff --git a/src/input.c b/src/input.c
index 5890bd2..514acd1 100644
--- a/src/input.c
+++ b/src/input.c
@@ -313,37 +313,74 @@ push_string_init (void)
   return current_input;
 }
 
-/*-------------------------------------------------------------------.
-| If TOKEN contains text, then convert the current string into a     |
-| chain if it is not one already, and add the contents of TOKEN as a |
-| new link in the chain.  LEVEL describes the current expansion      |
-| level, or -1 if the contents of TOKEN reside entirely on the       |
-| current_input stack and TOKEN lives in temporary storage.  Allows  |
-| gathering input from multiple locations, rather than copying       |
-| everything consecutively onto the input stack.  Must be called     |
-| between push_string_init and push_string_finish.  Return true only |
-| if LEVEL is non-negative, and a reference was created to TOKEN, in |
-| which case, the lifetime of TOKEN and its contents must last as    |
-| long as the input engine can parse references to it.               |
-`-------------------------------------------------------------------*/
+/*--------------------------------------------------------------------.
+| This function allows gathering input from multiple locations,              |
+| rather than copying everything consecutively onto the input stack.  |
+| Must be called between push_string_init and push_string_finish.     |
+|                                                                     |
+| If TOKEN contains text, then convert the current input block into   |
+| a chain if it is not one already, and add the contents of TOKEN as  |
+| a new link in the chain.  LEVEL describes the current expansion     |
+| level, or -1 if TOKEN is composite, its contents reside entirely    |
+| on the current_input stack, and TOKEN lives in temporary storage.   |
+| If TOKEN is a simple string, then it belongs to the current macro   |
+| expansion.  If TOKEN is composite, then each text link has a level  |
+| of -1 if it belongs to the current macro expansion, otherwise it    |
+| is a back-reference where level tracks which stack it came from.    |
+| The resulting input block chain contains links with a level of -1   |
+| if the text belongs to the input stack, otherwise the level where   |
+| the back-reference comes from.                                     |
+|                                                                     |
+| Return true only if a reference was created to the contents of      |
+| TOKEN, in which case, LEVEL was non-negative and the lifetime of    |
+| TOKEN and its contents must last as long as the input engine can    |
+| parse references to it.  INUSE determines whether composite tokens  |
+| should favor creating back-references or copying text.             |
+`--------------------------------------------------------------------*/
 bool
-push_token (token_data *token, int level)
+push_token (token_data *token, int level, bool inuse)
 {
+  token_chain *src_chain = NULL;
   token_chain *chain;
-  bool result = false;
 
   assert (next);
-  /* TODO - also accept TOKEN_COMP chains.  */
-  assert (TOKEN_DATA_TYPE (token) == TOKEN_TEXT);
+  /* TODO - also accept TOKEN_COMP chains containing $@ ref.  */
 
   /* Speed consideration - for short enough tokens, the speed and
      memory overhead of parsing another INPUT_CHAIN link outweighs the
-     time to inline the token text.  */
-  if (TOKEN_DATA_LEN (token) <= INPUT_INLINE_THRESHOLD)
+     time to inline the token text.  But don't re-copy text if it
+     already lives on the obstack.  */
+  if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
     {
-      obstack_grow (current_input, TOKEN_DATA_TEXT (token),
-                   TOKEN_DATA_LEN (token));
-      return false;
+      assert (level >= 0);
+      if (TOKEN_DATA_LEN (token) <= INPUT_INLINE_THRESHOLD)
+       {
+         obstack_grow (current_input, TOKEN_DATA_TEXT (token),
+                       TOKEN_DATA_LEN (token));
+         return false;
+       }
+    }
+  else
+    {
+      /* For composite tokens, if argv is already in use, creating
+        additional references for long text segments is more
+        efficient in time.  But if argv is not yet in use, and we
+        have a composite token, then the token must already contain a
+        back-reference, and memory usage is more efficient if we can
+        avoid using the current expand_macro, even if it means larger
+        copies.  */
+      assert (TOKEN_DATA_TYPE (token) == TOKEN_COMP);
+      src_chain = token->u.u_c.chain;
+      while (level >= 0 && src_chain && src_chain->type == CHAIN_STR
+            && (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD
+                || (!inuse && src_chain->u.u_s.level == -1)))
+       {
+         obstack_grow (current_input, src_chain->u.u_s.str,
+                       src_chain->u.u_s.len);
+         src_chain = src_chain->next;
+       }
+      if (!src_chain)
+       return false;
     }
 
   if (next->type == INPUT_STRING)
@@ -352,24 +389,71 @@ push_token (token_data *token, int level)
       next->u.u_c.chain = next->u.u_c.end = NULL;
     }
   make_text_link (current_input, &next->u.u_c.chain, &next->u.u_c.end);
-  chain = (token_chain *) obstack_alloc (current_input, sizeof *chain);
-  if (next->u.u_c.end)
-    next->u.u_c.end->next = chain;
-  else
-    next->u.u_c.chain = chain;
-  next->u.u_c.end = chain;
-  chain->next = NULL;
-  chain->type = CHAIN_STR;
-  chain->quote_age = TOKEN_DATA_QUOTE_AGE (token);
-  chain->u.u_s.str = TOKEN_DATA_TEXT (token);
-  chain->u.u_s.len = TOKEN_DATA_LEN (token);
-  chain->u.u_s.level = level;
-  if (level >= 0)
+  if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
     {
+      chain = (token_chain *) obstack_alloc (current_input, sizeof *chain);
+      if (next->u.u_c.end)
+       next->u.u_c.end->next = chain;
+      else
+       next->u.u_c.chain = chain;
+      next->u.u_c.end = chain;
+      chain->next = NULL;
+      chain->type = CHAIN_STR;
+      chain->quote_age = TOKEN_DATA_QUOTE_AGE (token);
+      chain->u.u_s.str = TOKEN_DATA_TEXT (token);
+      chain->u.u_s.len = TOKEN_DATA_LEN (token);
+      chain->u.u_s.level = level;
       adjust_refcount (level, true);
-      result = true;
+      inuse = true;
     }
-  return result;
+  while (src_chain)
+    {
+      if (level == -1)
+       {
+         /* Nothing to copy, since link already lives on obstack.  */
+         assert (src_chain->type != CHAIN_STR
+                 || src_chain->u.u_s.level == -1);
+         chain = src_chain;
+       }
+      else
+       {
+         /* Allow inlining the final link with subsequent text.  */
+         if (!src_chain->next && src_chain->type == CHAIN_STR
+             && (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD
+                 || (!inuse && src_chain->u.u_s.level == -1)))
+           {
+             obstack_grow (current_input, src_chain->u.u_s.str,
+                           src_chain->u.u_s.len);
+             break;
+           }
+         /* We must clone each link in the chain, since next_char
+            destructively modifies the chain it is parsing.  */
+         chain = (token_chain *) obstack_copy (current_input, src_chain,
+                                               sizeof *chain);
+         if (chain->type == CHAIN_STR && chain->u.u_s.level == -1)
+           {
+             if (chain->u.u_s.len <= INPUT_INLINE_THRESHOLD || !inuse)
+               chain->u.u_s.str = (char *) obstack_copy (current_input,
+                                                         chain->u.u_s.str,
+                                                         chain->u.u_s.len);
+             else
+               {
+                 chain->u.u_s.level = level;
+                 inuse = true;
+               }
+           }
+       }
+      if (next->u.u_c.end)
+       next->u.u_c.end->next = chain;
+      else
+       next->u.u_c.chain = chain;
+      next->u.u_c.end = chain;
+      assert (chain->type == CHAIN_STR);
+      if (chain->u.u_s.level >= 0)
+       adjust_refcount (chain->u.u_s.level, true);
+      src_chain = src_chain->next;
+    }
+  return inuse;
 }
 
 /*-------------------------------------------------------------------.
@@ -843,7 +927,20 @@ append_quote_token (struct obstack *obs, token_data *td)
 {
   token_chain *src_chain = isp->u.u_c.chain;
   token_chain *chain;
-  assert (isp->type == INPUT_CHAIN && obs && current_quote_age);
+
+  assert (isp->type == INPUT_CHAIN && obs && current_quote_age
+         && src_chain->type == CHAIN_STR && src_chain->u.u_s.level >= 0);
+  isp->u.u_c.chain = src_chain->next;
+
+  /* Speed consideration - for short enough tokens, the speed and
+     memory overhead of parsing another INPUT_CHAIN link outweighs the
+     time to inline the token text.  */
+  if (src_chain->u.u_s.len <= INPUT_INLINE_THRESHOLD)
+    {
+      obstack_grow (obs, src_chain->u.u_s.str, src_chain->u.u_s.len);
+      adjust_refcount (src_chain->u.u_s.level, false);
+      return;
+    }
 
   if (TOKEN_DATA_TYPE (td) == TOKEN_VOID)
     {
@@ -858,8 +955,7 @@ append_quote_token (struct obstack *obs, token_data *td)
   else
     td->u.u_c.chain = chain;
   td->u.u_c.end = chain;
-  td->u.u_c.end->next = NULL;
-  isp->u.u_c.chain = src_chain->next;
+  chain->next = NULL;
 }
 
 /*------------------------------------------------------------------.
diff --git a/src/m4.h b/src/m4.h
index a541687..ca886aa 100644
--- a/src/m4.h
+++ b/src/m4.h
@@ -378,7 +378,7 @@ void make_text_link (struct obstack *, token_chain **, token_chain **);
 void push_file (FILE *, const char *, bool);
 void push_macro (builtin_func *);
 struct obstack *push_string_init (void);
-bool push_token (token_data *, int);
+bool push_token (token_data *, int, bool);
 const input_block *push_string_finish (void);
 void push_wrapup (const char *);
 bool pop_wrapup (void);
diff --git a/src/macro.c b/src/macro.c
index d22226e..6ec09b0 100644
--- a/src/macro.c
+++ b/src/macro.c
@@ -808,7 +808,8 @@ arg_type (macro_arguments *argv, unsigned int index)
     return TOKEN_TEXT;
   token = arg_token (argv, index);
   type = TOKEN_DATA_TYPE (token);
-  /* Composite tokens are currently sequences of text only.  */
+  /* When accessed via the arg_* interface, composite tokens are
+     currently sequences of text only.  */
   if (type == TOKEN_COMP)
     type = TOKEN_TEXT;
   return type;
@@ -1104,23 +1105,8 @@ push_arg (struct obstack *obs, macro_arguments *argv, unsigned int index)
     return;
   token = arg_token (argv, index);
   /* TODO handle func tokens?  */
-  if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
-    {
-      if (push_token (token, expansion_level - 1))
-       arg_mark (argv);
-    }
-  else if (TOKEN_DATA_TYPE (token) == TOKEN_COMP)
-    {
-      /* TODO - concatenate multiple arguments?  For now, we assume
-        all elements are text.  */
-      token_chain *chain = token->u.u_c.chain;
-      while (chain)
-       {
-         assert (chain->type == CHAIN_STR);
-         obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
-         chain = chain->next;
-       }
-    }
+  if (push_token (token, expansion_level - 1, argv->inuse))
+    arg_mark (argv);
 }
 
 /* Push series of comma-separated arguments from ARGV, which should
@@ -1131,7 +1117,6 @@ void
 push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
 {
   token_data *token;
-  token_chain *chain;
   unsigned int i = skip ? 2 : 1;
   const char *sep = ",";
   size_t sep_len = 1;
@@ -1171,20 +1156,7 @@ push_args (struct obstack *obs, macro_arguments *argv, bool skip, bool quote)
       else
        use_sep = true;
       /* TODO handle func tokens?  */
-      if (TOKEN_DATA_TYPE (token) == TOKEN_TEXT)
-       inuse |= push_token (token, expansion_level - 1);
-      else
-       {
-         /* TODO - handle composite text in push_token.  */
-         assert (TOKEN_DATA_TYPE (token) == TOKEN_COMP);
-         chain = token->u.u_c.chain;
-         while (chain)
-           {
-             assert (chain->type == CHAIN_STR);
-             obstack_grow (obs, chain->u.u_s.str, chain->u.u_s.len);
-             chain = chain->next;
-           }
-       }
+      inuse |= push_token (token, expansion_level - 1, inuse);
     }
   if (quote)
     obstack_grow (obs, curr_quote.str2, curr_quote.len2);
-- 
1.5.3.8

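The `append_quote_token` change in the src/ patch applies the same speed trade-off on the reading side.  When a quoted string runs into a short back-reference link, its bytes are copied into the token being built and the reference is released immediately.  Longer links stay as separate chain links.  The following is a minimal sketch of that step, with a fixed-size buffer standing in for the obstack.  `struct ref`, `consume_ref`, `refcount[]` and the threshold value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INLINE_THRESHOLD 4   /* assumed value for illustration */
#define MAX_LEVELS 8         /* assumed bound on expansion levels */

/* A back-reference link met while reading a quoted string.  */
struct ref
{
  const char *str;
  size_t len;
  size_t level;              /* expansion level the text lives in */
};

/* Outstanding back-references per expansion level.  */
static size_t refcount[MAX_LEVELS];

/* Try to inline R into BUF, which holds *BUF_LEN of BUF_CAP bytes.  On
   success the text is copied, the reference is released, and true is
   returned.  Long links return false and keep their reference; the
   caller would append them as a separate chain link instead.  */
static bool
consume_ref (const struct ref *r, char *buf, size_t *buf_len,
             size_t buf_cap)
{
  assert (r->level < MAX_LEVELS && refcount[r->level] > 0);
  if (r->len > INLINE_THRESHOLD)
    return false;
  assert (*buf_len + r->len <= buf_cap);
  memcpy (buf + *buf_len, r->str, r->len);
  *buf_len += r->len;
  refcount[r->level]--;
  return true;
}
```

For example, a 3-byte link is appended to the buffer and drops its level's count by one.  A 9-byte link returns false and leaves both the buffer and the count unchanged.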
