argv_ref patch 29: huge speedup to m4 input engine

From:	Eric Blake
Subject:	argv_ref patch 29: huge speedup to m4 input engine
Date:	Tue, 17 Feb 2009 06:27:43 -0700
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.19) Gecko/20081209 Thunderbird/2.0.0.19 Mnenhy/0.7.6.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nearly a year after Bruno first profiled the m4 input engine and
suggested ways to make it more efficient by parsing a block rather than
a character at a time, I have finally merged the patch into the master
branch.  The idea is that repeatedly calling next_char() to grab one
character at a time carries a lot of overhead; grabbing a lookahead
buffer and scanning it with string operations that can examine a word
at a time reduces that overhead.  The patch relies on the gnulib
freadptr/freadseek extensions to grab lookahead buffers of files, and
on the gnulib memchr2 extension to find the first of two bytes that
could potentially start a quote delimiter.
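
To illustrate the difference, here is a minimal standalone sketch (not
code from the patch).  It compares discarding a line one byte at a time
with discarding it via a single memchr() over the available buffer.
The struct demo_input type and the demo_* functions are hypothetical
stand-ins for an input block; only memchr() is a real library call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an input source: unread bytes plus a count.  */
struct demo_input
{
  const char *data;
  size_t len;
};

/* Byte-at-a-time reader: one function call per character.
   Returns -1 when the input is exhausted.  */
static int
demo_next_char (struct demo_input *in)
{
  if (!in->len)
    return -1;
  in->len--;
  return (unsigned char) *in->data++;
}

/* Discard through the first newline by calling demo_next_char for
   every byte.  Returns the number of bytes discarded.  */
static size_t
demo_skip_line_bytes (struct demo_input *in)
{
  size_t n = 0;
  int ch;
  while ((ch = demo_next_char (in)) != -1)
    {
      n++;
      if (ch == '\n')
        break;
    }
  return n;
}

/* Discard through the first newline with one buffer scan, then
   consume the whole span at once.  Returns the bytes discarded.  */
static size_t
demo_skip_line_buffer (struct demo_input *in)
{
  const char *p = memchr (in->data, '\n', in->len);
  size_t n = p ? (size_t) (p - in->data) + 1 : in->len;
  assert (n <= in->len);
  in->data += n;
  in->len -= n;
  return n;
}
```

Both functions leave the input in the same state; the buffer version
simply replaces a call per byte with one scan and one consume step.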

On the master branch, I broke it into two patches.  The first updates the
machinery to support lookahead buffers, and gives about a 4% speedup.
The second converts quoted strings to use the lookahead buffer,
providing more than a 17% speedup.  The speedup on branch-1.6 is not as
impressive, since there were fewer indirect function calls to avoid,
but it still exceeds 10%.
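
As a rough sketch of the quoted-string case (again, not code from the
patch), the scan below jumps from one quote delimiter to the next and
tracks the nesting level as it goes.  demo_memchr2 is a plain-loop
stand-in written for this example, taking the same kind of arguments
as the gnulib memchr2 module (two candidate bytes and a length);
demo_find_close_quote is a hypothetical helper, and single-byte,
distinct delimiters are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for a two-byte search: return a pointer to the first
   occurrence of C1 or C2 in the N bytes at S, or NULL if neither
   occurs.  */
static const char *
demo_memchr2 (const char *s, char c1, char c2, size_t n)
{
  for (; n; s++, n--)
    if (*s == c1 || *s == c2)
      return s;
  return NULL;
}

/* BUF holds the LEN bytes that follow an opening quote, so nesting
   starts at level 1.  Return the offset of the close quote that
   brings the level back to 0, or LEN if the buffer ends first.  */
static size_t
demo_find_close_quote (const char *buf, size_t len, char lquote, char rquote)
{
  size_t level = 1;
  const char *p = buf;
  const char *end = buf + len;
  assert (lquote != rquote);
  while ((p = demo_memchr2 (p, lquote, rquote, (size_t) (end - p))))
    {
      if (*p == rquote)
        {
          if (--level == 0)
            return (size_t) (p - buf);
        }
      else
        level++;
      p++;      /* Resume the search just past this delimiter.  */
    }
  return len;
}
```

Bytes between delimiters are never examined individually by the
caller, which is where the savings over a per-character loop come from.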

        Stage 29: Process input by buffer, not bytes.
        Enhance input engine to provide lookahead buffer, rather than
        forcing clients to call next_char for every byte.  Utilize this
        new interface in all clients.
        Memory impact: none.
        Speed impact: noticeable improvement, from fewer function calls.
        * m4/gnulib-cache.m4: Import freadptr, freadseek, and memchr2
        modules.
        * src/input.c (next_buffer, consume_buffer): New functions.
        (skip_line, match_input, next_token): Use them to scan a buffer at
        a time.
        * NEWS: Document this.
        Suggested by Bruno Haible:
        http://lists.gnu.org/archive/html/m4-discuss/2008-02/msg00010.html
        http://lists.gnu.org/archive/html/m4-discuss/2008-02/msg00012.html

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmau08ACgkQ84KuGfSFAYAggwCdG7mPBO3FdFQ5cTf8pgqoq/OR
YMsAnRq0wu8kqQ8WSKvBjauNmxQwYUF3
=usmo
-----END PGP SIGNATURE-----

>From 0e14ae3e78f06cefeabb61ca23ddbdf00afc2a00 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 13 Feb 2009 07:10:36 -0700
Subject: [PATCH 1/2] Stage 29a: Process dnl and macro names by buffer, not bytes.

* ltdl/m4/gnulib-cache.m4: Import freadptr and freadseek modules.
* m4/input.c (struct input_funcs): Add virtual functions
buffer_func and consume_func.
(file_buffer, file_consume, string_buffer, string_consume)
(composite_buffer, composite_consume, eof_buffer): Implement
them.
(file_funcs, string_funcs, composite_funcs, eof_funcs): Update
vtables accordingly.
(buffer_retry): New sentinel.
(next_buffer, consume_buffer): New functions.
(m4_skip_line, match_input, consume_syntax): Use them for faster
parsing.
Suggested by Bruno Haible.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog               |   20 +++
 ltdl/m4/gnulib-cache.m4 |    4 +-
 m4/input.c              |  317 +++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 304 insertions(+), 37 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 726fdc8..d61afa1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,25 @@
 2009-02-16  Eric Blake  <address@hidden>

+       Stage 29a: Process dnl and macro names by buffer, not bytes.
+       Enhance input engine to provide lookahead buffer, rather than
+       forcing clients to call next_char for every byte.  Utilize this
+       for the simplest clients.
+       Memory impact: none.
+       Speed impact: noticeable improvement, from fewer function calls.
+       * ltdl/m4/gnulib-cache.m4: Import freadptr and freadseek modules.
+       * m4/input.c (struct input_funcs): Add virtual functions
+       buffer_func and consume_func.
+       (file_buffer, file_consume, string_buffer, string_consume)
+       (composite_buffer, composite_consume, eof_buffer): Implement
+       them.
+       (file_funcs, string_funcs, composite_funcs, eof_funcs): Update
+       vtables accordingly.
+       (buffer_retry): New sentinel.
+       (next_buffer, consume_buffer): New functions.
+       (m4_skip_line, match_input, consume_syntax): Use them for faster
+       parsing.
+       Suggested by Bruno Haible.
+
        Unify single and multi-character delimiter handling.
        * m4/input.c (MATCH): Add a parameter.
        (m4__next_token): Simplify logic and reduce redundancy.
diff --git a/ltdl/m4/gnulib-cache.m4 b/ltdl/m4/gnulib-cache.m4
index fbd030e..1cda6d4 100644
--- a/ltdl/m4/gnulib-cache.m4
+++ b/ltdl/m4/gnulib-cache.m4
@@ -15,7 +15,7 @@


 # Specification in the form of a command-line invocation:
-#   gnulib-tool --import --dir=. --local-dir=local --lib=libgnu --source-base=gnu --m4-base=ltdl/m4 --doc-base=doc --tests-base=tests/gnu --aux-dir=build-aux --with-tests --libtool --macro-prefix=M4 assert autobuild avltree-oset binary-io clean-temp cloexec close-stream closein config-h configmake dirname error exit fdl-1.3 fflush filenamecat flexmember fopen fopen-safer fseeko gendocs gettext git-version-gen gnumakefile gnupload gpl-3.0 intprops memmem mkstemp obstack obstack-printf-posix progname propername quote regex regexprops-generic sprintf-posix stdbool stdlib-safer strnlen strtod strtol tempname unlocked-io vasnprintf-posix verify verror xalloc xalloc-die xmemdup0 xprintf-posix xstrndup xvasprintf-posix
+#   gnulib-tool --import --dir=. --local-dir=local --lib=libgnu --source-base=gnu --m4-base=ltdl/m4 --doc-base=doc --tests-base=tests/gnu --aux-dir=build-aux --with-tests --libtool --macro-prefix=M4 assert autobuild avltree-oset binary-io clean-temp cloexec close-stream closein config-h configmake dirname error exit fdl-1.3 fflush filenamecat flexmember fopen fopen-safer freadptr freadseek fseeko gendocs gettext git-version-gen gnumakefile gnupload gpl-3.0 intprops memmem mkstemp obstack obstack-printf-posix progname propername quote regex regexprops-generic sprintf-posix stdbool stdlib-safer strnlen strtod strtol tempname unlocked-io vasnprintf-posix verify verror xalloc xalloc-die xmemdup0 xprintf-posix xstrndup xvasprintf-posix

 # Specification in the form of a few gnulib-tool.m4 macro invocations:
 gl_LOCAL_DIR([local])
@@ -39,6 +39,8 @@ gl_MODULES([
   flexmember
   fopen
   fopen-safer
+  freadptr
+  freadseek
   fseeko
   gendocs
   gettext
diff --git a/m4/input.c b/m4/input.c
index dd3addc..36a1481 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -24,6 +24,9 @@

 #include "m4private.h"

+#include "freadptr.h"
+#include "freadseek.h"
+
 /* Define this to see runtime debug info.  Implied by DEBUG.  */
 /*#define DEBUG_INPUT */

@@ -43,9 +46,11 @@

    Each input_block has an associated struct input_funcs, which is a
    vtable that defines polymorphic functions for peeking, reading,
-   unget, cleanup, and printing in trace output.  All input is done
-   through the function pointers of the input_funcs on the given
-   input_block, and all characters are unsigned, to distinguish
+   unget, cleanup, and printing in trace output.  Getting a single
+   character at a time is inefficient, so there are also functions for
+   accessing the readahead buffer and consuming bulk input.  All input
+   is done through the function pointers of the input_funcs on the
+   given input_block, and all characters are unsigned, to distinguish
    between stdio EOF and between special sentinel characters.  When a
    input_block is exhausted, its reader returns CHAR_RETRY which
    causes the input_block to be popped from the input_stack.
@@ -94,30 +99,41 @@

 typedef struct m4_input_block m4_input_block;

-static int     file_peek               (m4_input_block *, m4 *, bool);
-static int     file_read               (m4_input_block *, m4 *, bool, bool,
+static int             file_peek       (m4_input_block *, m4 *, bool);
+static int             file_read       (m4_input_block *, m4 *, bool, bool,
                                         bool);
-static void    file_unget              (m4_input_block *, int);
-static bool    file_clean              (m4_input_block *, m4 *, bool);
-static void    file_print              (m4_input_block *, m4 *, m4_obstack *,
+static void            file_unget      (m4_input_block *, int);
+static bool            file_clean      (m4_input_block *, m4 *, bool);
+static void            file_print      (m4_input_block *, m4 *, m4_obstack *,
                                         int);
-static int     string_peek             (m4_input_block *, m4 *, bool);
-static int     string_read             (m4_input_block *, m4 *, bool, bool,
+static const char *    file_buffer     (m4_input_block *, m4 *, size_t *,
+                                        bool);
+static void            file_consume    (m4_input_block *, m4 *, size_t);
+static int             string_peek     (m4_input_block *, m4 *, bool);
+static int             string_read     (m4_input_block *, m4 *, bool, bool,
                                         bool);
-static void    string_unget            (m4_input_block *, int);
-static void    string_print            (m4_input_block *, m4 *, m4_obstack *,
+static void            string_unget    (m4_input_block *, int);
+static void            string_print    (m4_input_block *, m4 *, m4_obstack *,
                                         int);
-static int     composite_peek          (m4_input_block *, m4 *, bool);
-static int     composite_read          (m4_input_block *, m4 *, bool, bool,
+static const char *    string_buffer   (m4_input_block *, m4 *, size_t *,
                                         bool);
-static void    composite_unget         (m4_input_block *, int);
-static bool    composite_clean         (m4_input_block *, m4 *, bool);
-static void    composite_print         (m4_input_block *, m4 *, m4_obstack *,
+static void            string_consume  (m4_input_block *, m4 *, size_t);
+static int             composite_peek  (m4_input_block *, m4 *, bool);
+static int             composite_read  (m4_input_block *, m4 *, bool, bool,
+                                        bool);
+static void            composite_unget (m4_input_block *, int);
+static bool            composite_clean (m4_input_block *, m4 *, bool);
+static void            composite_print (m4_input_block *, m4 *, m4_obstack *,
                                         int);
-static int     eof_peek                (m4_input_block *, m4 *, bool);
-static int     eof_read                (m4_input_block *, m4 *, bool, bool,
+static const char *    composite_buffer (m4_input_block *, m4 *, size_t *,
+                                         bool);
+static void            composite_consume (m4_input_block *, m4 *, size_t);
+static int             eof_peek        (m4_input_block *, m4 *, bool);
+static int             eof_read        (m4_input_block *, m4 *, bool, bool,
+                                        bool);
+static void            eof_unget       (m4_input_block *, int);
+static const char *    eof_buffer      (m4_input_block *, m4 *, size_t *,
                                         bool);
-static void    eof_unget               (m4_input_block *, int);

 static void    init_builtin_token      (m4 *, m4_obstack *,
                                         m4_symbol_value *);
@@ -128,6 +144,8 @@ static      int     next_char               (m4 *, bool, bool, bool);
 static int     peek_char               (m4 *, bool);
 static bool    pop_input               (m4 *, bool);
 static void    unget_input             (int);
+static const char * next_buffer        (m4 *, size_t *, bool);
+static void    consume_buffer          (m4 *, size_t);
 static bool    consume_syntax          (m4 *, m4_obstack *, unsigned int);

 #ifdef DEBUG_INPUT
@@ -165,6 +183,20 @@ struct input_funcs
   /* Add a representation of the input block to the obstack, for use
      in trace expansion output.  */
   void (*print_func)   (m4_input_block *, m4 *, m4_obstack *, int);
+
+  /* Return a pointer to the current readahead buffer, and set LEN to
+     the length of the result.  If ALLOW_QUOTE, do not return a buffer
+     for a quoted string.  If there is data, but the result of
+     next_char() would not fit in a char (for example, CHAR_EOF or
+     CHAR_QUOTE) or there is no readahead data available, return NULL,
+     and the caller must use next_char().  If there is no more data,
+     return buffer_retry.  The buffer is only valid until the next
+     consume_buffer() or next_char().  */
+  const char *(*buffer_func) (m4_input_block *, m4 *, size_t *, bool);
+
+  /* Optional function to consume data from a readahead buffer
+     previously obtained through buffer_func.  */
+  void (*consume_func) (m4_input_block *, m4 *, size_t);
 };

 /* A block of input to be scanned.  */
@@ -235,28 +267,33 @@ static bool input_change;

 /* Vtable for handling input from files.  */
 static struct input_funcs file_funcs = {
-  file_peek, file_read, file_unget, file_clean, file_print
+  file_peek, file_read, file_unget, file_clean, file_print, file_buffer,
+  file_consume
 };

 /* Vtable for handling input from strings.  */
 static struct input_funcs string_funcs = {
-  string_peek, string_read, string_unget, NULL, string_print
+  string_peek, string_read, string_unget, NULL, string_print, string_buffer,
+  string_consume
 };

 /* Vtable for handling input from composite chains.  */
 static struct input_funcs composite_funcs = {
   composite_peek, composite_read, composite_unget, composite_clean,
-  composite_print
+  composite_print, composite_buffer, composite_consume
 };

 /* Vtable for recognizing end of input.  */
 static struct input_funcs eof_funcs = {
-  eof_peek, eof_read, eof_unget, NULL, NULL
+  eof_peek, eof_read, eof_unget, NULL, NULL, eof_buffer, NULL
 };

 /* Marker at end of an input stack.  */
 static m4_input_block input_eof = { NULL, &eof_funcs, "", 0 };

+/* Marker for buffer_func when current block has no more data.  */
+static const char buffer_retry[1];
+
 
 /* Input files, from command line or [s]include.  */
 static int
@@ -354,6 +391,42 @@ file_print (m4_input_block *me, m4 *context M4_GNUC_UNUSED, m4_obstack *obs,
   obstack_1grow (obs, '>');
 }

+static const char *
+file_buffer (m4_input_block *me, m4 *context M4_GNUC_UNUSED, size_t *len,
+            bool allow_quote M4_GNUC_UNUSED)
+{
+  if (start_of_input_line)
+    {
+      start_of_input_line = false;
+      m4_set_current_line (context, ++me->line);
+    }
+  if (me->u.u_f.end)
+    return buffer_retry;
+  return freadptr (isp->u.u_f.fp, len);
+}
+
+static void
+file_consume (m4_input_block *me, m4 *context, size_t len)
+{
+  const char *buf;
+  const char *p;
+  size_t buf_len;
+  assert (!start_of_input_line);
+  buf = freadptr (me->u.u_f.fp, &buf_len);
+  assert (buf && len <= buf_len);
+  buf_len = 0;
+  while ((p = memchr (buf + buf_len, '\n', len - buf_len)))
+    {
+      if (p == buf + len - 1)
+       start_of_input_line = true;
+      else
+       m4_set_current_line (context, ++me->line);
+      buf_len = p - buf + 1;
+    }
+  if (freadseek (isp->u.u_f.fp, len) != 0)
+    assert (false);
+}
+
 /* m4_push_file () pushes an input file FP with name TITLE on the
   input stack, saving the current file name and line number.  If next
   is non-NULL, this push invalidates a call to m4_push_string_init (),
@@ -439,6 +512,24 @@ string_print (m4_input_block *me, m4 *context, m4_obstack *obs,
                           &arg_length);
 }

+static const char *
+string_buffer (m4_input_block *me, m4 *context M4_GNUC_UNUSED, size_t *len,
+              bool allow_quote M4_GNUC_UNUSED)
+{
+  if (!me->u.u_s.len)
+    return buffer_retry;
+  *len = me->u.u_s.len;
+  return me->u.u_s.str;
+}
+
+static void
+string_consume (m4_input_block *me, m4 *context M4_GNUC_UNUSED, size_t len)
+{
+  assert (len <= me->u.u_s.len);
+  me->u.u_s.len -= len;
+  me->u.u_s.str += len;
+}
+
 /* First half of m4_push_string ().  The pointer next points to the
    new input_block.  FILE and LINE describe the location where the
    macro starts that is generating the expansion (even if the location
@@ -904,6 +995,63 @@ composite_print (m4_input_block *me, m4 *context, m4_obstack *obs,
     m4_shipout_string (context, obs, quotes->str2, quotes->len2, false);
 }

+static const char *
+composite_buffer (m4_input_block *me, m4 *context, size_t *len,
+                 bool allow_quote)
+{
+  m4__symbol_chain *chain = me->u.u_c.chain;
+  while (chain)
+    {
+      if (allow_quote && chain->quote_age == m4__quote_age (M4SYNTAX))
+       return NULL; /* CHAR_QUOTE doesn't fit in buffer.  */
+      switch (chain->type)
+       {
+       case M4__CHAIN_STR:
+         if (chain->u.u_s.len)
+           {
+             *len = chain->u.u_s.len;
+             return chain->u.u_s.str;
+           }
+         if (chain->u.u_s.level < SIZE_MAX)
+           m4__adjust_refcount (context, chain->u.u_s.level, false);
+         break;
+       case M4__CHAIN_FUNC:
+         if (chain->u.builtin)
+           return NULL; /* CHAR_BUILTIN doesn't fit in buffer.  */
+         break;
+       case M4__CHAIN_ARGV:
+         if (chain->u.u_a.index == m4_arg_argc (chain->u.u_a.argv))
+           {
+             m4__arg_adjust_refcount (context, chain->u.u_a.argv, false);
+             break;
+           }
+         return NULL; /* No buffer to provide.  */
+       case M4__CHAIN_LOC:
+         me->file = chain->u.u_l.file;
+         me->line = chain->u.u_l.line;
+         input_change = true;
+         me->u.u_c.chain = chain->next;
+         return next_buffer (context, len, allow_quote);
+       default:
+         assert (!"composite_buffer");
+         abort ();
+       }
+      me->u.u_c.chain = chain = chain->next;
+    }
+  return buffer_retry;
+}
+
+static void
+composite_consume (m4_input_block *me, m4 *context M4_GNUC_UNUSED, size_t len)
+{
+  m4__symbol_chain *chain = me->u.u_c.chain;
+  assert (chain && chain->type == M4__CHAIN_STR && len <= chain->u.u_s.len);
+  /* Partial consumption invalidates quote age.  */
+  chain->quote_age = 0;
+  chain->u.u_s.len -= len;
+  chain->u.u_s.str += len;
+}
+
 /* Given an obstack OBS, capture any unfinished text as a link in the
    chain that starts at *START and ends at *END.  START may be NULL if
    *END is non-NULL.  */
@@ -1001,6 +1149,13 @@ eof_unget (m4_input_block *me M4_GNUC_UNUSED, int ch)
   assert (ch == CHAR_EOF);
 }

+static const char *
+eof_buffer (m4_input_block *me M4_GNUC_UNUSED, m4 *context M4_GNUC_UNUSED,
+           size_t *len M4_GNUC_UNUSED, bool allow_unget M4_GNUC_UNUSED)
+{
+  return NULL;
+}
+
 
 /* When tracing, print a summary of the contents of the input block
    created by push_string_init/push_string_finish to OBS.  Use
@@ -1340,6 +1495,50 @@ unget_input (int ch)
   isp->funcs->unget_func (isp, ch);
 }

+/* Return a pointer to the available bytes of the current input block,
+   and set *LEN to the length of the result.  If ALLOW_QUOTE, do not
+   return a buffer for a quoted string.  If the result does not fit in
+   a char (for example, CHAR_EOF or CHAR_QUOTE), or if there is no
+   readahead data available, return NULL, and the caller must fall
+   back to next_char().  The buffer is only valid until the next
+   consume_buffer() or next_char().  */
+static const char *
+next_buffer (m4 *context, size_t *len, bool allow_quote)
+{
+  const char *buf;
+  while (1)
+    {
+      assert (isp);
+      if (input_change)
+       {
+         m4_set_current_file (context, isp->file);
+         m4_set_current_line (context, isp->line);
+         input_change = false;
+       }
+
+      assert (isp->funcs->buffer_func);
+      buf = isp->funcs->buffer_func (isp, context, len, allow_quote);
+      if (buf != buffer_retry)
+       return buf;
+      /* End of input source --- pop one level.  */
+      pop_input (context, true);
+    }
+}
+
+/* Consume LEN bytes from the current input block, as though by LEN
+   calls to next_char().  LEN must be less than or equal to the
+   previous length returned by a successful call to next_buffer().  */
+static void
+consume_buffer (m4 *context, size_t len)
+{
+  assert (isp && !input_change);
+  if (len)
+    {
+      assert (isp->funcs->consume_func);
+      isp->funcs->consume_func (isp, context, len);
+    }
+}
+
 /* skip_line () simply discards all immediately following characters,
    up to the first newline.  It is only used from m4_dnl ().  Report
    errors on behalf of CALLER.  */
@@ -1348,9 +1547,28 @@ m4_skip_line (m4 *context, const m4_call_info *caller)
 {
   int ch;

-  while ((ch = next_char (context, false, false, false)) != CHAR_EOF
-        && ch != '\n')
-    ;
+  while (1)
+    {
+      size_t len;
+      const char *buffer = next_buffer (context, &len, false);
+      if (buffer)
+       {
+         const char *p = (char *) memchr (buffer, '\n', len);
+         if (p)
+           {
+             consume_buffer (context, p - buffer + 1);
+             ch = '\n';
+             break;
+           }
+         consume_buffer (context, len);
+       }
+      else
+       {
+         ch = next_char (context, false, false, false);
+         if (ch == CHAR_EOF || ch == '\n')
+           break;
+       }
+    }
   if (ch == CHAR_EOF)
     m4_warn (context, 0, caller, _("end of file treated as newline"));
 }
@@ -1377,16 +1595,26 @@ match_input (m4 *context, const char *s, size_t len, bool consume)
   const char *t;
   m4_obstack *st;
   bool result = false;
+  size_t buf_len;

   if (consume)
     {
       s++;
       len--;
     }
+  /* Try a buffer match first.  */
   assert (len);
+  t = next_buffer (context, &buf_len, false);
+  if (t && len <= buf_len && memcmp (s, t, len) == 0)
+    {
+      if (consume)
+       consume_buffer (context, len);
+      return true;
+    }
+  /* Fall back on byte matching.  */
   ch = peek_char (context, false);
   if (ch != to_uchar (*s))
-    return false;                      /* fail */
+    return false;

   if (len == 1)
     {
@@ -1445,20 +1673,37 @@ consume_syntax (m4 *context, m4_obstack *obs, unsigned int syntax)
   assert (syntax);
   while (1)
     {
-      /* It is safe to call next_char without first checking
-        peek_char, except at input source boundaries, which we detect
-        by CHAR_RETRY.  We exploit the fact that CHAR_EOF,
-        CHAR_BUILTIN, CHAR_QUOTE, and CHAR_ARGV do not satisfy any
-        syntax categories.  */
-      while ((ch = next_char (context, allow, allow, true)) != CHAR_RETRY
-            && m4_has_syntax (M4SYNTAX, ch, syntax))
+      /* Start with a buffer search.  */
+      size_t len;
+      const char *buffer = next_buffer (context, &len, allow);
+      if (buffer)
+       {
+         const char *p = buffer;
+         while (len && m4_has_syntax (M4SYNTAX, *p, syntax))
+           {
+             len--;
+             p++;
+           }
+         obstack_grow (obs, buffer, p - buffer);
+         consume_buffer (context, p - buffer);
+         if (len)
+           return false;
+       }
+      /* Fall back to byte-wise search.  It is safe to call next_char
+        without first checking peek_char, except at input source
+        boundaries, which we detect by CHAR_RETRY.  */
+      ch = next_char (context, allow, allow, true);
+      if (ch < CHAR_EOF && m4_has_syntax (M4SYNTAX, ch, syntax))
        {
-         assert (ch < CHAR_EOF);
          obstack_1grow (obs, ch);
+         continue;
        }
       if (ch == CHAR_RETRY || ch == CHAR_QUOTE || ch == CHAR_ARGV)
        {
          ch = peek_char (context, false);
+         /* We exploit the fact that CHAR_EOF, CHAR_BUILTIN,
+            CHAR_QUOTE, and CHAR_ARGV do not satisfy any syntax
+            categories.  */
          if (m4_has_syntax (M4SYNTAX, ch, syntax))
            {
              assert (ch < CHAR_EOF);
-- 
1.6.1.2


>From 047d480cdc9ff71e4e3228017ca24a83737cbf1f Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Mon, 16 Feb 2009 08:52:48 -0700
Subject: [PATCH 2/2] Stage 29b: Process quotes and comments by buffer, not bytes.

* ltdl/m4/gnulib-cache.m4: Import memchr2 module.
* m4/input.c (m4__next_token): Add buffer reads to quote and
comment parsing.
* NEWS: Document this.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog               |   11 +++++
 NEWS                    |   13 ++++--
 ltdl/m4/gnulib-cache.m4 |    3 +-
 m4/input.c              |  101 +++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d61afa1..ad5e8a4 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2009-02-17  Eric Blake  <address@hidden>
+
+       Stage 29b: Process quotes and comments by buffer, not bytes.
+       Search for quote and comment delimiters by buffer when possible.
+       Memory impact: none.
+       Speed impact: noticeable improvement, from fewer function calls.
+       * ltdl/m4/gnulib-cache.m4: Import memchr2 module.
+       * m4/input.c (m4__next_token): Add buffer reads to quote and
+       comment parsing.
+       * NEWS: Document this.
+
 2009-02-16  Eric Blake  <address@hidden>

        Stage 29a: Process dnl and macro names by buffer, not bytes.
diff --git a/NEWS b/NEWS
index 772216d..1f25484 100644
--- a/NEWS
+++ b/NEWS
@@ -42,11 +42,6 @@ promoted to 2.0.
 *** The `-L'/`--nesting-limit' command-line option now performs argument
     validation and accepts an optional multiplier suffix.

-*** The `-o'/`--error-output' command-line options, which were replaced by
-    `--debugfile' in M4 1.4.7, now issue a deprecation warning.  This
-    warning interferes with all versions of Autoconf prior to 2.61, so plan
-    on installing an updated Autoconf when installing M4 2.0.
-
 *** New `-p'/`--pushdef' and `--popdef' command-line options allow more
     control over macro definitions from the command line between input
     files.
@@ -217,6 +212,14 @@ promoted to 2.0.
 ** Remove the undocumented command-line option '-N', as no one complained
    about the assertion failure regression that it introduced in 1.4.7.

+** The `-o'/`--error-output' command-line options, which were replaced by
+   `--debugfile' in 1.4.7, now issue a deprecation warning.  This warning
+   harmlessly triggers with versions of Autoconf 2.60 and earlier, but can
+   be silenced by applying this patch:
+     http://git.sv.gnu.org/gitweb/?p=autoconf.git;a=commitdiff;h=714eeee87
+
+** Improve the speed of the input engine.
+
 ** Fix the `m4wrap' builtin to accumulate wrapped text in FIFO order, as
    required by POSIX.  The manual mentions a way to restore the LIFO order
    present in earlier GNU M4 versions.  NOTE: this change exposes a bug
diff --git a/ltdl/m4/gnulib-cache.m4 b/ltdl/m4/gnulib-cache.m4
index 1cda6d4..f8436dc 100644
--- a/ltdl/m4/gnulib-cache.m4
+++ b/ltdl/m4/gnulib-cache.m4
@@ -15,7 +15,7 @@


 # Specification in the form of a command-line invocation:
-#   gnulib-tool --import --dir=. --local-dir=local --lib=libgnu --source-base=gnu --m4-base=ltdl/m4 --doc-base=doc --tests-base=tests/gnu --aux-dir=build-aux --with-tests --libtool --macro-prefix=M4 assert autobuild avltree-oset binary-io clean-temp cloexec close-stream closein config-h configmake dirname error exit fdl-1.3 fflush filenamecat flexmember fopen fopen-safer freadptr freadseek fseeko gendocs gettext git-version-gen gnumakefile gnupload gpl-3.0 intprops memmem mkstemp obstack obstack-printf-posix progname propername quote regex regexprops-generic sprintf-posix stdbool stdlib-safer strnlen strtod strtol tempname unlocked-io vasnprintf-posix verify verror xalloc xalloc-die xmemdup0 xprintf-posix xstrndup xvasprintf-posix
+#   gnulib-tool --import --dir=. --local-dir=local --lib=libgnu --source-base=gnu --m4-base=ltdl/m4 --doc-base=doc --tests-base=tests/gnu --aux-dir=build-aux --with-tests --libtool --macro-prefix=M4 assert autobuild avltree-oset binary-io clean-temp cloexec close-stream closein config-h configmake dirname error exit fdl-1.3 fflush filenamecat flexmember fopen fopen-safer freadptr freadseek fseeko gendocs gettext git-version-gen gnumakefile gnupload gpl-3.0 intprops memchr2 memmem mkstemp obstack obstack-printf-posix progname propername quote regex regexprops-generic sprintf-posix stdbool stdlib-safer strnlen strtod strtol tempname unlocked-io vasnprintf-posix verify verror xalloc xalloc-die xmemdup0 xprintf-posix xstrndup xvasprintf-posix

 # Specification in the form of a few gnulib-tool.m4 macro invocations:
 gl_LOCAL_DIR([local])
@@ -49,6 +49,7 @@ gl_MODULES([
   gnupload
   gpl-3.0
   intprops
+  memchr2
   memmem
   mkstemp
   obstack
diff --git a/m4/input.c b/m4/input.c
index 36a1481..0fb4101 100644
--- a/m4/input.c
+++ b/m4/input.c
@@ -26,6 +26,7 @@

 #include "freadptr.h"
 #include "freadseek.h"
+#include "memchr2.h"

 /* Define this to see runtime debug info.  Implied by DEBUG.  */
 /*#define DEBUG_INPUT */
@@ -1857,8 +1858,64 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
        type = M4_TOKEN_STRING;
        while (1)
          {
-           ch = next_char (context, obs && m4__quote_age (M4SYNTAX), false,
-                           false);
+           /* Start with buffer search for either potential delimiter.  */
+           size_t len;
+           const char *buffer = next_buffer (context, &len,
+                                             obs && m4__quote_age (M4SYNTAX));
+           if (buffer)
+             {
+               const char *p = buffer;
+               if (m4_is_syntax_single_quotes (M4SYNTAX))
+                 do
+                   {
+                     p = (char *) memchr2 (p, *context->syntax->quote.str1,
+                                           *context->syntax->quote.str2,
+                                           buffer + len - p);
+                   }
+                 while (p && m4__quote_age (M4SYNTAX)
+                        && (*p++ == *context->syntax->quote.str2
+                            ? --quote_level : ++quote_level));
+               else
+                 {
+                   size_t remaining = len;
+                   assert (context->syntax->quote.len1 == 1
+                           && context->syntax->quote.len2 == 1);
+                   while (remaining && !m4_has_syntax (M4SYNTAX, *p,
+                                                       (M4_SYNTAX_LQUOTE
+                                                        | M4_SYNTAX_RQUOTE)))
+                     {
+                       p++;
+                       remaining--;
+                     }
+                   if (!remaining)
+                     p = NULL;
+                 }
+               if (p)
+                 {
+                   if (m4__quote_age (M4SYNTAX))
+                     {
+                       assert (!quote_level
+                               && context->syntax->quote.len1 == 1
+                               && context->syntax->quote.len2 == 1);
+                       obstack_grow (obs_safe, buffer, p - buffer - 1);
+                       consume_buffer (context, p - buffer);
+                       break;
+                     }
+                   obstack_grow (obs_safe, buffer, p - buffer);
+                   ch = to_uchar (*p);
+                   consume_buffer (context, p - buffer + 1);
+                 }
+               else
+                 {
+                   obstack_grow (obs_safe, buffer, len);
+                   consume_buffer (context, len);
+                   continue;
+                 }
+             }
+           /* Fall back to byte-wise search.  */
+           else
+             ch = next_char (context, obs && m4__quote_age (M4SYNTAX), false,
+                             false);
            if (ch == CHAR_EOF)
              {
                if (!caller)
@@ -1914,7 +1971,45 @@ m4__next_token (m4 *context, m4_symbol_value *token, int *line,
          obstack_1grow (obs_safe, ch);
        while (1)
          {
-           ch = next_char (context, false, false, false);
+           /* Start with buffer search for potential end delimiter.  */
+           size_t len;
+           const char *buffer = next_buffer (context, &len, false);
+           if (buffer)
+             {
+               const char *p;
+               if (m4_is_syntax_single_comments (M4SYNTAX))
+                 p = (char *) memchr (buffer, *context->syntax->comm.str2,
+                                      len);
+               else
+                 {
+                   size_t remaining = len;
+                   assert (context->syntax->comm.len2 == 1);
+                   p = buffer;
+                   while (remaining
+                          && !m4_has_syntax (M4SYNTAX, *p, M4_SYNTAX_ECOMM))
+                     {
+                       p++;
+                       remaining--;
+                     }
+                   if (!remaining)
+                     p = NULL;
+                 }
+               if (p)
+                 {
+                   obstack_grow (obs_safe, buffer, p - buffer);
+                   ch = to_uchar (*p);
+                   consume_buffer (context, p - buffer + 1);
+                 }
+               else
+                 {
+                   obstack_grow (obs_safe, buffer, len);
+                   consume_buffer (context, len);
+                   continue;
+                 }
+             }
+           /* Fall back to byte-wise search.  */
+           else
+             ch = next_char (context, false, false, false);
            if (ch == CHAR_EOF)
              {
                if (!caller)
-- 
1.6.1.2

From eeddccf0d89edca640eeb86a879332019048ad08 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 29 Feb 2008 14:39:35 -0700
Subject: [PATCH] Stage 29: Process input by buffer, not bytes.

* m4/gnulib-cache.m4: Import freadptr, freadseek, and memchr2
modules.
* src/input.c (next_buffer, consume_buffer): New functions.
(skip_line, match_input, next_token): Use them to scan a buffer at
a time.
* NEWS: Document this.
Suggested by Bruno Haible:
http://lists.gnu.org/archive/html/m4-discuss/2008-02/msg00010.html
http://lists.gnu.org/archive/html/m4-discuss/2008-02/msg00012.html

Signed-off-by: Eric Blake <address@hidden>
(cherry picked from commit 69f894d261851504f9f8dc11f71e7da153bb0ebd)
---
 ChangeLog          |   16 +++
 NEWS               |    2 +
 m4/gnulib-cache.m4 |    5 +-
 src/input.c        |  301 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 312 insertions(+), 12 deletions(-)
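
The buffer-at-a-time idea in the ChangeLog entry can be illustrated outside
of m4.  The following is only a minimal sketch under stated assumptions, not
the code in this patch: `struct demo_input` and the `demo_*` functions are
hypothetical stand-ins for a single string-backed input block, and they
ignore files, chains, and line counting.  It shows the pattern that
skip_line adopts: ask for the unread bytes, search them with memchr, then
consume everything up to and including the newline in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one string-backed input block.  */
struct demo_input
{
  const char *str;              /* unread bytes */
  size_t len;                   /* number of unread bytes */
};

/* Return the unread bytes and set *LEN to their count, or return
   NULL when nothing is buffered.  */
const char *
demo_next_buffer (struct demo_input *in, size_t *len)
{
  if (!in->len)
    return NULL;
  *len = in->len;
  return in->str;
}

/* Consume LEN bytes, which must not exceed what the previous
   demo_next_buffer call reported.  */
void
demo_consume_buffer (struct demo_input *in, size_t len)
{
  assert (len && len <= in->len);
  in->str += len;
  in->len -= len;
}

/* Discard input through the first newline.  Return 1 if a newline
   was found, 0 if the input ran out first.  */
int
demo_skip_line (struct demo_input *in)
{
  size_t len;
  const char *buffer;

  while ((buffer = demo_next_buffer (in, &len)))
    {
      const char *p = memchr (buffer, '\n', len);
      if (p)
        {
          /* One call replaces p - buffer + 1 byte-wise reads.  */
          demo_consume_buffer (in, p - buffer + 1);
          return 1;
        }
      demo_consume_buffer (in, len);
    }
  return 0;
}
```

In the real input engine a NULL buffer does not always mean end of input, so
the callers in the patch fall back to next_char in that case instead of
stopping.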

diff --git a/ChangeLog b/ChangeLog
index 84fce2d..88a3723 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,21 @@
 2009-02-16  Eric Blake  <address@hidden>

+       Stage 29: Process input by buffer, not bytes.
+       Enhance input engine to provide lookahead buffer, rather than
+       forcing clients to call next_char for every byte.  Utilize this
+       new interface in all clients.
+       Memory impact: none.
+       Speed impact: noticeable improvement, from fewer function calls.
+       * m4/gnulib-cache.m4: Import freadptr, freadseek, and memchr2
+       modules.
+       * src/input.c (next_buffer, consume_buffer): New functions.
+       (skip_line, match_input, next_token): Use them to scan a buffer at
+       a time.
+       * NEWS: Document this.
+       Suggested by Bruno Haible:
+       http://lists.gnu.org/archive/html/m4-discuss/2008-02/msg00010.html
+       http://lists.gnu.org/archive/html/m4-discuss/2008-02/msg00012.html
+
        Avoid test failure due to different errno.
        * doc/m4.texinfo (Using frozen files): Ignore stderr, since
        hardened systems can prevent attempts to read /.
diff --git a/NEWS b/NEWS
index bfcb684..69c0bb8 100644
--- a/NEWS
+++ b/NEWS
@@ -28,6 +28,8 @@ Software Foundation, Inc.
    be silenced by applying this patch:
      http://git.sv.gnu.org/gitweb/?p=autoconf.git;a=commitdiff;h=714eeee87

+** Improve the speed of the input engine.
+
 ** Fix the `m4wrap' builtin to accumulate wrapped text in FIFO order, as
    required by POSIX.  The manual mentions a way to restore the LIFO order
    present in earlier GNU M4 versions.  NOTE: this change exposes a bug
diff --git a/m4/gnulib-cache.m4 b/m4/gnulib-cache.m4
index 49a778d..e235e5c 100644
--- a/m4/gnulib-cache.m4
+++ b/m4/gnulib-cache.m4
@@ -15,7 +15,7 @@


 # Specification in the form of a command-line invocation:
-#   gnulib-tool --import --dir=. --local-dir=local --lib=libm4 --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --with-tests --no-libtool --macro-prefix=M4 announce-gen assert autobuild avltree-oset binary-io clean-temp cloexec close-stream closein config-h dirname error fdl-1.3 fflush filenamecat flexmember fopen fopen-safer fseeko gendocs getopt git-version-gen gnumakefile gnupload gpl-3.0 hash intprops memmem mkstemp obstack obstack-printf-posix progname quote regex stdbool stdint stdlib-safer strtod strtol unlocked-io vasnprintf-posix verror version-etc version-etc-fsf xalloc xmemdup0 xprintf xvasprintf-posix
+#   gnulib-tool --import --dir=. --local-dir=local --lib=libm4 --source-base=lib --m4-base=m4 --doc-base=doc --tests-base=tests --aux-dir=build-aux --with-tests --no-libtool --macro-prefix=M4 announce-gen assert autobuild avltree-oset binary-io clean-temp cloexec close-stream closein config-h dirname error fdl-1.3 fflush filenamecat flexmember fopen fopen-safer freadptr freadseek fseeko gendocs getopt git-version-gen gnumakefile gnupload gpl-3.0 hash intprops memchr2 memmem mkstemp obstack obstack-printf-posix progname quote regex stdbool stdint stdlib-safer strtod strtol unlocked-io vasnprintf-posix verror version-etc version-etc-fsf xalloc xmemdup0 xprintf xvasprintf-posix

 # Specification in the form of a few gnulib-tool.m4 macro invocations:
 gl_LOCAL_DIR([local])
@@ -38,6 +38,8 @@ gl_MODULES([
   flexmember
   fopen
   fopen-safer
+  freadptr
+  freadseek
   fseeko
   gendocs
   getopt
@@ -47,6 +49,7 @@ gl_MODULES([
   gpl-3.0
   hash
   intprops
+  memchr2
   memmem
   mkstemp
   obstack
diff --git a/src/input.c b/src/input.c
index 822f55a..2acbd70 100644
--- a/src/input.c
+++ b/src/input.c
@@ -1,7 +1,7 @@
 /* GNU m4 -- A simple macro processor

-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006, 2007,
-   2008 Free Software Foundation, Inc.
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005, 2006,
+   2007, 2008, 2009 Free Software Foundation, Inc.

    This file is part of GNU M4.

@@ -23,6 +23,10 @@

 #include "m4.h"

+#include "freadptr.h"
+#include "freadseek.h"
+#include "memchr2.h"
+
 /* Unread input can be either files to be read (command line,
    "include", "sinclude"), strings which should be rescanned (macro
    expansion text), or quoted macro definitions (as returned by the
@@ -794,6 +798,165 @@ input_print (struct obstack *obs)
 }
 

+/*-------------------------------------------------------------------.
+| Return a pointer to the available bytes of the current input       |
+| block, and set *LEN to the length of the result.  If ALLOW_QUOTE,  |
+| do not return a buffer for a quoted string.  If the result of      |
+| next_char() would not fit in an unsigned char (for example,        |
+| CHAR_EOF or CHAR_QUOTE), or if the input block does not have an    |
+| available buffer at the moment (for example, when hitting a buffer |
+| block boundary of a file), return NULL, and the caller must fall   |
+| back on using next_char().  The buffer is only valid until the     |
+| next consume_buffer() or next_char().  When searching for a        |
+| particular byte, it is more efficient to search a buffer at a time |
+| than it is to repeatedly call next_char.                           |
+`-------------------------------------------------------------------*/
+
+static const char *
+next_buffer (size_t *len, bool allow_quote)
+{
+  token_chain *chain;
+
+  while (1)
+    {
+      assert (isp);
+      if (input_change)
+       {
+         current_file = isp->file;
+         current_line = isp->line;
+         input_change = false;
+       }
+
+      switch (isp->type)
+       {
+       case INPUT_STRING:
+         if (isp->u.u_s.len)
+           {
+             *len = isp->u.u_s.len;
+             return isp->u.u_s.str;
+           }
+         break;
+
+       case INPUT_FILE:
+         if (start_of_input_line)
+           {
+             start_of_input_line = false;
+             current_line = ++isp->line;
+           }
+         if (isp->u.u_f.end)
+           break;
+         return freadptr (isp->u.u_f.fp, len);
+
+       case INPUT_CHAIN:
+         chain = isp->u.u_c.chain;
+         while (chain)
+           {
+             if (allow_quote && chain->quote_age == current_quote_age)
+               return NULL; /* CHAR_QUOTE doesn't fit in buffer.  */
+             switch (chain->type)
+               {
+               case CHAIN_STR:
+                 if (chain->u.u_s.len)
+                   {
+                     *len = chain->u.u_s.len;
+                     return chain->u.u_s.str;
+                   }
+                 if (chain->u.u_s.level >= 0)
+                   adjust_refcount (chain->u.u_s.level, false);
+                 break;
+               case CHAIN_FUNC:
+                 if (chain->u.func)
+                   return NULL; /* CHAR_MACRO doesn't fit in buffer.  */
+                 break;
+               case CHAIN_ARGV:
+                 if (chain->u.u_a.index == arg_argc (chain->u.u_a.argv))
+                   {
+                     arg_adjust_refcount (chain->u.u_a.argv, false);
+                     break;
+                   }
+                 return NULL; /* No buffer to provide.  */
+               case CHAIN_LOC:
+                 isp->file = chain->u.u_l.file;
+                 isp->line = chain->u.u_l.line;
+                 input_change = true;
+                 isp->u.u_c.chain = chain->next;
+                 return next_buffer (len, allow_quote);
+               default:
+                 assert (!"next_buffer");
+                 abort ();
+               }
+             isp->u.u_c.chain = chain = chain->next;
+           }
+         break;
+
+       case INPUT_EOF:
+         return NULL; /* CHAR_EOF doesn't fit in buffer.  */
+
+       default:
+         assert (!"next_buffer");
+         abort ();
+       }
+
+      /* End of input source --- pop one level.  */
+      pop_input (true);
+    }
+}
+
+/*-----------------------------------------------------------------.
+| Consume LEN bytes from the current input block, as though by LEN |
+| calls to next_char().  LEN must be less than or equal to the     |
+| previous length returned by a successful call to next_buffer().  |
+`-----------------------------------------------------------------*/
+
+static void
+consume_buffer (size_t len)
+{
+  token_chain *chain;
+  const char *buf;
+  const char *p;
+  size_t buf_len;
+
+  assert (isp && !input_change && len);
+  switch (isp->type)
+    {
+    case INPUT_STRING:
+      assert (len <= isp->u.u_s.len);
+      isp->u.u_s.len -= len;
+      isp->u.u_s.str += len;
+      break;
+
+    case INPUT_FILE:
+      assert (!start_of_input_line);
+      buf = freadptr (isp->u.u_f.fp, &buf_len);
+      assert (buf && len <= buf_len);
+      buf_len = 0;
+      while ((p = memchr (buf + buf_len, '\n', len - buf_len)))
+       {
+         if (p == buf + len - 1)
+           start_of_input_line = true;
+         else
+           current_line = ++isp->line;
+         buf_len = p - buf + 1;
+       }
+      if (freadseek (isp->u.u_f.fp, len) != 0)
+       assert (false);
+      break;
+
+    case INPUT_CHAIN:
+      chain = isp->u.u_c.chain;
+      assert (chain && chain->type == CHAIN_STR && len <= chain->u.u_s.len);
+      /* Partial consumption invalidates quote age.  */
+      chain->quote_age = 0;
+      chain->u.u_s.len -= len;
+      chain->u.u_s.str += len;
+      break;
+
+    default:
+      assert (!"consume_buffer");
+      abort ();
+    }
+}
+
 /*------------------------------------------------------------------.
 | Low level input is done a character at a time.  The function      |
 | peek_input () is used to look at the next character in the input  |
@@ -1046,8 +1209,28 @@ skip_line (const call_info *name)
 {
   int ch;

-  while ((ch = next_char (false, false)) != CHAR_EOF && ch != '\n')
-    ;
+  while (1)
+    {
+      size_t len;
+      const char *buffer = next_buffer (&len, false);
+      if (buffer)
+       {
+         const char *p = (char *) memchr (buffer, '\n', len);
+         if (p)
+           {
+             consume_buffer (p - buffer + 1);
+             ch = '\n';
+             break;
+           }
+         consume_buffer (len);
+       }
+      else
+       {
+         ch = next_char (false, false);
+         if (ch == CHAR_EOF || ch == '\n')
+           break;
+       }
+    }
   if (ch == CHAR_EOF)
     m4_warn (0, name, _("end of file treated as newline"));
 }
@@ -1214,16 +1397,27 @@ match_input (const char *s, size_t slen, bool consume)
   int ch;                      /* input character */
   const char *t;
   bool result = false;
+  size_t len;

   if (consume)
     {
       s++;
       slen--;
     }
+  /* Try a buffer match first.  */
   assert (slen);
+  t = next_buffer (&len, false);
+  if (t && slen <= len && memcmp (s, t, slen) == 0)
+    {
+      if (consume)
+       consume_buffer (slen);
+      return true;
+    }
+
+  /* Fall back on byte matching.  */
   ch = peek_input (false);
   if (ch != to_uchar (*s))
-    return false;                      /* fail */
+    return false;

   if (slen == 1)
     {
@@ -1677,7 +1871,29 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
       obstack_grow (obs_td, curr_comm.str1, curr_comm.len1);
       while (1)
        {
-         ch = next_char (false, false);
+         /* Start with buffer search for potential end delimiter.  */
+         size_t len;
+         const char *buffer = next_buffer (&len, false);
+         if (buffer)
+           {
+             const char *p = (char *) memchr (buffer, *curr_comm.str2, len);
+             if (p)
+               {
+                 obstack_grow (obs_td, buffer, p - buffer);
+                 ch = to_uchar (*p);
+                 consume_buffer (p - buffer + 1);
+               }
+             else
+               {
+                 obstack_grow (obs_td, buffer, len);
+                 consume_buffer (len);
+                 continue;
+               }
+           }
+
+         /* Fall back to byte-wise search.  */
+         else
+           ch = next_char (false, false);
          if (ch == CHAR_EOF)
            {
              /* Current_file changed to "" if we see CHAR_EOF, use
@@ -1708,11 +1924,37 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
   else if (default_word_regexp && (isalpha (ch) || ch == '_'))
     {
       obstack_1grow (&token_stack, ch);
-      while ((ch = peek_input (false)) < CHAR_EOF
-            && (isalnum (ch) || ch == '_'))
+      while (1)
        {
-         obstack_1grow (&token_stack, ch);
-         next_char (false, false);
+         size_t len;
+         const char *buffer = next_buffer (&len, false);
+         if (buffer)
+           {
+             const char *p = buffer;
+             while (len && (isalnum (to_uchar (*p)) || *p == '_'))
+               {
+                 p++;
+                 len--;
+               }
+             if (p != buffer)
+               {
+                 obstack_grow (&token_stack, buffer, p - buffer);
+                 consume_buffer (p - buffer);
+               }
+             if (len)
+               break;
+           }
+         else
+           {
+             ch = peek_input (false);
+             if (ch < CHAR_EOF && (isalnum (ch) || ch == '_'))
+               {
+                 obstack_1grow (&token_stack, ch);
+                 next_char (false, false);
+               }
+             else
+               break;
+           }
        }
       type = TOKEN_WORD;
     }
@@ -1782,7 +2024,44 @@ next_token (token_data *td, int *line, struct obstack *obs, bool allow_argv,
       type = TOKEN_STRING;
       while (1)
        {
-         ch = next_char (obs != NULL && current_quote_age, false);
+         /* Start with buffer search for either potential delimiter.  */
+         size_t len;
+         const char *buffer = next_buffer (&len, obs && current_quote_age);
+         if (buffer)
+           {
+             const char *p = buffer;
+             do
+               {
+                 p = (char *) memchr2 (p, *curr_quote.str1, *curr_quote.str2,
+                                       buffer + len - p);
+               }
+             while (p && current_quote_age
+                    && (*p++ == *curr_quote.str2
+                        ? --quote_level : ++quote_level));
+             if (p)
+               {
+                 if (current_quote_age)
+                   {
+                     assert (!quote_level);
+                     obstack_grow (obs_td, buffer, p - buffer - 1);
+                     consume_buffer (p - buffer);
+                     break;
+                   }
+                 obstack_grow (obs_td, buffer, p - buffer);
+                 ch = to_uchar (*p);
+                 consume_buffer (p - buffer + 1);
+               }
+             else
+               {
+                 obstack_grow (obs_td, buffer, len);
+                 consume_buffer (len);
+                 continue;
+               }
+           }
+
+         /* Fall back to byte-wise search.  */
+         else
+           ch = next_char (obs && current_quote_age, false);
          if (ch == CHAR_EOF)
            {
              /* Current_file changed to "" if we see CHAR_EOF, use
-- 
1.6.1.2
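
The quoted-string hunk above packs the nesting logic into a compact do/while
condition.  As a reading aid, here is a minimal sketch of the same idea
written out longhand, for the case where the quote age is nonzero so that
both delimiters are single bytes.  It is an illustration under assumptions,
not the patch's code: `demo_memchr2` is a hypothetical portable stand-in for
the gnulib memchr2 function, and `demo_scan_quotes` is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for gnulib's memchr2: return the first occurrence of C1
   or C2 in the N bytes at S, or NULL if neither occurs.  */
const void *
demo_memchr2 (const void *s, int c1, int c2, size_t n)
{
  const unsigned char *p = s;

  for (; n; p++, n--)
    if (*p == (unsigned char) c1 || *p == (unsigned char) c2)
      return p;
  return NULL;
}

/* Scan the LEN bytes at BUFFER for quote delimiters, adjusting
   *LEVEL for each one seen.  Return a pointer just past the close
   quote that brings *LEVEL to zero, or NULL if the buffer is
   exhausted first (*LEVEL then carries over to the next buffer).  */
const char *
demo_scan_quotes (const char *buffer, size_t len, char lquote,
                  char rquote, int *level)
{
  const char *p = buffer;
  const char *end = buffer + len;

  assert (*level > 0);
  while ((p = demo_memchr2 (p, lquote, rquote, end - p)))
    {
      /* Test the close quote first, as the patch does, so that
         identical delimiters are treated as a close.  */
      if (*p++ == rquote)
        {
          if (--*level == 0)
            return p;
        }
      else
        ++*level;
    }
  return NULL;
}
```

On a match the caller would copy the bytes before the final close quote and
consume through it, which corresponds to the `p - buffer - 1` and
`p - buffer` arguments in the hunk.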
