m4-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: fix frozen diversions on master branch


From: Eric Blake
Subject: Re: fix frozen diversions on master branch
Date: Mon, 19 May 2008 22:09:52 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080421 Thunderbird/2.0.0.14 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 5/13/2008 11:39 AM:
| I'm also thinking of changing the frozen file format 2 to require newlines
| between strings.  For example, we now have:
|
| F3,3,2
| dnldnlm4
|
| as the designator for a macro named 'dnl', pointing to the builtin 'dnl' in
| module 'm4'.  But if we instead used:
|
| F3,3,2
| dnl
| dnl
| m4
|

| I also realized that because this patch uses the quotearg module, and
quotearg
| does not pad out a trailing NUL as '\000', then without the newline between
| strings, this would be ambiguous (macro '\0' with content '\1', or macro
'\1'
| with content '\n'?) once argv_ref patch 24 allows NUL in macro names.

Committing this patch, in preparation for merging stage 24; it adds a
newline in the frozen format between all consecutive strings (Q, C, F, T
directives).  NOTE: this invalidates all existing V2 frozen files; but I
am okay with that (we've already done this in the past, and can continue
to do so up until the 2.0 release).  Either re-freeze the original file,
or hand-edit your frozen file to add in appropriate newlines.

| then I could use getndelim2 (rapidly searching for the next \ escape or the
| next newline), instead of fgetc (one byte at a time), resulting in a
speedup by
| processing in blocks instead of bytes (I recently posted a 40% speedup to
| coreutils' cut doing just that).

That patch to follow later, hopefully with some timing numbers to back up
my (as-yet-untested) claim that it should speed up reload.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkgyTxAACgkQ84KuGfSFAYAskwCcD+w5aR9R0f70kzraH0r/W3dR
di8AoM16YIakPyox4E6pIpWI1wgIfUuJ
=DJDU
-----END PGP SIGNATURE-----
>From a8e726e4bf47514e770f5885e416f92f150419a9 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Mon, 19 May 2008 21:58:16 -0600
Subject: [PATCH] In frozen file, split consecutive strings with newline.

* src/freeze.c (dump_symbol_CB): Add newline to 'T', 'F'.
(produce_frozen_state): Likewise for 'Q', 'C'.
(reload_frozen_state): Parse the new layout.
[GET_DIRECTIVE]: Fix format 1 regression from 2008-05-13.
* tests/freeze.at (loading format 2): Rewrite to new format.
(reloading unknown builtin): Likewise.
(loading format 1): Make sure backslash-newline is not
interpreted.
* doc/m4.texinfo (Frozen file format 2): Document the format.
* NEWS: Document this change.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog       |   12 +++++++
 NEWS            |   12 ++++---
 doc/m4.texinfo  |   32 ++++++++----------
 src/freeze.c    |   98 +++++++++++++++++++++++++++++++++++++++++--------------
 tests/freeze.at |   44 +++++++++++++++++-------
 5 files changed, 137 insertions(+), 61 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 625a99f..bad66ae 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,17 @@
 2008-05-19  Eric Blake  <address@hidden>
 
+       In frozen file, split consecutive strings with newline.
+       * src/freeze.c (dump_symbol_CB): Add newline to 'T', 'F'.
+       (produce_frozen_state): Likewise for 'Q', 'C'.
+       (reload_frozen_state): Parse the new layout.
+       [GET_DIRECTIVE]: Fix format 1 regression from 2008-05-13.
+       * tests/freeze.at (loading format 2): Rewrite to new format.
+       (reloading unknown builtin): Likewise.
+       (loading format 1): Make sure backslash-newline is not
+       interpreted.
+       * doc/m4.texinfo (Frozen file format 2): Document the format.
+       * NEWS: Document this change.
+
        Fix xgettext options.
        * po/Makevars (XGETTEXT_OPTIONS): The " must be passed to
        xgettext.
diff --git a/NEWS b/NEWS
index 6d2624d..1391e36 100644
--- a/NEWS
+++ b/NEWS
@@ -196,11 +196,13 @@ promoted to 2.0.
 *** The syntax of frozen files format V2 has been improved to save
     additional state.  This includes the `R' directive for default regular
     expression syntax, the `t' directive for traced macros, and the `d'
-    directive for debug mode.  Also, a V2 file can now be represented
-    completely in ASCII, thanks to escape sequences.  Unfortunately, files
-    frozen by M4 1.4q cannot be read by 1.9b, but since 1.4q was not widely
-    distributed, this is not expected to be much of an issue, and comes
-    with the territory of using a beta release.
+    directive for debug mode.  Existing directives with consecutive strings
+    now require an intermediate newline, for faster parsing.  Also, a V2
+    file can now be represented completely in ASCII, thanks to escape
+    sequences.  Unfortunately, files frozen by M4 1.4q cannot be read by
+    1.9b, but since 1.4q was not widely distributed, this is not expected
+    to be much of an issue, and comes with the territory of using a beta
+    release.
   - FIXME: format 2 still needs to catch more missing state; once 2.0 is
     released, any further changes would introduce format 3.
 
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 6978714..2048483 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -8384,12 +8384,9 @@ always behave the same as the original file.
 
 These shortcomings have been addressed in version 2 of the frozen file
 syntax.  New directives have been added, and existing directives have
-additional, and sometimes optional, parameters.
address@hidden FIXME - change implementation to match this:
address@hidden  All @var{str} instances
address@hidden in the grammar are now followed by @key{NL}, which makes the 
split
address@hidden between consecutive strings easier to recognize.
-Strings may now
+additional, and sometimes optional, parameters.  All @var{str} instances
+in the grammar are now followed by @key{NL}, which makes the split
+between consecutive strings easier to recognize.  Strings may now
 contain escape sequences modeled after C, such as @samp{\n} for newline
 or @samp{\0} for @sc{nul}, so that the frozen file can be pure
 @sc{ascii} (although when hand-editing a frozen file, it is still
@@ -8404,8 +8401,7 @@ Confirms the format of the file.  @code{m4} 
@value{VERSION} only creates
 frozen files where @var{number} is 2.  This directive must be the first
 non-comment in the file, and may not appear more than once.
 
address@hidden    C @var{len1} , @var{len2} @key{NL} @var{str1} @key{NL} 
@var{str2} @key{NL}
address@hidden C @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL}
address@hidden C @var{len1} , @var{len2} @key{NL} @var{str1} @key{NL} 
@var{str2} @key{NL}
 Uses @var{str1} and @var{str2} as the begin-comment and
 end-comment strings.  If omitted, then @samp{#} and @key{NL} are the
 comment delimiters.
@@ -8429,9 +8425,9 @@ If omitted, then diversion 0 is current.
 
 @comment FIXME - the first usage, with only one string, is not supported
 @comment in the current code
address@hidden F @var{len1} @key{NL} @var{str1} @key{NL}
address@hidden F @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL}
address@hidden F @var{len1} , @var{len2} , @var{len3} @key{NL} @var{str1} 
@var{str2} @var{str3} @key{NL}
address@hidden @item F @var{len1} @key{NL} @var{str1} @key{NL}
address@hidden F @var{len1} , @var{len2} @key{NL} @var{str1} @key{NL} 
@var{str2} @key{NL}
address@hidden F @var{len1} , @var{len2} , @var{len3} @key{NL} @var{str1} 
@key{NL} @var{str2} @key{NL} @var{str3} @key{NL}
 Defines, through @code{pushdef}, a definition for @var{str1} expanding
 to the function whose builtin name is given by @var{str2} (defaulting to
 @var{str1} if not present).  With two arguments, the builtin name is
@@ -8446,7 +8442,7 @@ builtin entries to the symbol table.  Modules must be 
loaded prior to
 specifying module-specific builtins via the three-argument @code{F} or
 @code{T}.
 
address@hidden Q @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL}
address@hidden Q @var{len1} , @var{len2} @key{NL} @var{str1} @key{NL} 
@var{str2} @key{NL}
 Uses @var{str1} and @var{str2} as the begin-quote and end-quote strings.
 If omitted, then @samp{`} and @samp{'} are the quote delimiters.
 
@@ -8465,13 +8461,13 @@ Enables tracing for any macro named @var{str}, similar 
to using the
 @code{traceon} builtin.  This option may occur more than once for
 multiple macros; if omitted, no macro starts out as traced.
 
address@hidden FIXME - what about a module's text macros, like __gnu__?
address@hidden T @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL}
address@hidden T @var{len1} , @var{len2} @key{NL} @var{str1} @key{NL} 
@var{str2} @key{NL}
address@hidden T @var{len1} , @var{len2} , @var{len3} @key{NL} @var{str1} 
@key{NL} @var{str2} @key{NL} @var{str3} @key{NL}
 Defines, though @code{pushdef}, a definition for @var{str1} expanding to
-the text given by @var{str2}.  With two arguments, the builtin name is
-searched for among the intrinsic builtin functions only; with three
-arguments, the builtin name is searched for amongst the builtin
-functions defined by the module named by @var{str3}.
+the text given by @var{str2}.  This directive may appear more than once
+for the same name, and its order, along with @samp{F}, is important.  If
+present, the optional third argument associates the macro with a module
+named by @var{str3}.
 @end table
 
 @node Compatibility
diff --git a/src/freeze.c b/src/freeze.c
index db5a1cf..d32fc9e 100644
--- a/src/freeze.c
+++ b/src/freeze.c
@@ -216,10 +216,14 @@ dump_symbol_CB (m4_symbol_table *symtab, const char 
*symbol_name,
          fputc ('\n', file);
 
          produce_mem_dump (file, symbol_name, symbol_len);
+         fputc ('\n', file);
          produce_mem_dump (file, text, text_len);
-         if (module)
-           produce_mem_dump (file, module_name, module_len);
          fputc ('\n', file);
+         if (module)
+           {
+             produce_mem_dump (file, module_name, module_len);
+             fputc ('\n', file);
+           }
        }
       else if (m4_is_symbol_value_func (value))
        {
@@ -235,10 +239,14 @@ dump_symbol_CB (m4_symbol_table *symtab, const char 
*symbol_name,
          fputc ('\n', file);
 
          produce_mem_dump (file, symbol_name, symbol_len);
+         fputc ('\n', file);
          produce_mem_dump (file, bp->name, bp_len);
-         if (module)
-           produce_mem_dump (file, module_name, module_len);
          fputc ('\n', file);
+         if (module)
+           {
+             produce_mem_dump (file, module_name, module_len);
+             fputc ('\n', file);
+           }
        }
       else if (m4_is_symbol_value_placeholder (value))
        ; /* Nothing to do for a builtin we couldn't reload earlier.  */
@@ -278,6 +286,7 @@ produce_frozen_state (m4 *context, const char *name)
     {
       xfprintf (file, "Q%zu,%zu\n", pair->len1, pair->len2);
       produce_mem_dump (file, pair->str1, pair->len1);
+      fputc ('\n', file);
       produce_mem_dump (file, pair->str2, pair->len2);
       fputc ('\n', file);
     }
@@ -288,6 +297,7 @@ produce_frozen_state (m4 *context, const char *name)
     {
       xfprintf (file, "C%zu,%zu\n", pair->len1, pair->len2);
       produce_mem_dump (file, pair->str1, pair->len1);
+      fputc ('\n', file);
       produce_mem_dump (file, pair->str2, pair->len2);
       fputc ('\n', file);
     }
@@ -471,12 +481,21 @@ reload_frozen_state (m4 *context, const char *name)
     }                                                          \
   while (0)
 
-#define GET_STRING(File, Buf, BufSize, StrLen)                 \
+#define GET_STRING(File, Buf, BufSize, StrLen, UseChar)                \
   do                                                           \
     {                                                          \
       size_t len = (StrLen);                                   \
       char *p;                                                 \
       int ch;                                                  \
+      if (UseChar)                                             \
+       {                                                       \
+         ungetc (character, File);                             \
+         if (advance_line)                                     \
+           {                                                   \
+             assert (character == '\n');                       \
+             advance_line = false;                             \
+           }                                                   \
+       }                                                       \
       CHECK_ALLOCATION ((Buf), (BufSize), len);                        \
       p = (Buf);                                               \
       while (len-- > 0)                                                \
@@ -490,6 +509,13 @@ reload_frozen_state (m4 *context, const char *name)
          *p++ = ch;                                            \
        }                                                       \
       *p = '\0';                                               \
+      GET_CHARACTER;                                           \
+      while (version > 1 && character == '\\')                 \
+       {                                                       \
+         GET_CHARACTER;                                        \
+         VALIDATE ('\n');                                      \
+         GET_CHARACTER;                                        \
+       }                                                       \
     }                                                          \
   while (0)
 
@@ -526,12 +552,6 @@ reload_frozen_state (m4 *context, const char *name)
            GET_CHARACTER;                                      \
          VALIDATE ('\n');                                      \
        }                                                       \
-      else if (character == '\\')                              \
-       {                                                       \
-         GET_CHARACTER;                                        \
-         VALIDATE ('\n');                                      \
-         continue;                                             \
-       }                                                       \
     }                                                          \
   while (character == '\n')
 
@@ -600,7 +620,7 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 'd');
          GET_CHARACTER;
          GET_NUMBER (number[0], false);
          VALIDATE ('\n');
-         GET_STRING (file, string[0], allocated[0], number[0]);
+         GET_STRING (file, string[0], allocated[0], number[0], false);
          VALIDATE ('\n');
 
          m4_set_debug_level_opt (context, m4_debug_decode (context, 0,
@@ -643,9 +663,19 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 'F');
 
          /* Get string contents.  */
 
-         GET_STRING (file, string[0], allocated[0], number[0]);
-         GET_STRING (file, string[1], allocated[1], number[1]);
-         GET_STRING (file, string[2], allocated[2], number[2]);
+         GET_STRING (file, string[0], allocated[0], number[0], false);
+         if (version > 1)
+           {
+             VALIDATE ('\n');
+             GET_CHARACTER;
+           }
+         GET_STRING (file, string[1], allocated[1], number[1], true);
+         if (version > 1 && number[2])
+           {
+             VALIDATE ('\n');
+             GET_CHARACTER;
+           }
+         GET_STRING (file, string[2], allocated[2], number[2], true);
          VALIDATE ('\n');
 
          /* Enter a macro having a builtin function as a definition.  */
@@ -686,7 +716,7 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 'M');
          GET_CHARACTER;
          GET_NUMBER (number[0], false);
          VALIDATE ('\n');
-         GET_STRING (file, string[0], allocated[0], number[0]);
+         GET_STRING (file, string[0], allocated[0], number[0], false);
          VALIDATE ('\n');
 
          m4__module_open (context, string[0], NULL);
@@ -705,7 +735,7 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 'R');
          GET_CHARACTER;
          GET_NUMBER (number[0], false);
          VALIDATE ('\n');
-         GET_STRING (file, string[0], allocated[0], number[0]);
+         GET_STRING (file, string[0], allocated[0], number[0], false);
          VALIDATE ('\n');
 
          m4_set_regexp_syntax_opt (context,
@@ -732,7 +762,7 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 'S');
          GET_CHARACTER;
          GET_NUMBER (number[0], false);
          VALIDATE ('\n');
-         GET_STRING (file, string[0], allocated[0], number[0]);
+         GET_STRING (file, string[0], allocated[0], number[0], false);
 
          /* Syntax under M4_SYNTAX_MASKS is handled specially; all
             other characters are additive.  */
@@ -758,7 +788,7 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 't');
          GET_CHARACTER;
          GET_NUMBER (number[0], false);
          VALIDATE ('\n');
-         GET_STRING (file, string[0], allocated[0], number[0]);
+         GET_STRING (file, string[0], allocated[0], number[0], false);
          VALIDATE ('\n');
 
          m4_set_symbol_name_traced (M4SYMTAB, string[0], true);
@@ -789,9 +819,17 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 't');
 
          /* Get string contents.  */
          if (operation != 'D')
-           GET_STRING (file, string[0], allocated[0], number[0]);
-         GET_STRING (file, string[1], allocated[1], number[1]);
-         GET_CHARACTER;
+           {
+             GET_STRING (file, string[0], allocated[0], number[0], false);
+             if (version > 1)
+               {
+                 VALIDATE ('\n');
+                 GET_CHARACTER;
+               }
+           }
+         else
+           GET_CHARACTER;
+         GET_STRING (file, string[1], allocated[1], number[1], true);
          VALIDATE ('\n');
 
          /* Act according to operation letter.  */
@@ -862,9 +900,19 @@ ill-formed frozen file, version 2 directive `%c' 
encountered"), 'T');
          VALIDATE ('\n');
 
          /* Get string contents.  */
-         GET_STRING (file, string[0], allocated[0], number[0]);
-         GET_STRING (file, string[1], allocated[1], number[1]);
-         GET_STRING (file, string[2], allocated[2], number[2]);
+         GET_STRING (file, string[0], allocated[0], number[0], false);
+         if (version > 1)
+           {
+             VALIDATE ('\n');
+             GET_CHARACTER;
+           }
+         GET_STRING (file, string[1], allocated[1], number[1], true);
+         if (version > 1 && number[2])
+           {
+             VALIDATE ('\n');
+             GET_CHARACTER;
+           }
+         GET_STRING (file, string[2], allocated[2], number[2], true);
          VALIDATE ('\n');
 
          /* Enter a macro having an expansion text as a definition.  */
diff --git a/tests/freeze.at b/tests/freeze.at
index 209c611..2aa2aea 100644
--- a/tests/freeze.at
+++ b/tests/freeze.at
@@ -84,7 +84,8 @@ $ m4 --version | head -n1
 GNU M4 1.4.5
 $ cat frozen.m4
 divert(`-1')
-define(`foo', `\FOO')
+define(`foo', `\n\
+FOO')
 pushdef(`foo', `bar${1}')
 define(`my_define', defn(`define'))
 define(`my_changeword', defn(`changeword'))
@@ -116,8 +117,9 @@ T9,6
 my_definedefine
 F9,6
 my_definedefine
-T3,4
-foo\FOO
+T3,7
+foo\n\
+FOO
 T3,7
 foobar${1}
 F3,3
@@ -139,7 +141,8 @@ my_define([bar], [4])[]popdef([my_define]) bar
 ]])
 
 AT_CHECK_M4([-R frozen.m4f input.m4], [0],
-[[bar${1} /* foo */ \FOO
+[[bar${1} /* foo */ \n\
+FOO
  1
 define 1
  3
@@ -174,15 +177,19 @@ V2
 # missing close quote should be supplied
 Q1,0
 >
+
 # missing close comment should be supplied
 C1,0
 <
+
 M2
 m4
 M3
 gnu
 F7,7,3
-builtinbuiltingnu
+builtin
+builtin
+gnu
 # Text to negative diversion must not crash.  Catches a regression
 # introduced 2007-05-28 and fixed 2007-05-31.
 D-1,5
@@ -196,11 +203,13 @@ D,
 
 # Testing escape sequences
 T4,5
-blah\t\477\040\X5C
+blah
+\t\477\040\X5C
 # Long macro definition.  Catches a regression introduced on 2007-01-20
 # and patched 2007-02-25.
 T4,122
-long01234567890123456789012345678901234567890123456789
+long
+01234567890123456789012345678901234567890123456789
 01234567890123456789012345678901234567890123456789
 01234567890123456789
 ]])
@@ -247,15 +256,23 @@ m4
 M3
 gnu
 T1,5
-a\n\n\n\n\n
+a
+\n\n\n\n\n
 F6,6,2
 divnum\
-divnumm4\
+
+divnum
+\
+m4\
 
 F6,6,2
-divertdivertm4
+divert
+divert
+m4
 F6,6,2
-definedefinem4
+define
+define
+m4
 D]$1[,3
 hi
 
@@ -263,7 +280,7 @@ hi
 AT_CHECK_M4([-R frozen.m4f in.m4], m4_ifval([$2], [1], [0]),
 m4_ifval([$2], [], [m4_bpatsubst([$1], [^0*])
 m4_if(m4_substr([$1], [0], [1]), [-], [], [[hi
-]])]), m4_ifval([$2], [[m4:frozen.m4f:16: integer overflow in frozen file
+]])]), m4_ifval([$2], [[m4:frozen.m4f:24: integer overflow in frozen file
 ]]))
 ])
 
@@ -363,7 +380,8 @@ AT_CHECK_M4([-F frozen.m4f -t undefined empty.m4])
 
 # Add an unknown builtin.
 echo 'F1,1' >> frozen.m4f
-echo 'ab' >> frozen.m4f
+echo 'a' >> frozen.m4f
+echo 'b' >> frozen.m4f
 
 AT_DATA([[input.m4]],
 [[dnl The macro is defined; checking this is safe
-- 
1.5.5.1


reply via email to

[Prev in Thread] Current Thread [Next in Thread]