bug-m4

Re: [PATCH] Line synchronisation output in comments


From: Eric Blake
Subject: Re: [PATCH] Line synchronisation output in comments
Date: Fri, 25 May 2007 17:26:58 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

Sergey Poznyakoff <gray <at> Mirddin.farlep.net> writes:

> 
> No, it does not.  After thinking its over I came to the another patch,
> attached below (this time it is made against the CVS version).  The two
> testcases (my initial one, and the one based on your last posting) are
> provided in the attachements 2 and 3.

Thanks for the continued progress.  I don't think your patch quite handles
three-line comments correctly, though.  There were some minor inconsistencies
in when line numbers were updated, so two-line comments didn't exercise all
the code paths.  I think my approach below is more reliable.  Here's what I'm
checking in to the branch; please give it a whirl so I can feel confident
releasing 1.4.10.
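
To make the idea easier to test in isolation, here is a minimal standalone
model of the new output logic.  It is only a sketch: `struct sync_state`,
`emit` and `ship_token` are hypothetical names that stand in for m4's real
globals and for shipout_text, and the file name part of the directive is left
out.  The point it illustrates is that the syncline check happens once, at the
start of a token, so newlines embedded in a multiline token never trigger a
directive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for m4's output state; not the real globals.  */
struct sync_state
{
  bool start_of_output_line;	/* true right after a newline was written */
  int output_line;		/* input line the output is believed to match */
  char out[512];		/* collected output, NUL-terminated */
  size_t len;			/* number of bytes used in OUT */
};

/* Append TEXT to the collected output.  */
static void
emit (struct sync_state *s, const char *text)
{
  size_t n = strlen (text);
  assert (s->len + n < sizeof s->out);
  memcpy (s->out + s->len, text, n + 1);
  s->len += n;
}

/* Output one token that started on input line LINE.  A syncline is
   considered only here, at the start of the token; the loop below
   merely counts newlines embedded in the token.  */
static void
ship_token (struct sync_state *s, const char *text, int line)
{
  if (s->start_of_output_line)
    {
      s->start_of_output_line = false;
      s->output_line++;
      if (s->output_line != line)
	{
	  char buf[32];
	  snprintf (buf, sizeof buf, "#line %d\n", line);
	  emit (s, buf);
	  s->output_line = line;
	}
    }

  for (; *text; text++)
    {
      char c[2] = { *text, '\0' };
      if (s->start_of_output_line)
	{
	  s->start_of_output_line = false;
	  s->output_line++;
	}
      emit (s, c);
      if (*text == '\n')
	s->start_of_output_line = true;
    }
}
```

Feeding this model a two-line comment token, then a newline, then a token
that starts three lines later produces a single directive, placed after the
comment rather than inside it.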

Meanwhile, I have to port it to HEAD.  I would also like to fix a bug that I
uncovered in the interaction between -s and divert (but probably only on HEAD,
as I think the fix will be too invasive for a stable branch).  As an example
of the bug, note that "#line 3" in the output below is no longer a
preprocessor directive, because it is emitted in the middle of a line, right
after the "2".

$ m4 -s
divert(2)2divert(1)1
dnl
undivert
^D
1
#line 1 "stdin"
2#line 3 "stdin"

It is particularly important to fix the bug on HEAD, since it has the new
syncoutput builtin to worry about.  I'm thinking that diversions should
remember the file and line that are in effect when they are created, as well
as the location of the first \n in the diverted text, and use this to generate
the first #line directive at the appropriate location in case the diversion is
dumped midline, rather than following the current policy of blindly dumping
the #line directive into the diversion.  Subsequent #line directives are not a
problem; only the first directive per diversion is affected.
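
A rough sketch of that proposal, to make the intent concrete.  Nothing here
exists in m4: `struct diversion_sketch` and `dump_diversion` are hypothetical
names, and the sketch assumes that the text after the first newline of a
diversion belongs to the input line following the one where the diversion was
created.  If the diversion is dumped at the start of an output line, the
directive goes first; if it is dumped midline, the directive is delayed until
just after the first newline; if there is no newline at all, no directive is
emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of what a diversion would need to remember.  */
struct diversion_sketch
{
  const char *file;	/* file in effect when the diversion was created */
  int line;		/* line in effect when the diversion was created */
  const char *text;	/* diverted text, without any embedded syncline */
};

/* Append diversion D to the NUL-terminated buffer OUT of capacity SIZE.
   AT_LINE_START says whether OUT currently ends at the start of a line.
   Return true if the output ends at the start of a line afterwards.  */
static bool
dump_diversion (char *out, size_t size, const struct diversion_sketch *d,
		bool at_line_start)
{
  size_t used = strlen (out);
  size_t text_len = strlen (d->text);
  const char *nl = strchr (d->text, '\n');
  int n;

  if (at_line_start)
    /* Safe to emit the directive on a line by itself right away.  */
    n = snprintf (out + used, size - used, "#line %d \"%s\"\n%s",
		  d->line, d->file, d->text);
  else if (nl == NULL)
    /* Dumped midline and no newline: nowhere to put a directive.  */
    n = snprintf (out + used, size - used, "%s", d->text);
  else
    /* Dumped midline: delay the directive until after the first \n.  */
    n = snprintf (out + used, size - used, "%.*s#line %d \"%s\"\n%s",
		  (int) (nl - d->text + 1), d->text,
		  d->line + 1, d->file, nl + 1);

  assert (n >= 0 && (size_t) n < size - used);
  if (text_len == 0)
    return at_line_start;
  return d->text[text_len - 1] == '\n';
}
```

With this policy, a diversion that is dumped right after another diversion
lacking a trailing newline no longer gets a "#line" glued to the preceding
text.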

2007-05-25  Eric Blake  <address@hidden>

        Fix sync line interaction with multiline comments.
        * doc/m4.texinfo (Other Incompatibilities): Add example, and
        document bug in --syncline/divert interaction.
        (Preprocessor features): Augment test.
        * src/m4.h (output_text): Export.
        (shipout_text, next_token): Add parameter.
        * src/freeze.c (reload_frozen_state): Don't interfere with
        synclines when reloading state.
        * src/output.c (output_text): Export.
        (shipout_text): Take new parameter for start line of token.
        Output at most one syncline per token.
        * src/input.c (next_token): Report line where multiline tokens
        start.
        * src/macro.c (expand_input, expand_token, expand_argument):
        Adjust callers so that line is passed from input to output.
        * NEWS: Document this fix.
        Reported by Sergey Poznyakoff.

Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.100
diff -u -p -r1.1.1.1.2.100 NEWS
--- NEWS        24 May 2007 17:23:42 -0000      1.1.1.1.2.100
+++ NEWS        25 May 2007 17:12:57 -0000
@@ -6,6 +6,8 @@ Version 1.4.10 - ?? ??? 2007, by ????  (
 
 * Fix regression introduced in 1.4.9 in the `eval' builtin when performing
   division.
+* The synclines option `-s' no longer generates sync lines in the middle of
+  multiline comments or quoted strings.
 * Work around a number of corner-case POSIX compliance bugs in various
   broken stdio libraries.  In particular, the `syscmd' builtin behaves
   more predictably when stdin is seekable.
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.124
diff -u -p -r1.1.1.1.2.124 m4.texinfo
--- doc/m4.texinfo      25 May 2007 12:58:50 -0000      1.1.1.1.2.124
+++ doc/m4.texinfo      25 May 2007 17:12:59 -0000
@@ -664,7 +664,8 @@ the file name did not change from the pr
 Synchronization directives are always given on complete lines by
 themselves.  When a synchronization discrepancy occurs in the middle of
 an output line, the associated synchronization directive is delayed
-until the beginning of the next generated line.
+until the next newline that does not occur in the middle of a quoted
+string or comment.
 
 @comment options: -s
 @example
@@ -672,15 +673,31 @@ define(`twoline', `1
 2')
 @result{}#line 2 "stdin"
 @result{}
+changecom(`/*', `*/')
+@result{}
+define(`comment', `/*1
+2*/')
+@result{}#line 5
+@result{}
 dnl no line
 hello
-@result{}#line 4
+@result{}#line 7
 @result{}hello
 twoline
 @result{}1
-@result{}#line 5
+@result{}#line 8
 @result{}2
+comment
+@result{}/*1
+@result{}2*/
+one comment `two
+three'
+@result{}#line 10
+@result{}one /*1
+@result{}2*/ two
+@result{}three
 goodbye
+@result{}#line 12
 @result{}goodbye
 @end example
 
@@ -6151,7 +6168,29 @@ diverted text as being generated at the 
 The sync line option is used mostly when using @code{m4} as
 a front end to a compiler.  If a diverted line causes a compiler error,
 the error messages should most probably refer to the place where the
-diversion were made, and not where it was inserted again.
+diversion was made, and not where it was inserted again.
+
+@comment options: -s
+@example
+divert(2)2
+divert(1)1
+divert`'0
+@result{}#line 3 "stdin"
+@result{}0
+^D
+@result{}#line 2 "stdin"
+@result{}1
+@result{}#line 1 "stdin"
+@result{}2
+@end example
+
+The current @code{m4} implementation has a limitation that the syncline
+output at the start of each diversion occurs no matter what, even if the
+previous diversion did not end with a newline.  This goes contrary to
+the claim that synclines appear on a line by themselves, so this
+limitation may be corrected in a future version of @code{m4}.  In the
+meantime, when using @option{-s}, it is wisest to make sure all
+diversions end with newline.
 
 @item
 @acronym{GNU} @code{m4} makes no attempt at prohibiting self-referential
Index: src/freeze.c
===================================================================
RCS file: /sources/m4/m4/src/freeze.c,v
retrieving revision 1.1.1.1.2.14
diff -u -p -r1.1.1.1.2.14 freeze.c
--- src/freeze.c        1 Nov 2006 22:29:08 -0000       1.1.1.1.2.14
+++ src/freeze.c        25 May 2007 17:12:59 -0000
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
 
-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006, 2007
    Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
@@ -329,7 +329,7 @@ reload_frozen_state (const char *name)
 
               make_diversion (number[0]);
               if (number[1] > 0)
-                shipout_text (NULL, string[1], number[1]);
+                output_text (string[1], number[1]);
               break;
 
             case 'F':
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.34
diff -u -p -r1.1.1.1.2.34 input.c
--- src/input.c 5 Feb 2007 13:43:36 -0000       1.1.1.1.2.34
+++ src/input.c 25 May 2007 17:13:00 -0000
@@ -808,22 +808,23 @@ set_word_regexp (const char *regexp)
 #endif /* ENABLE_CHANGEWORD */
 
 
-/*-------------------------------------------------------------------------.
-| Parse and return a single token from the input stream.  A token can     |
-| either be TOKEN_EOF, if the input_stack is empty; it can be TOKEN_STRING |
-| for a quoted string; TOKEN_WORD for something that is a potential macro  |
-| name; and TOKEN_SIMPLE for any single character that is not a part of    |
-| any of the previous types.                                              |
-|                                                                         |
-| Next_token () return the token type, and passes back a pointer to the    |
-| token data through TD.  The token text is collected on the obstack      |
-| token_stack, which never contains more than one token text at a time.    |
-| The storage pointed to by the fields in TD is therefore subject to      |
-| change the next time next_token () is called.                            |
-`-------------------------------------------------------------------------*/
+/*--------------------------------------------------------------------.
+| Parse and return a single token from the input stream.  A token     |
+| can either be TOKEN_EOF, if the input_stack is empty; it can be     |
+| TOKEN_STRING for a quoted string; TOKEN_WORD for something that is  |
+| a potential macro name; and TOKEN_SIMPLE for any single character   |
+| that is not a part of any of the previous types.  If LINE is not    |
+| NULL, set *LINE to the line where the token starts.                 |
+|                                                                     |
+| Next_token () return the token type, and passes back a pointer to   |
+| the token data through TD.  The token text is collected on the      |
+| obstack token_stack, which never contains more than one token text  |
+| at a time.  The storage pointed to by the fields in TD is           |
+| therefore subject to change the next time next_token () is called.  |
+`--------------------------------------------------------------------*/
 
 token_type
-next_token (token_data *td)
+next_token (token_data *td, int *line)
 {
   int ch;
   int quote_level;
@@ -833,9 +834,11 @@ next_token (token_data *td)
   char *orig_text = NULL;
 #endif
   const char *file;
-  int line;
+  int dummy;
 
   obstack_free (&token_stack, token_bottom);
+  if (!line)
+    line = &dummy;
 
  /* Can't consume character until after CHAR_MACRO is handled.  */
   ch = peek_input ();
@@ -860,7 +863,7 @@ next_token (token_data *td)
 
   next_char (); /* Consume character we already peeked at.  */
   file = current_file;
-  line = current_line;
+  *line = current_line;
   if (MATCH (ch, bcomm.string, true))
     {
       obstack_grow (&token_stack, bcomm.string, bcomm.length);
@@ -872,7 +875,7 @@ next_token (token_data *td)
       else
        /* current_file changed to "" if we see CHAR_EOF, use the
           previous value we stored earlier.  */
-       M4ERROR_AT_LINE ((EXIT_FAILURE, 0, file, line,
+       M4ERROR_AT_LINE ((EXIT_FAILURE, 0, file, *line,
                          "ERROR: end of file in comment"));
 
       type = TOKEN_STRING;
@@ -955,7 +958,7 @@ next_token (token_data *td)
          if (ch == CHAR_EOF)
            /* current_file changed to "" if we see CHAR_EOF, use
               the previous value we stored earlier.  */
-           M4ERROR_AT_LINE ((EXIT_FAILURE, 0, file, line,
+           M4ERROR_AT_LINE ((EXIT_FAILURE, 0, file, *line,
                              "ERROR: end of file in string"));
 
          if (MATCH (ch, rquote.string, true))
Index: src/m4.h
===================================================================
RCS file: /sources/m4/m4/src/m4.h,v
retrieving revision 1.1.1.1.2.42
diff -u -p -r1.1.1.1.2.42 m4.h
--- src/m4.h    24 May 2007 17:23:43 -0000      1.1.1.1.2.42
+++ src/m4.h    25 May 2007 17:13:00 -0000
@@ -285,7 +285,7 @@ typedef enum token_data_type token_data_
 
 void input_init (void);
 token_type peek_token (void);
-token_type next_token (token_data *);
+token_type next_token (token_data *, int *);
 void skip_line (void);
 
 /* push back input */
@@ -321,7 +321,8 @@ extern int output_current_line;
 
 void output_init (void);
 void output_exit (void);
-void shipout_text (struct obstack *, const char *, int);
+void output_text (const char *, int);
+void shipout_text (struct obstack *, const char *, int, int);
 void make_diversion (int);
 void insert_diversion (int);
 void insert_file (FILE *);
Index: src/macro.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/macro.c,v
retrieving revision 1.1.1.1.2.16
diff -u -p -r1.1.1.1.2.16 macro.c
--- src/macro.c 1 Nov 2006 22:29:08 -0000       1.1.1.1.2.16
+++ src/macro.c 25 May 2007 17:13:00 -0000
@@ -1,7 +1,7 @@
 /* GNU m4 -- A simple macro processor
 
-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006 Free Software
-   Foundation, Inc.
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006, 2007 Free
+   Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -25,7 +25,7 @@
 #include "m4.h"
 
 static void expand_macro (symbol *);
-static void expand_token (struct obstack *, token_type, token_data *);
+static void expand_token (struct obstack *, token_type, token_data *, int);
 
 /* Current recursion level in expand_macro ().  */
 int expansion_level = 0;
@@ -59,12 +59,13 @@ expand_input (void)
 {
   token_type t;
   token_data td;
+  int line;
 
   obstack_init (&argc_stack);
   obstack_init (&argv_stack);
 
-  while ((t = next_token (&td)) != TOKEN_EOF)
-    expand_token ((struct obstack *) NULL, t, &td);
+  while ((t = next_token (&td, &line)) != TOKEN_EOF)
+    expand_token ((struct obstack *) NULL, t, &td, line);
 
   obstack_free (&argc_stack, NULL);
   obstack_free (&argv_stack, NULL);
@@ -79,7 +80,7 @@ expand_input (void)
 `------------------------------------------------------------------------*/
 
 static void
-expand_token (struct obstack *obs, token_type t, token_data *td)
+expand_token (struct obstack *obs, token_type t, token_data *td, int line)
 {
   symbol *sym;
 
@@ -94,7 +95,8 @@ expand_token (struct obstack *obs, token
     case TOKEN_CLOSE:
     case TOKEN_SIMPLE:
     case TOKEN_STRING:
-      shipout_text (obs, TOKEN_DATA_TEXT (td), strlen (TOKEN_DATA_TEXT (td)));
+      shipout_text (obs, TOKEN_DATA_TEXT (td), strlen (TOKEN_DATA_TEXT (td)),
+                   line);
       break;
 
     case TOKEN_WORD:
@@ -106,10 +108,10 @@ expand_token (struct obstack *obs, token
        {
 #ifdef ENABLE_CHANGEWORD
          shipout_text (obs, TOKEN_DATA_ORIG_TEXT (td),
-                       strlen (TOKEN_DATA_ORIG_TEXT (td)));
+                       strlen (TOKEN_DATA_ORIG_TEXT (td)), line);
 #else
          shipout_text (obs, TOKEN_DATA_TEXT (td),
-                       strlen (TOKEN_DATA_TEXT (td)));
+                       strlen (TOKEN_DATA_TEXT (td)), line);
 #endif
        }
       else
@@ -149,7 +151,7 @@ expand_argument (struct obstack *obs, to
   /* Skip leading white space.  */
   do
     {
-      t = next_token (&td);
+      t = next_token (&td, NULL);
     }
   while (t == TOKEN_SIMPLE && isspace (to_uchar (*TOKEN_DATA_TEXT (&td))));
 
@@ -184,7 +186,7 @@ expand_argument (struct obstack *obs, to
            paren_level++;
          else if (*text == ')')
            paren_level--;
-         expand_token (obs, t, &td);
+         expand_token (obs, t, &td, line);
          break;
 
        case TOKEN_EOF:
@@ -196,7 +198,7 @@ expand_argument (struct obstack *obs, to
 
        case TOKEN_WORD:
        case TOKEN_STRING:
-         expand_token (obs, t, &td);
+         expand_token (obs, t, &td, line);
          break;
 
        case TOKEN_MACDEF:
@@ -213,7 +215,7 @@ expand_argument (struct obstack *obs, to
          abort ();
        }
 
-      t = next_token (&td);
+      t = next_token (&td, NULL);
     }
 }
 
@@ -239,7 +241,7 @@ collect_arguments (symbol *sym, struct o
 
   if (peek_token () == TOKEN_OPEN)
     {
-      next_token (&td);                /* gobble parenthesis */
+      next_token (&td, NULL); /* gobble parenthesis */
       do
        {
          more_args = expand_argument (arguments, &td);
Index: src/output.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/output.c,v
retrieving revision 1.1.1.1.2.19
diff -u -p -r1.1.1.1.2.19 output.c
--- src/output.c        16 Mar 2007 12:30:50 -0000      1.1.1.1.2.19
+++ src/output.c        25 May 2007 17:13:00 -0000
@@ -422,7 +422,7 @@ output_character_helper (int character)
 | to a diversion file or an in-memory diversion buffer.                   |
 `------------------------------------------------------------------------*/
 
-static void
+void
 output_text (const char *text, int length)
 {
   int count;
@@ -444,23 +444,26 @@ output_text (const char *text, int lengt
     }
 }
 
-/*-------------------------------------------------------------------------.
-| Add some text into an obstack OBS, taken from TEXT, having LENGTH       |
-| characters.  If OBS is NULL, rather output the text to an external file  |
-| or an in-memory diversion buffer.  If OBS is NULL, and there is no      |
-| output file, the text is discarded.                                     |
-|                                                                         |
-| If we are generating sync lines, the output have to be examined, because |
-| we need to know how much output each input line generates.  In general,  |
-| sync lines are output whenever a single input lines generates several    |
-| output lines, or when several input lines does not generate any output.  |
-`-------------------------------------------------------------------------*/
+/*--------------------------------------------------------------------.
+| Add some text into an obstack OBS, taken from TEXT, having LENGTH   |
+| characters.  If OBS is NULL, output the text to an external file    |
+| or an in-memory diversion buffer instead.  If OBS is NULL, and      |
+| there is no output file, the text is discarded.  LINE is the line   |
+| where the token starts (not necessarily current_line, in the case   |
+| of multiline tokens).                                               |
+|                                                                     |
+| If we are generating sync lines, the output has to be examined,     |
+| because we need to know how much output each input line generates.  |
+| In general, sync lines are output whenever a single input lines     |
+| generates several output lines, or when several input lines do not  |
+| generate any output.                                                |
+`--------------------------------------------------------------------*/
 
 void
-shipout_text (struct obstack *obs, const char *text, int length)
+shipout_text (struct obstack *obs, const char *text, int length, int line)
 {
   static bool start_of_output_line = true;
-  char line[20];
+  char linebuf[20];
   const char *cursor;
 
   /* If output goes to an obstack, merely add TEXT to it.  */
@@ -501,43 +504,59 @@ shipout_text (struct obstack *obs, const
        output_text (text, length);
       }
   else
-    for (; length-- > 0; text++)
-      {
-       if (start_of_output_line)
-         {
-           start_of_output_line = false;
-           output_current_line++;
-
+    {
+      /* Check for syncline only at the start of a token.  Multiline
+        tokens, and tokens that are out of sync but in the middle of
+        the line, must wait until the next raw newline triggers a
+        syncline.  */
+      if (start_of_output_line)
+       {
+         start_of_output_line = false;
+         output_current_line++;
 #ifdef DEBUG_OUTPUT
-           printf ("DEBUG: cur %d, cur out %d\n",
-                   current_line, output_current_line);
+         fprintf (stderr, "DEBUG: line %d, cur %d, cur out %d\n",
+                  line, current_line, output_current_line);
 #endif
 
-           /* Output a `#line NUM' synchronization directive if needed.
-              If output_current_line was previously given a negative
-              value (invalidated), rather output `#line NUM "FILE"'.  */
-
-           if (output_current_line != current_line)
-             {
-               sprintf (line, "#line %d", current_line);
-               for (cursor = line; *cursor; cursor++)
-                 OUTPUT_CHARACTER (*cursor);
-               if (output_current_line < 1 && current_file[0] != '\0')
-                 {
-                   OUTPUT_CHARACTER (' ');
-                   OUTPUT_CHARACTER ('"');
-                   for (cursor = current_file; *cursor; cursor++)
-                     OUTPUT_CHARACTER (*cursor);
-                   OUTPUT_CHARACTER ('"');
-                 }
-               OUTPUT_CHARACTER ('\n');
-               output_current_line = current_line;
-             }
-         }
-       OUTPUT_CHARACTER (*text);
-       if (*text == '\n')
-         start_of_output_line = true;
-      }
+         /* Output a `#line NUM' synchronization directive if needed.
+            If output_current_line was previously given a negative
+            value (invalidated), output `#line NUM "FILE"' instead.  */
+
+         if (output_current_line != line)
+           {
+             sprintf (linebuf, "#line %d", line);
+             for (cursor = linebuf; *cursor; cursor++)
+               OUTPUT_CHARACTER (*cursor);
+             if (output_current_line < 1 && current_file[0] != '\0')
+               {
+                 OUTPUT_CHARACTER (' ');
+                 OUTPUT_CHARACTER ('"');
+                 for (cursor = current_file; *cursor; cursor++)
+                   OUTPUT_CHARACTER (*cursor);
+                 OUTPUT_CHARACTER ('"');
+               }
+             OUTPUT_CHARACTER ('\n');
+             output_current_line = line;
+           }
+       }
+
+      /* Output the token, and track embedded newlines.  */
+      for (; length-- > 0; text++)
+       {
+         if (start_of_output_line)
+           {
+             start_of_output_line = false;
+             output_current_line++;
+#ifdef DEBUG_OUTPUT
+             fprintf (stderr, "DEBUG: line %d, cur %d, cur out %d\n",
+                      line, current_line, output_current_line);
+#endif
+           }
+         OUTPUT_CHARACTER (*text);
+         if (*text == '\n')
+           start_of_output_line = true;
+       }
+    }
 }
 
 /* Functions for use by diversions.  */
