bug-m4
Re: m4 comment bug


From: Gary V. Vaughan
Subject: Re: m4 comment bug
Date: Mon, 17 Jan 2005 16:21:01 +0000
User-agent: Mozilla Thunderbird 0.9 (X11/20041103)

David Caplan wrote:
> Hi Gary,

Hi David!

> Thanks for the quick response!

No probs.

> According to the documentation "All characters between the comment
> delimiters are ignored", but you are saying that m4 tokenizes on quotes
> as the very first pass.

Yeah, kind of.  A quoted comment (à la your define argument) is slightly
different from an unquoted comment.  When M4 sees an open quote, it
reads to the matching close quote, strips the outer quotes and then
goes on to process the quoted content.

> This seems contrary to the documentation as
> well as how every other programming language is parsed.  I think the
> reasonable expectation when one inserts a comment is that the comment
> text will not be parsed or processed in any way.

But similarly, M4 is supposed to read text between quotes without looking
at the content or behaving differently depending on what it sees.

> Is the difference because m4 is not really a traditional programming
> language, but a pre-processing language?  I still think it is
> unreasonable (i.e., a bug) to allow processing to be done within a
> comment by default.

I think I see where we differ here, and it is unfortunate that quoting
in M4 is so difficult to get right.  Rest assured that when you have
mastered the subtleties of quoting and rescanning, then M4's behaviour
becomes much more predictable.

I think you meant to write this:

  define(`foo',
  # there aren't any arguments to foo
  ``this is output of the foo macro'')

Notice that the comment is not quoted now, so references to macros (foo)
and unbalanced quotes (') are left untouched as the reader tokenises the
text between # and \n as a single comment token.

> I think the reasonable solution is to use the
> changequote, or changecom, when one _desires_ parsing of something
> normally thought of as a comment (see example in documentation for
> changecom).

The opposite is true of M4, so unfortunately, I think you will be surprised
by the expansion of foo given the definition above:

  foo
    => # there aren't any arguments to foo
       this is output of the foo macro

But this is correct according to POSIX SUSv3
(http://www.opengroup.org/onlinepubs/009695399/utilities/m4.html):

   Comments are written but not scanned for matching macro names; by
   default, the begin-comment string consists of the number sign character
   and the end-comment string consists of a <newline>.

If you want to write text in the arguments to macros, but have it removed,
then you must remove it yourself since comments in m4 are also a little
different to what you might expect to see in an imperative language.
Fortunately, because arguments are rescanned for expansions, it is easy
to do this (it is referenced in the GNU M4 docs IIRC):

   define(`foo', ifelse(
   # there aren't any arguments to foo
   )``this is output of the foo macro'')
     =>
   foo
     => this is output of the foo macro

So, I've used ifelse to discard the text during scanning, but carefully
retained the # comment start character to prevent the unmatched ' or the
reference to foo from being expanded during rescanning (of the argument
to ifelse).  The original "output" string is still double quoted to prevent
expansion of foo.

Note that if I had started the quotes before ifelse, then the ' in aren't
would have been matched as the end of a quoted string, because the # would
have been quoted.  So this is WRONG:

   define(`foo', `ifelse(
   # there aren't any arguments to foo
   )`this is output of the foo macro'')

Note also that double quoting the entire argument would prevent the ifelse
from being expanded (and so from discarding its argument) during rescanning.
So this is WRONG too:

   define(`foo', ``ifelse(
   # there aren't any arguments to foo
   )this is output of the foo macro'')

It is good style to single quote all arguments to macros, with two
exceptions: where macros should be expanded (or comments noticed!) during
tokenising, the quotes must be left off; and where an argument must be left
untouched, double quoting must be used.  So stylistically, this is better
(although a little harder to understand):

   define(`foo', `ifelse('
   # there aren't any arguments to foo
   `)`this is output of the foo macro'')
     =>
   foo
     =>this is output of the foo macro

If you are still getting to grips with M4, then there are more surprises ahead
when positional parameters ($1 etc.) come into play, but feel free to ask on
the list if they are not behaving as you expect.
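
For illustration only (greet is a made-up macro, and this may not be the
particular surprise meant above): as far as I understand it, $1 is replaced
everywhere in the definition text, including inside a comment, before the
expansion is rescanned:

   define(`greet', `# $1 is substituted even here
   hello $1')
     =>
   greet(`world')
     => # world is substituted even here
        hello world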

Also you need to be careful about quoting commas correctly otherwise you
might find the reader starts the next argument prematurely.  And remember
that when the text of arguments to macros is rescanned for expansions,
an unexpected comma could be inserted...
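
A minimal sketch of both pitfalls, using the default `' quotes and two
made-up macros (show and pair are names chosen purely for illustration):

   define(`show', `<$1>')
     =>
   define(`pair', `x, y')
     =>
   show(a, b)
     => <a>
   show(`a, b')
     => <a, b>
   show(pair)
     => <x>
   show(`pair')
     => <x, y>

In the third call, pair is expanded while the arguments to show are still
being collected, so the comma it produces splits them.  In the fourth, the
quoted name is only expanded when the output of show is rescanned, by which
time the comma can no longer do any harm.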

> In your example you changed the quote characters to brackets.  I think
> that it becomes overwhelming to have to constantly change the quote
> characters or comment characters because of punctuation one wants to use
> in a comment.

Indeed.  And, especially because the choice is so critical to the behaviour of
the tokeniser and parser, it is important to choose comment and quote
characters that will not interfere with the body of the files that are
being processed.  In practice, the standard `' quotes occur alone in English
text so often that it is unusual NOT to change them.  The normal practice is
to change them once right at the start of the file, choosing the replacements
wisely to avoid having to change them again later in the file just to avoid
the kinds of problems you are encountering.  Autoconf uses [] because
those characters almost never occur unpaired, so they can be double quoted
to pass them through to the output.

   changequote([,])
   define([foo],
   [[open: [, close: ]]])
     =>
   foo
     => open: [, close: ]
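
As a sketch of how this helps with apostrophes in comments (assuming the
quotes are changed to [] once at the top of the file; this is the earlier
foo example rewritten for illustration, not taken from any real policy
file):

   changequote([,])
     =>
   define([foo], [# there aren't any arguments to foo
   [this is output of the foo macro]])
     =>
   foo
     => # there aren't any arguments to foo
        this is output of the foo macro

The apostrophe is now an ordinary character, so it cannot end a quoted
string.  The comment is still copied through to the output, as the POSIX
text quoted earlier requires.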

> I'm working with SELinux policy, which uses m4 macros as a convention
> for generating parts of the security policy.  I've found that people
> occasionally put quotes in their comments and this hoses up the
> policies. Perhaps this "convention" was an inappropriate use of m4?

Unfortunately so.  But it is a very common misunderstanding.

> At any rate, as the official voice of m4, you are saying that this is
> not a bug and is the appropriate, reasonable, and expected behavior for
> m4, correct?

Absolutely.  I'm only the official voice of GNU M4 though, the POSIX
committee holds the reins of the standard.

> [I don't intend for this to come across as overly argumentative.  I just
> want to make my case to you/whoever is in charge of m4.]

Not at all.

Hopefully, my long explanation will save others from tripping over the
same gotchas.

Cheers,
        Gary.
-- 
Gary V. Vaughan      ())_.  address@hidden,gnu.org}
Research Scientist   ( '/   http://tkd.kicks-ass.net
GNU Hacker           / )=   http://www.gnu.org/software/libtool
Technical Author   `(_~)_   http://sources.redhat.com/autobook
