m4-patches

Re: M4 syntax $11 vs. ${11}


From: Gary V. Vaughan
Subject: Re: M4 syntax $11 vs. ${11}
Date: Fri, 2 Mar 2007 10:57:44 -0800

Morning Eric!

On 2 Mar 2007, at 06:07, Eric Blake wrote:
According to Gary V. Vaughan on 3/2/2007 12:28 AM:
Wow, I can't believe how much you've done in the last few months! :) It's
gonna take me a week or two to get back up to speed on everything...

No problem; I don't see any risk of m4 2.0 coming out until a lot of
features have settled, so I can afford to wait for a good review; and even if we do release 1.9b to alpha.gnu.org, it is with explicit documentation
that experimental things may be withdrawn prior to 2.0.

Indeed.  Excellent!

Okay, give me a few days (it's Octavia's birthday in an hour, and mine
the day after, so I won't be online too much until Sunday (-:).

Enjoy your birthday celebrations!

Thank you :-)

One thing that worries me about where M4 2.0 is headed is that the default build seems to me like it will be too different from 1.4.x and suffer from a lot of the problems and bad feeling that surrounded the transition from
Autoconf 2.13 to 2.5x :-(  We can avoid this if we take care... why
shouldn't people who are stuck in 2.13 land be able to upgrade to M4 2.0
without pain?

I've been trying to ensure that all along. Which is why I had settled on the changeextarg idea (although if you can get changesyntax to understand multi-byte sequences, it would be a use of changesyntax instead of adding changeextarg). The idea is that in autoconf, the first thing it does is disable ${1} so that it behaves the same as in 1.4.x, but enables ${{1}}

That's well and good for Autoconf 2.62 users, but what about the people
stuck with Autoconf 2.59 (Apple users for example), or 2.13?  I strongly
believe the default should be at least the other way around, so that the
extra work is getting ${1} to work -- but preferably $10 should do what
it has always done in GNU M4 by default for the sanity of upgrading users.

Let's call the ``languages'' I've been referring to recently ``metamodules'' for now, as one day I'd like to see some fuller metaprogramming support in
GNU M4.  If we require even an extra cli option with m4 2.0 to get 1.4.x
behaviour, the Autoconf 2.59 and 2.13 users will throw their arms up in
despair. We need for users to be able to download m4 2.0 tarballs, build
and install them over their 1.4.x installation, and carry on running all
their existing legacy Autoconf code without any additional tweaking. I'm
even planning to find some large Autoconf 2.13 driven projects and check
that m4 2.0 can bootstrap them with its default build -- failing to do
that successfully is a release showstopper IMHO.

as the way to become an early adopter of m4 2.0 extended arguments (very
few, if any, autoconf scripts currently contain "${{").  I'm still
polishing a short patch to autoconf that does just this (my last
submission didn't really have any conceptual problems, just technical in
how autom4te was invoking too many processes:
http://lists.gnu.org/archive/html/autoconf-patches/2007-02/msg00013.html). It also seems that very few autoconf scripts use $10 to mean the tenth
argument; most people just don't write macros that take that many
arguments.  POSIX requires $10 to mean the first argument concatenated
with 0, and I would rather minimize the places in code that depend on
whether POSIXLY_CORRECT is set.
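
To make the two readings of $10 concrete, here is a minimal Python sketch of the substitution step only. It is not m4 code and not how m4 is implemented; the function name `expand` and its `posix` flag are made up for illustration, and $0, $#, $* and $@ are left out.

```python
import re

def expand(body, args, posix=False):
    """Substitute $N references in a macro body.

    posix=True:  `$` takes exactly one digit, so $10 is $1 followed by "0".
    posix=False: `$` takes the longest digit run (the GNU m4 1.4.x reading),
                 so $10 is the tenth argument.
    """
    pattern = r"\$(\d)" if posix else r"\$(\d+)"

    def lookup(match):
        index = int(match.group(1))
        # $0 (the macro name) is not modelled; missing arguments expand to "".
        return args[index - 1] if 1 <= index <= len(args) else ""

    return re.sub(pattern, lookup, body)

args = list("abcdefghij")
print(expand("$10", args, posix=True))   # a0
print(expand("$10", args, posix=False))  # j
```

A macro body that never references an argument past $9 expands identically under both readings, which is why so few existing scripts are affected either way.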

I agree that simplifying POSIXLY_CORRECT wrinkles is a good idea, so long
as we retain compatibility with 1.4.x.

Maybe there is still a way we can let the user dynamically select whether
$10 means the POSIX interpretation or the tenth argument?  My thoughts
here are that in default 2.0 mode, { and } are the two strings for
extended argument sequences, $10 has the POSIX meaning, and we are
compliant to all POSIX requirements whether or not POSIXLY_CORRECT is set.

ACK. Except that the default 2.0 mode should require extra work by loading additional modules or otherwise tinkering with the invocation command line, or running an m4 2.0 enabling preamble. In the absence of such intervention,
it is important for the acceptance of m4 2.0 that it start up in 1.4.x
compatibility mode.

And by setting the strings to empty, then $10 looks like an extended
argument, restoring 1.4.x behavior without violating POSIX (because the
only way to change the default syntax table is to use an extension to
POSIX, and once you have used an extension, POSIX rules are no longer
binding). Of course, with empty extended argument delimiters, we have to
be careful of ambiguity (for example, ${1-default} will work when the
delimiters are {}, but $1-default must be parsed as the first argument
concatenated with -default rather than the full string treated as an
extended argument when the delimiters are empty).
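
The delimiter idea above can be sketched in Python as follows. This is only an illustration of the parsing rule as described in this thread, under stated assumptions: `expand_ext` and its `delims` parameter are hypothetical names, "unset" is taken to mean "argument not supplied", and quoting inside the default text is ignored.

```python
import re

def expand_ext(body, args, delims=("{", "}")):
    """Expand argument references under configurable extended-argument delimiters.

    delims=("{", "}"): ${N} and ${N-default} are extended arguments; a bare
                       $N takes a single digit, as POSIX requires.
    delims=("", ""):   a bare $ plus the longest digit run is the extended
                       argument, so $10 is argument 10 and "-default" is text.
    """
    opener, closer = delims

    def arg(n):
        # None marks an argument that was not supplied in the call.
        return args[n - 1] if 1 <= n <= len(args) else None

    if opener:
        pattern = (r"\$(?:" + re.escape(opener) + r"(\d+)(?:-(.*?))?"
                   + re.escape(closer) + r"|(\d))")
    else:
        pattern = r"\$(\d+)"

    def replace(match):
        if opener and match.group(1) is not None:
            value = arg(int(match.group(1)))
            if value is None:          # unset: fall back to the default, if any
                value = match.group(2) or ""
            return value
        digits = match.group(3) if opener else match.group(1)
        return arg(int(digits)) or ""

    return re.sub(pattern, replace, body)

ten = list("abcdefghij")
print(expand_ext("$10 ${10}", ten))               # a0 j
print(expand_ext("${2-default}", ["x"]))          # default
print(expand_ext("$1-default", ["x"], ("", "")))  # x-default
print(expand_ext("$10", ten, ("", "")))           # j
```

With empty delimiters the sketch never looks past the digit run, which is one way to resolve the $1-default ambiguity: only the digits belong to the extended argument.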

Good point.

Maybe some of them will find all the cool new functionality we already have in 2.0 and start using it, before giving up in disgust and downgrading
to 1.4.x when a bunch of their configure.ins stop building...

That's been my worry all along. I have (hopefully) been very careful to
ensure that running autoconf with 1.9b sees very little impact.

Agreed.  I think the only problem we're about to encounter is that you
are making the POSIX meta-module the default, where I think 1.4.x needs
to be the default.

[[snip changeextarg stuff]]

I also have some parser extension patches that implemented named
arguments with define:

  define(`foo(bar, baz)', `
    ${bar}, ${baz}!
  ')
  foo(`hello', `world')
  =>hello, world!

Looks similar to m5.  And almost works great with extended arguments -
POSIX specifically reserved all uses of ${ for the implementation, and
leaves the interpretation of '(' in a macro name up in the air (since most traditional implementations reject it; without indir, there would be no way to invoke such a macro). And I have specifically not done anything
with non-numeric extended arguments yet, because I had the goal of
supporting named arguments like that. The only problem is that autoconf currently defines macros named "foo(bar)" with the parenthesis as part of
the name for purposes of data storage (knowing that the macro won't be
directly invoked), so we would need a way to select whether () is part of the macro name or delimits the named arguments to a shorter macro name.

Yep.  1.4.x metamodule makes the parser treat input just the same as
our 1.4.x releases; m42 metamodule and/or posix metamodule will work as
described above; m5 metamodule will follow the m5 implementation docs.

Maybe the syntax should be:
define(`foo{bar}', `${bar}')
foo(`hello')
=>hello

with the same delimiters for naming arguments as what is used in accessing those extended arguments? A quick grep of autoconf doesn't turn up any
macros with {} in their name.

Sure. We can choose whatever we like for m42, because the default m4 (1.4.x) metamodule will be loaded by default, and even Autoconf 2.59 will behave as
it always has.

And argument defaulting:

  define(`foo(bar, baz=`cruel world')', `
    ${bar}, ${baz}!
  ')
  foo(`Goodbye')
  =>Goodbye, cruel world!

My patches so far sort of crippled the = assign syntax class (there
weren't enough classes with only 16 bits; maybe we shift the syntax table
to use 32 bits for classifications?).  But that is still possible with
this syntax:

define(`foo{bar, baz}', `${bar}, ${baz-`cruel world'}!')
foo(`Goodbye')
=>Goodbye, cruel world!

Where ${baz-`cruel world'} assigns to baz?  We should probably use
${baz=`cruel world'} for that, and leave the former for defaulting
unset values only.
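
The distinction between defaulting and assigning can be sketched in Python. This assumes the shell-like reading suggested above (`-` substitutes a default without storing it, `=` substitutes and stores it) and the `foo{bar, baz}` signature form floated earlier in this thread; `expand_named` is a hypothetical helper, and m4 quotes are simply stripped from the default text.

```python
import re

def expand_named(signature, body, call_args):
    """Expand ${name}, ${name-default} and ${name=default} in a macro body."""
    _, _, params = signature.partition("{")
    names = [p.strip() for p in params.rstrip("}").split(",")]
    values = dict(zip(names, call_args))  # parameters with no argument stay unset

    def replace(match):
        key, op, default = match.group(1), match.group(2), match.group(3)
        if key in values:
            return values[key]
        fallback = (default or "").strip("`'")  # drop the m4 quote characters
        if op == "=":
            values[key] = fallback              # assignment persists for later uses
        return fallback

    return re.sub(r"\$\{(\w+)(?:([-=])(.*?))?\}", replace, body)

print(expand_named("foo{bar, baz}", "${bar}, ${baz-`cruel world'}!", ["Goodbye"]))
# Goodbye, cruel world!
print(expand_named("foo{bar, baz}", "${baz-`x'}|${baz}", ["Goodbye"]))  # x|
print(expand_named("foo{bar, baz}", "${baz=`x'}|${baz}", ["Goodbye"]))  # x|x
```

The last two calls show the difference: after `-` a later ${baz} is still empty, while after `=` it sees the stored value.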

At the time, I think Akim wanted a more perl-like syntax, where I had
followed a more lisp-like path.  Since then M5 was pointed out to me,
so I guess this work can support M5 syntax and be the first steps to
implementing the M5 `language'.

I just barely found and read the m5 language documentation
(http://techreports.lib.berkeley.edu/accessPages/CSD-91-621.html). It had
lots of ideas for extended arguments (IIRC, stuff like ${,2,*} meaning
output a leading comma if anything else is output, then output from $2 to $n a comma-separated list of only non-empty arguments, except that it used $() instead of ${}). It also had the idea of $() vs. $'' for whether the argument expansion was quoted, so that instead of providing $* vs. $@, it
provided $(*) vs. $'*'.  Hmm, maybe that means we need two sets of
extended argument delimiters? By default, only ${} is POSIX compliant,
but a user could also enable the second one to get $'' that quotes the
expansions as in M5.

Other features I liked of m5 - the notion that a single $ starts argument parsing, but $$ expands to $, $$$ expands to $$, and so on, so that it is easier to write macros that define macros. I also liked the idea of pools
(or namespaces), such that it is easy to add or hide a pool of macros.
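
The $$ rule is easy to sketch in Python. This follows only the summary given above (a single $ starts argument parsing, each longer run of $ loses one $), not the m5 documentation itself; `expand_m5_dollars` is a hypothetical name.

```python
import re

def expand_m5_dollars(body, args):
    """One expansion pass: $N is substituted, a run of k > 1 dollars becomes k - 1."""
    def replace(match):
        dollars, digits = match.group(1), match.group(2)
        if len(dollars) > 1:
            return dollars[1:] + digits  # peel off one level of escaping
        if digits:
            n = int(digits)
            return args[n - 1] if 1 <= n <= len(args) else ""
        return dollars                   # a lone $ with no digits is plain text

    return re.sub(r"(\$+)(\d*)", replace, body)

once = expand_m5_dollars("$1 $$1 $$$1", ["x"])
print(once)                            # x $1 $$1
print(expand_m5_dollars(once, ["y"]))  # x y $1
```

Each pass consumes one level, so a macro that defines a macro that defines a macro can address each layer's arguments by the number of $ characters it writes.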

I haven't read it in a couple of years, but there is a lot of cool stuff
in there. Implementing it will require quite a lot of refactoring of our
core code though, which is why I think it would be cool to take that
opportunity to modularise the core code itself in due course. I certainly don't want to push 2.0 out again to accommodate any of the m5 implementation though, this can be a 2.1/2.2 roadmap item along with more rigorous meta-
modules to enable it.  For 2.0, all we need is cleanup of the current
feature set and a default build that is 1.4.x compatible.

My dream is still to modularise the core enough that m4-2.0 will support different 'languages' (we need a better name for this -- I mean GNU M4 being 1.4.x compatible, M42 being the new stuff we're putting in that
will break 1.4.x compatibility, M5 etc).

I'm still not sure that I'm breaking much 1.4.x compatibility. And what is breaking is justified by a move closer to POSIX, and the manual will document how to easily restore the former behavior. I agree that
we could go so far as to build a module 'm4-14x' vs the existing 'gnu'
module, so that from the command-line, 'm4 --gnu' loads all 2.0 features, 'm4 --14x' loads only features in 1.4.x, and 'm4' loads whichever of the
two modules was selected to be the default at ./configure time.

I'd rather see m4 (1.4.x compatible) as the default, posix as what you are
working on with '{', and POSIXLY_CORRECT simplification, and m42 as the
juicy new stuff.

I just think that the new features are backwards-compatible enough that we won't
have to distinguish between the two, or that anywhere semantics change
between the two, we also provide aids such as --warn-macro-sequence that
the user can enable to quickly find their problematic uses of the
ambiguous syntax, along with good documentation of how to get the desired
behavior, regardless of whether 1.4.x or 2.0 is parsing that code.

I think warn-macro-sequence is an excellent way to provide diagnostics to
users consciously upgrading to the m42 feature set, but shouldn't throw
errors at casual upgraders whose sysadmin or Linux distro put m4 2.0 in
their PATH and who want to carry on working with their existing tools
rather than figure out why all the code they wrote yesterday is behaving
differently...

Cheers,
        Gary
--
  ())_.              Email me: address@hidden
  ( '/           Read my blog: http://blog.azazil.net
  / )=         ...and my book: http://sources.redhat.com/autobook
`(_~)_ Join my AGLOCO Network: http://www.agloco.com/r/BBBS7912



