m4-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: 26-gary-changeresyntax.patch


From: Eric Blake
Subject: Re: 26-gary-changeresyntax.patch
Date: Tue, 11 Jul 2006 06:51:04 -0600
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Gary V. Vaughan on 7/10/2006 5:45 PM:
> Hi Eric!
> 
> Thanks for the review.  New version attached.
> 
> Okay to commit to HEAD?

Still a couple of nits; but fix those and I think you should commit (we
can tackle fallout later, if needs be).

>>>> +{
>>>> +  int resyntax = -1;
>>>> +
>>>> +  if (spec)
>> Can we guarantee that all callers pass non-NULL spec, since this is
>> static?  Otherwise, passing a NULL spec silently returns -1.
> 
> Nice catch.  No we can't, but as we're making a NULL spec equal to
> RE_SYNTAX_EMACS it's not a problem now :-)

Actually, M4ARG guarantees a non-NULL string - if the user didn't pass
that many arguments, it is equivalent to "", not NULL.  We really don't
pass NULL around, even though some of your code was making that assumption.

>> Your original proposal also suggested listing valid syntax-specs; is that
>> still worth doing here?
> 
> It looked messy, and I don't want to get into calculating the terminal
> width to wrap nicely.  The precedent from our other diagnostics is to
> complain in this style I think...

ACK.  The proposed patch is just fine here.

>> Ouch.  We used to have:
>> regexp(aab,a*)
>> 0
>> regexp(aab,a*,)
>>
>> But now with the resyntax parameter, we have no way to tell whether we
>> wanted the third argument to behave as a literal replacement of the empty
>> string, vs. an index of the match point:
>> regexp(aab,a*,,EMACS)
>> `0' or `'?
> 
> For now, I've fixed this by allowing a 3 argument regexp (where there is
> a valid resyntax as the last argument):
> 
>   regexp(gnu emacs,e+)
>   => 4
>   regexp(gnu emacs, e+, )
>   =>
>   regexp(gnu emacs,e+,emacs)
>   => 4
>   regexp(gnu emacs,e+,emacs,emacs)
>   => emacs

I'm not sure... regexp is a GNU invention, and we are certainly free to
document it this way, but I can't help but think we might break some
existing script.  My thought overnight was that perhaps we invent a new
escape code for displaying the index:

regexp(gnu emacs, e+)dnl 2 args gives index
=> 4
regexp(gnu emacs, e+, )dnl 3 args gives literal third string
=>
regexp(gnu emacs, e+, , emacs)dnl 4 args gives literal third string...
=>
regexp(gnu emacs, e+, \_, emacs)dnl ...except new \_ escape gives index
=> 4

Since we are already doing escape processing on the replacement, adding a
new escape code would preserve backwards compatibility when there are
fewer than four arguments, while allowing the user to choose between
behaviors when using the new argument.

> But I'd be amazed if people are using one of our resyntax specs as the
> 3rd argument of a regexp() macro call.

Don't underestimate the user's ability to surprise :)

> 
> In due course, we can reduce this probability even more, because I want
> to add named argument support to GNU M4, thusly:
> 
>   regexp(gnu emacs, e+, resyntax=emacs)
>   => 4

And what if we want the replacement to be resyntax=emacs - is there an
escape for treating the = as part of the replacement rather than as a
designator for named argument?

>> Unrelated to your patch, but is it time to kill (or at least raise the
>> default) of this arbitrary limit?  autoconf purposefully uses 1024 instead
>> of 250.  We already document that the limit may go away in future releases.
> 
> I don't think we can kill it easily as some limit is required to detect
> infinite loops.  Raising the default seems like a reasonable thing to
> do.  Hardware has come a long way in the last 12 years.

Let's do that in a separate patch, then - at least 1024, because that is
what autom4te uses.

> 
>> And why a linear search?  The map is sorted, so you could do
>> a binary search.
> 
> The table is tiny, so it didn't seem worth the effort to write the
> harder code.  Moot point now though, as the table is no longer sorted.

ACK - O(n) vs. O(log n) isn't noticeable when the maximum n is controlled
by a fixed-length table.

> 
>>>> address@hidden regexprops-generic.texi
>> And where is the patch to bootstrap that ensures that we pull this file in
>> from gnulib?  Or the patch to Makefile.am that ensures we distribute it?
> 
> There is no gnulib-tool option to import texi files, as far as I can
> tell, I had planned on manually updating prior to release just like
> config.guess etc.  I've added regexprops-generic.texi to Makefile.am.

Actually, I'm committing a patch to gnulib to make fdl.texi part of a
module; I can also make a module for regexprops-generic.texi while I'm at it.

> @@ -552,138 +548,137 @@
>     the expansion to this argument.  */
>  
>  /**
> - * regexp(VICTIM, REGEXP, [REPLACEMENT])
> - * eregexp(VICTIM, REGEXP, [REPLACEMENT])
> + * regexp(VICTIM, REGEXP, RESYNTAX)
> + * regexp(VICTIM, REGEXP, [REPLACEMENT], [RESYNTAX])
>   **/
> -
> -static void
> -m4_regexp_do (m4 *context, m4_obstack *obs, int argc,
> -           m4_symbol_value **argv, int syntax)
> +M4BUILTIN_HANDLER (regexp)
>  {
> -  const char *caller;                /* calling macro name */
> -  const char *victim;                /* first argument */
> -  const char *regexp;                /* regular expression */
> -
> +  const char *me;            /* name of this macro */
> +  const char *replace;               /* optional replacement string */
>    m4_pattern_buffer *buf;    /* compiled regular expression */
>    int startpos;                      /* start position of match */
>    int length;                        /* length of first argument */
> +  int resyntax;
> +
> +  me = M4ARG (0);
> +  replace = M4ARG (3);
> +  resyntax = m4_get_regexp_syntax_opt (context);
>  
> -  caller = M4ARG (0);
> -  victim = M4ARG (1);
> -  regexp = M4ARG (2);
> +  if (argc == 4)

If you take my suggestion of adding a \_ escape to imply the index of the
match, then this should no longer look at M4ARG(3) for a resyntax name.

> +    {
> +      resyntax = m4_regexp_syntax_encode (replace);
> +
> +      /* The first case is the most difficult, because the empty string
> +      is a valid RESYNTAX, yet we want `regexp(aab, a*, )' to return
> +      an empty string as per M4 1.4.x!  */
>  
> Index: m4--devo--0/m4/resyntax.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ m4--devo--0/m4/resyntax.c 2006-07-11 00:33:54.000000000 +0100
> +
> +/* Return the internal code representing the syntax SPEC, or -1 if
> +   SPEC is invalid.  The `m4_syntax_map' table is searched case
> +   insensitively, after replacing any spaces or dashes in SPEC with
> +   underscore characters.  Possible matches for the "GNU_M4" element
> +   then, are "gnu m4", "GNU-m4" or "Gnu_M4".  */
> +int
> +m4_regexp_syntax_encode (const char *spec)
> +{
> +  const m4_resyntax *resyntax;
> +  char *canonical;
> +  char *p;
> +
> +  /* Unless specified otherwise, return the historical GNU M4 default.  */
> +  if (!spec)

I don't think we ever pass a non-NULL spec; should we add an assert here
instead?

> +    return RE_SYNTAX_EMACS;
> +
> +  canonical = strdup (spec);
> address@hidden Changeresyntax
> address@hidden Changing the regular expression syntax
> +
> address@hidden regular expression syntax, changing
> address@hidden GNU extensions
> address@hidden {Builtin (gnu)} changeresyntax (@w{opt @var{resyntax}})
> +By default, the @sc{gnu} extensions @code{patsubst}, @code{regexp} and
> +more recently @code{renamesyms} continue to use emacs style regular
> +expression syntax (@pxref{Regular expression syntax}).
> +
> +The @code{changeresyntax} macro expands to nothing, but changes the
> +default regular expression syntax used by M4 according to the value of
> address@hidden, equivalent to passing @var{resyntax} as the argument to
> address@hidden when invoking @code{m4}.  @xref{Invoking m4},
> +for more details.  If @var{resyntax} is empty, or omitted the default

s/empty, or omitted /empty or omitted, /

> +settings are reverted to emacs style.
> address@hidden deffn
> +
> address@hidden Eregexp and Regexp
> address@hidden Regexp
>  @section Searching for regular expressions
>  
>  @cindex regular expressions
>  @cindex GNU extensions
> address@hidden {Builtin (gnu)} eregexp (@var{string}, @var{regexp}, @w{opt 
> @var{replacement})}
> address@hidden {Builtin (gnu)} regexp (@var{string}, @var{regexp}, @w{opt 
> @var{replacement},} @w{opt @var{resyntax})}
>  Searching for regular expressions is done with the builtin
>  @code{regexp}, which searches for @var{regexp} in @var{string}.  The
>  syntax of regular expressions is similar to that of Perl, @sc{gnu} Awk
> @@ -3014,13 +3114,18 @@
>  is specified and matches, then it expands into @var{replacement}. If
>  @var{regexp} does not match anywhere in @var{string}, it expands to -1.
>  
> -The builtin macro @code{eregexp} is recognized only when given arguments.
> +If @var{resyntax} is given, the particular flavor of regular
> +expression understood with respect to @var{regexp} can be changed from
> +the current default.  @xref{Changeresyntax}, for details of the values
> +that can be given for this argument.
> +
> +The builtin macro @code{regexp} is recognized only when given arguments.
>  @end deffn
>  
>  @example
> -eregexp(`GNUs not Unix', `\<[a-z]\w+')
> +regexp(`GNUs not Unix', `\<[a-z]\w+')
>  @result{}5
> -eregexp(`GNUs not Unix', `\<Q\w*')
> +regexp(`GNUs not Unix', `\<Q\w*')
>  @result{}-1
>  @end example
>  
> @@ -3030,27 +3135,21 @@
>  @samp{\&} being the text the entire regular expression matched.
>  
>  @example
> -eregexp(`GNUs not Unix', `\w(\w+)$', `*** \& *** \1 ***')
> +regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
>  @result{}*** Unix *** nix ***
>  @end example
>  
> -Originally, regular expressions were much less powerful (basically only
> address@hidden was available), but to keep backward compatibility, new
> -operators were implemented with previously invalid sequences, such as
> address@hidden(}.  The following macro is exactly equivalent to 
> @code{eregexp},
> -but using the old, clumsy syntax.
> -
> address@hidden {Builtin (gnu)} regexp (@var{string}, @var{regexp}, @w{opt 
> @var{replacement})}
> -Same as @code{eregexp}, but using the old and clumsy ``Basic Regular
> -Expression'' syntax, the same as in @sc{gnu} Emacs.  @xref{Regexps, ,
> -Syntax of Regular Expressions, emacs, The @sc{gnu} Emacs Manual}.
> address@hidden deffn
> +If @var{resyntax} is given, @var{regexp} must be given according to
> +the syntax chosen, though the default regular expression syntax
> +remains unchanged for other invocations:
>  
>  @example
> -regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***')
> +regexp(`GNUs not Unix', `\w(\w+)$', `*** \& *** \1 ***', `POSIX_EXTENDED')
>  @result{}*** Unix *** nix ***
>  @end example

If you stick with your approach, where the third argument is checked to
see if it is a resyntax name rather than a replacement, then you will need
to document that and give an example in this node.

>  
> address@hidden R @var{length} @key{NL} @var{string} @key{NL}
> +Sets the default regexp syntax, where @var{string} encodes one of the
> +regular expression syntaxes supported by @sc{gnu} M4.

Note to myself - I need to be more consistent about using @sc{gnu}, and
distinguishing between the package name of M4 vs. the executable @code{m4}.

Thanks for a good idea, and for responding to my review!

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEs56384KuGfSFAYARAptPAJ9O/o9cuc5kHAe67N0NuU8OrzciBACeJsnj
BxFKWGyXF/vFUJEmH8B81sI=
=Fn0G
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]