m4-patches

Re: improve substr


From: Eric Blake
Subject: Re: improve substr
Date: Thu, 08 Jan 2009 06:19:07 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.19) Gecko/20081209 Thunderbird/2.0.0.19 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 12/26/2008 12:52 AM:
> According to Eric Blake on 12/24/2008 4:23 PM:
>> Again, implementing this natively will be more efficient.  What do you think
>> of adding these two enhancements to substr?
> 
> 
> Here's an implementation of the two patches; I'm now in the process of
> regression testing autoconf and bison to ensure they don't trip up on the
> new semantics

No problems detected, so here's what I'm pushing for branch-1.6 and master:

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkll/UsACgkQ84KuGfSFAYA5TQCcCLbqEIhDxh3l3mU1dZ0Vl0YP
LHMAoIfOMKflP7eLmBMXP4IGOIxqjh9n
=wTip
-----END PGP SIGNATURE-----
From e9e4abba45f7e9f368cf497e14bc2ce64b867a02 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 26 Dec 2008 00:33:18 -0700
Subject: [PATCH] Enhance substr to support negative values.

* doc/m4.texinfo (Substr): Document new semantics, and how to
simulate old.
* src/builtin.c (m4_substr): Support negative values.
* NEWS: Document this.
---
 ChangeLog      |    8 +++
 NEWS           |    9 +++-
 doc/m4.texinfo |  157 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 src/builtin.c  |   49 +++++++++++------
 4 files changed, 194 insertions(+), 29 deletions(-)
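
For readers who want the new index rules without reading the diff, here is a minimal standalone sketch in C. It follows the same arithmetic as the patched m4_substr below, but it is not the M4 code itself: the name substr_sketch, the has_len flag (0 for an omitted or empty length) and the fixed output buffer are assumptions made only for illustration, and an empty from is simply passed as 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of M4.  Copies the selected substring
   of S into OUT (capacity CAP) and returns its length.  HAS_LEN is 0
   when the length argument is omitted or empty.  */
size_t
substr_sketch (const char *s, int from, int has_len, int len,
               char *out, size_t cap)
{
  int length = (int) strlen (s);
  int start = from;
  int end;
  size_t n;

  assert (out != NULL && cap > 0);

  /* A negative FROM counts back from the end of the string.  */
  if (start < 0)
    start += length;

  /* A negative LEN is an end index relative to the end of the string;
     a non-negative LEN is added to the adjusted start.  */
  if (!has_len)
    end = length;
  else if (len < 0)
    end = len + length;
  else
    end = len + start;

  /* Clamp both indices to the string; an inverted range is empty.  */
  if (start < 0)
    start = 0;
  if (length < end)
    end = length;
  if (end <= start)
    {
      out[0] = '\0';
      return 0;
    }

  n = (size_t) (end - start);
  if (n >= cap)
    n = cap - 1;  /* Truncate rather than overflow OUT.  */
  memcpy (out, s + start, n);
  out[n] = '\0';
  return n;
}
```

Note that a non-negative length is added to the start index before that index is clamped to 0, which is why substr(`abcde', `-6', `5') selects only four bytes.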

diff --git a/ChangeLog b/ChangeLog
index 41719e2..7de9851 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2009-01-06  Eric Blake  <address@hidden>
+
+       Enhance substr to support negative values.
+       * doc/m4.texinfo (Substr): Document new semantics, and how to
+       simulate old.
+       * src/builtin.c (m4_substr): Support negative values.
+       * NEWS: Document this.
+
 2009-01-05  Eric Blake  <address@hidden>
 
        Use nicer email address in web manual.
diff --git a/NEWS b/NEWS
index 2e1a286..0c2094e 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,6 @@
 GNU M4 NEWS - User visible changes.
-Copyright (C) 1992, 1993, 1994, 2004, 2005, 2006, 2007, 2008 Free Software
-Foundation, Inc.
+Copyright (C) 1992, 1993, 1994, 2004, 2005, 2006, 2007, 2008, 2009 Free
+Software Foundation, Inc.
 
 * Noteworthy changes in Version 1.6 (????-??-??) [stable]
   Released by ????, based on git versions 1.4.10b.x-* and 1.5.*
@@ -53,6 +53,11 @@ Foundation, Inc.
    the current expansion is nested within argument collection of another
    macro.  It has also been optimized for faster performance.
 
+** The `substr' builtin now treats negative arguments as indices relative
+   to the end of the string.  The manual gives an
+   example of how to recover M4 1.4.x behavior, as well as an example of
+   simulating the new negative argument semantics with older M4.
+
 ** The `-d'/`--debug' command-line option now understands `-' and `+'
    modifiers, the way the builtin `debugmode' has always done; this allows
    `-d-V' to disable prior debug settings from the command line, similar to
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 93adb64..17cebbf 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -44,7 +44,7 @@
 language.
 
 Copyright @copyright{} 1989, 1990, 1991, 1992, 1993, 1994, 2004, 2005,
-2006, 2007, 2008 Free Software Foundation, Inc.
+2006, 2007, 2008, 2009 Free Software Foundation, Inc.
 
 @quotation
 Permission is granted to copy, distribute and/or modify this document
@@ -6233,12 +6233,27 @@ Substr
 Substrings are extracted with @code{substr}:
 
 @deffn Builtin substr (@var{string}, @var{from}, @ovar{length})
-Expands to the substring of @var{string}, which starts at index
-@var{from}, and extends for @var{length} characters, or to the end of
-@var{string}, if @var{length} is omitted.  The starting index of a string
-is always 0.  The expansion is empty if there is an error parsing
-@var{from} or @var{length}, if @var{from} is beyond the end of
-@var{string}, or if @var{length} is negative.
+Performs a substring operation on @var{string}.  If @var{from} is
+positive, it represents the 0-based index where the substring begins.
+If @var{length} is omitted, the substring ends at the end of
+@var{string}; if it is positive, @var{length} is added to the starting
+index to determine the ending index.
+
+@cindex @acronym{GNU} extensions
+As a @acronym{GNU} extension, if @var{from} is negative, it is added to
+the length of @var{string} to determine the starting index; if it is
+empty, the start of the string is used.  Likewise, if @var{length} is
+negative, it is added to the length of @var{string} to determine the
+ending index, and an empty @var{length} behaves like an omitted
+@var{length}.  It is not an error if either of the resulting indices lies
+outside the string, but the selected substring only contains the bytes
+of @var{string} that overlap the selected indices.  If the end point
+lies before the beginning point, the substring chosen is the empty
+string located at the starting index.
+
+The expansion is the selected substring, which may be empty.  The
+expansion is empty and a warning issued if @var{from} or @var{length}
+cannot be parsed.
 
 The macro @code{substr} is recognized only with parameters.
 @end deffn
@@ -6250,15 +6265,137 @@ Substr
 @result{}gnats
 @end example
 
-Omitting @var{from} evokes a warning, but still produces output.
+Omitting @var{from} evokes a warning, but still produces output.  On the
+other hand, selecting a @var{from} or @var{length} that lies beyond
+@var{string} is not a problem.
 
 @example
 substr(`abc')
 @error{}m4:stdin:1: Warning: substr: too few arguments: 1 < 2
 @result{}abc
-substr(`abc',)
-@error{}m4:stdin:2: Warning: substr: empty string treated as 0
+substr(`abc', `')
 @result{}abc
+substr(`abc', `4')
+@result{}
+substr(`abc', `1', `4')
+@result{}bc
+@end example
+
+Negative values for @var{from} or @var{length} are @acronym{GNU}
+extensions, useful for accessing a fixed size tail of an
+arbitrary-length string.  Prior to M4 1.6, using these values would
+silently result in the empty string.  Some other implementations crash
+on negative values, and many treat an explicitly empty @var{length} as
+0, which is different from the omitted @var{length} implying the rest of
+the original @var{string}.
+
+@example
+substr(`abcde', `2', `')
+@result{}cde
+substr(`abcde', `-3')
+@result{}cde
+substr(`abcde', `', `-3')
+@result{}ab
+substr(`abcde', `-6')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@result{}
+substr(`abcde', `1', `-2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@result{}bcdefghi
+@end example
+
+If backwards compatibility to M4 1.4.x behavior is necessary, the
+following macro is sufficient to do the job (mimicking warnings about
+empty @var{from} or @var{length} or an ignored fourth argument is left
+as an exercise to the reader).
+
+@example
+define(`substr', `ifelse(`$#', `0', ``$0'',
+  eval(`2 < $#')`$3', `1', `',
+  index(`$2$3', `-'), `-1', `builtin(`$0', `$1', `$2', `$3')')')
+@result{}
+substr(`abcde', `3')
+@result{}de
+substr(`abcde', `3', `')
+@result{}
+substr(`abcde', `-1')
+@result{}
+substr(`abcde', `1', `-1')
+@result{}
+substr(`abcde', `2', `1', `C')
+@result{}c
+@end example
+
+On the other hand, it is possible to portably emulate the @acronym{GNU}
+extension of negative @var{from} and @var{length} arguments across all
+@code{m4} implementations, albeit with a lot more overhead.  This
+example uses @code{incr} and @code{decr} to normalize @samp{-08} to
+something that a later @code{eval} will treat as a decimal value, rather
+than looking like an invalid octal number, while avoiding using these
+macros on an empty string.  The helper macro @code{_substr_normalize} is
+recursive, since it is easier to fix @var{length} after @var{from} has
+been normalized, with the final iteration supplying two non-negative
+arguments to the original builtin, now named @code{_substr}.
+
+@comment options: -daq -t_substr
+@example
+$ @kbd{m4 -daq -t _substr}
+define(`_substr', defn(`substr'))dnl
+define(`substr', `ifelse(`$#', `0', ``$0'',
+  `_$0(`$1', _$0_normalize(len(`$1'),
+    ifelse(`$2', `', `0', `incr(decr(`$2'))'),
+    ifelse(`$3', `', `', `incr(decr(`$3'))')))')')dnl
+define(`_substr_normalize', `ifelse(
+  eval(`$2 < 0 && $1 + $2 >= 0'), `1',
+    `$0(`$1', eval(`$1 + $2'), `$3')',
+  eval(`$2 < 0')`$3', `1', ``0', `$1'',
+  eval(`$2 < 0 && $3 - 0 >= 0 && $1 + $2 + $3 - 0 >= 0'), `1',
+    `$0(`$1', `0', eval(`$1 + $2 + $3 - 0'))',
+  eval(`$2 < 0 && $3 - 0 >= 0'), `1', ``0', `0'',
+  eval(`$2 < 0'), `1', `$0(`$1', `0', `$3')',
+  `$3', `', ``$2', `$1'',
+  eval(`$3 - 0 < 0 && $1 - $2 + $3 - 0 >= 0'), `1',
+    ``$2', eval(`$1 - $2 + $3')',
+  eval(`$3 - 0 < 0'), `1', ``$2', `0'',
+  ``$2', `$3'')')dnl
+substr(`abcde', `2', `')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `-3')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `', `-3')
+@error{}m4trace: -1- _substr(`abcde', `0', `2')
+@result{}ab
+substr(`abcde', `-6')
+@error{}m4trace: -1- _substr(`abcde', `0', `5')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@error{}m4trace: -1- _substr(`abcde', `0', `4')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@error{}m4trace: -1- _substr(`abcde', `0', `0')
+@result{}
+substr(`abcde', `1', `-2')
+@error{}m4trace: -1- _substr(`abcde', `1', `2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@error{}m4trace: -1- _substr(`abcde', `1', `3')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@error{}m4trace: -1- _substr(`abcde', `4', `0')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@error{}m4trace: -1- _substr(`abcdefghij', `1', `8')
+@result{}bcdefghi
 @end example
 
 @node Translit
diff --git a/src/builtin.c b/src/builtin.c
index 33ef9e5..8d7ed6b 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -1,7 +1,7 @@
 /* GNU m4 -- A simple macro processor
 
-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2000, 2004, 2006, 2007,
-   2008 Free Software Foundation, Inc.
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2000, 2004, 2006,
+   2007, 2008, 2009 Free Software Foundation, Inc.
 
    This file is part of GNU M4.
 
@@ -1861,20 +1861,22 @@ m4_index (struct obstack *obs, int argc, macro_arguments *argv)
   shipout_int (obs, retval);
 }
 
-/*-------------------------------------------------------------------------.
-| The macro "substr" extracts substrings from the first argument, starting |
-| from the index given by the second argument, extending for a length     |
-| given by the third argument.  If the third argument is missing, the     |
-| substring extends to the end of the first argument.                     |
-`-------------------------------------------------------------------------*/
+/*-------------------------------------------------------------------.
+| The macro "substr" extracts substrings from the first argument,    |
+| starting from the index given by the second argument, extending    |
+| for a length given by the third argument.  If the third argument   |
+| is missing or empty, the substring extends to the end of the first |
+| argument.  As an extension, negative arguments are treated as             |
+| indices relative to the string length.                            |
+`-------------------------------------------------------------------*/
 
 static void
 m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
 {
   const call_info *me = arg_info (argv);
   int start = 0;
+  int end;
   int length;
-  int avail;
 
   if (bad_argc (me, argc, 2, 3))
     {
@@ -1884,19 +1886,32 @@ m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
       return;
     }
 
-  length = avail = ARG_LEN (1);
-  if (!numeric_arg (me, ARG (2), &start))
+  length = ARG_LEN (1);
+  if (!arg_empty (argv, 2) && !numeric_arg (me, ARG (2), &start))
     return;
+  if (start < 0)
+    start += length;
 
-  if (argc >= 4 && !numeric_arg (me, ARG (3), &length))
-    return;
+  if (arg_empty (argv, 3))
+    end = length;
+  else
+    {
+      if (!numeric_arg (me, ARG (3), &end))
+       return;
+      if (end < 0)
+       end += length;
+      else
+       end += start;
+    }
 
-  if (start < 0 || length <= 0 || start >= avail)
+  if (start < 0)
+    start = 0;
+  if (length < end)
+    end = length;
+  if (end <= start)
     return;
 
-  if (start + length > avail)
-    length = avail - start;
-  obstack_grow (obs, ARG (1) + start, length);
+  obstack_grow (obs, ARG (1) + start, end - start);
 }
 
 /*------------------------------------------------------------------.
-- 
1.6.0.4


From 59d3cfafa8d73e43a974bc066722cd6220cb479f Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Fri, 26 Dec 2008 00:45:24 -0700
Subject: [PATCH] Enhance substr to support replacement text.

* doc/m4.texinfo (Substr): Document new semantics.
* src/builtin.c (m4_substr): Support optional fourth argument.
* NEWS: Document this.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog      |    5 +++++
 NEWS           |    3 ++-
 doc/m4.texinfo |   34 +++++++++++++++++++++++++++++++---
 src/builtin.c  |   26 ++++++++++++++++++++++++--
 4 files changed, 62 insertions(+), 6 deletions(-)
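
As a rough illustration of the replacement mode, the following C sketch applies the same range checks as the fourth-argument branch added to m4_substr below. It is only a sketch under stated assumptions: substr_replace_sketch, the has_len flag (0 for an omitted or empty length), the caller-supplied buffer and the integer return codes are hypothetical stand-ins for the obstack output and m4_warn call used in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of M4.  Writes S with the selected
   range replaced by REPL into OUT (capacity CAP).  Returns 0 on
   success, -1 if the range does not overlap S (where M4 warns
   "substring out of range"), or -2 if OUT is too small.  */
int
substr_replace_sketch (const char *s, int from, int has_len, int len,
                       const char *repl, char *out, size_t cap)
{
  int length = (int) strlen (s);
  size_t repl_len = strlen (repl);
  int start = from;
  int end;

  assert (out != NULL && cap > 0);
  out[0] = '\0';

  /* Same index arithmetic as plain extraction.  */
  if (start < 0)
    start += length;
  if (!has_len)
    end = length;
  else if (len < 0)
    end = len + length;
  else
    end = len + start;

  /* An inverted range becomes an empty range at the start index.  */
  if (end < start)
    end = start;
  /* A range entirely before or after the string is rejected.  */
  if (end < 0 || length < start)
    return -1;
  if (start < 0)
    start = 0;
  if (length < end)
    end = length;

  if ((size_t) start + repl_len + (size_t) (length - end) >= cap)
    return -2;
  memcpy (out, s, (size_t) start);
  memcpy (out + start, repl, repl_len);
  /* Copy the tail together with its terminating NUL.  */
  memcpy (out + start + repl_len, s + end, (size_t) (length - end) + 1);
  return 0;
}
```

An empty range at index 0 or at the end of the string is accepted, so the same call can prepend or append text; only a range that misses the string entirely is refused.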

diff --git a/ChangeLog b/ChangeLog
index 7de9851..c590435 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2009-01-06  Eric Blake  <address@hidden>
 
+       Enhance substr to support replacement text.
+       * doc/m4.texinfo (Substr): Document new semantics.
+       * src/builtin.c (m4_substr): Support optional fourth argument.
+       * NEWS: Document this.
+
        Enhance substr to support negative values.
        * doc/m4.texinfo (Substr): Document new semantics, and how to
        simulate old.
diff --git a/NEWS b/NEWS
index 0c2094e..cbea814 100644
--- a/NEWS
+++ b/NEWS
@@ -54,7 +54,8 @@ Software Foundation, Inc.
    macro.  It has also been optimized for faster performance.
 
 ** The `substr' builtin now treats negative arguments as indices relative
-   to the end of the string.  The manual gives an
+   to the end of the string, and accepts an optional fourth argument of
+   text to supply in place of the selected substring.  The manual gives an
    example of how to recover M4 1.4.x behavior, as well as an example of
    simulating the new negative argument semantics with older M4.
 
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index 17cebbf..37600c4 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -6232,7 +6232,8 @@ Substr
 @cindex substrings, extracting
 Substrings are extracted with @code{substr}:
 
-@deffn Builtin substr (@var{string}, @var{from}, @ovar{length})
+@deffn Builtin substr (@var{string}, @var{from}, @ovar{length}, @
+  @ovar{replace})
 Performs a substring operation on @var{string}.  If @var{from} is
 positive, it represents the 0-based index where the substring begins.
 If @var{length} is omitted, the substring ends at the end of
@@ -6251,9 +6252,13 @@ Substr
 lies before the beginning point, the substring chosen is the empty
 string located at the starting index.
 
-The expansion is the selected substring, which may be empty.  The
+If @var{replace} is omitted, then the expansion is only the selected
+substring, which may be empty.  As a @acronym{GNU} extension, if
+@var{replace} is provided, then the expansion is the original
+@var{string} with the selected substring replaced by @var{replace}.  The
 expansion is empty and a warning issued if @var{from} or @var{length}
-cannot be parsed.
+cannot be parsed, or if @var{replace} is provided but the selected
+indices do not overlap with @var{string}.
 
 The macro @code{substr} is recognized only with parameters.
 @end deffn
@@ -6312,6 +6317,29 @@ Substr
 @result{}bcdefghi
 @end example
 
+Another useful @acronym{GNU} extension, also added in M4 1.6, is the
+ability to replace a substring within the original @var{string}.  An
+empty length substring at the beginning or end of @var{string} is valid,
+but selecting a substring that does not overlap @var{string} causes a
+warning.
+
+@example
+substr(`abcde', `1', `3', `t')
+@result{}ate
+substr(`abcde', `5', `', `f')
+@result{}abcdef
+substr(`abcde', `-3', `-4', `f')
+@result{}abfcde
+substr(`abcde', `-6', `1', `f')
+@result{}fabcde
+substr(`abcde', `-7', `1', `f')
+@error{}m4:stdin:5: Warning: substr: substring out of range
+@result{}
+substr(`abcde', `6', `', `f')
+@error{}m4:stdin:6: Warning: substr: substring out of range
+@result{}
+@end example
+
 If backwards compatibility to M4 1.4.x behavior is necessary, the
 following macro is sufficient to do the job (mimicking warnings about
 empty @var{from} or @var{length} or an ignored fourth argument is left
diff --git a/src/builtin.c b/src/builtin.c
index 8d7ed6b..6594cb9 100644
--- a/src/builtin.c
+++ b/src/builtin.c
@@ -1867,7 +1867,9 @@ m4_index (struct obstack *obs, int argc, macro_arguments *argv)
 | for a length given by the third argument.  If the third argument   |
 | is missing or empty, the substring extends to the end of the first |
 | argument.  As an extension, negative arguments are treated as             |
-| indices relative to the string length.                            |
+| indices relative to the string length.  Also, if a fourth argument |
+| is supplied, the original string is output with the selected      |
+| substring replaced by the argument.                               |
 `-------------------------------------------------------------------*/
 
 static void
@@ -1878,7 +1880,7 @@ m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
   int end;
   int length;
 
-  if (bad_argc (me, argc, 2, 3))
+  if (bad_argc (me, argc, 2, 4))
     {
       /* builtin(`substr') is blank, but substr(`abc') is abc.  */
       if (argc == 2)
@@ -1904,6 +1906,26 @@ m4_substr (struct obstack *obs, int argc, macro_arguments *argv)
        end += start;
     }
 
+  if (argc >= 5)
+    {
+      /* Replacement text provided.  */
+      if (end < start)
+       end = start;
+      if (end < 0 || length < start)
+       {
+         m4_warn (0, me, _("substring out of range"));
+         return;
+       }
+      if (start < 0)
+       start = 0;
+      if (length < end)
+       end = length;
+      obstack_grow (obs, ARG (1), start);
+      push_arg (obs, argv, 4);
+      obstack_grow (obs, ARG (1) + end, length - end);
+      return;
+    }
+
   if (start < 0)
     start = 0;
   if (length < end)
-- 
1.6.0.4

