bug-gnu-utils

Re: Confusing/unclear documentation of Sed back references


From: Bob Proulx
Subject: Re: Confusing/unclear documentation of Sed back references
Date: Wed, 26 Nov 2014 12:29:07 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

Peter Kehl wrote:
> Dear GNU sed maintainers,

I am not one of the sed maintainers.  But I found your report
confusing and so am commenting upon it.

> If you use -r, then
> echo HELLO | sed *-r* "s/*\(*HELLO*\)*/She said:\1"
> sed: -e expression #1, char 23: unterminated `s' command

Look at that error.  There are multiple problems.  The first is that
the trailing slash is missing.  The closing '/' at the end of the
substitute command is not there, and that is what the error message
reports.
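
A minimal sketch of that failure on its own, with the stray stars
removed so that the missing slash is the only problem.  The command is
an illustration rather than the one from the original report, and the
exact wording of the message depends on the sed implementation, so
only the exit status is checked:

```shell
# The s command is never closed with a final '/', so sed rejects the
# script before it reads any input.
if err=$(echo HELLO | sed 's/HELLO/She said:HELLO' 2>&1); then
    status=0
else
    status=$?
fi

echo "exit status: $status"
echo "message: $err"
```

Adding the final '/' makes the same script valid.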

You have many extra '*' characters in that command that should not be
there.  As I am sure you know the '*' is an RE modifier that causes
the previous item to match zero or more times.  It appears to me that
you have sent html to a mailing list.  Never send html to mailing
lists.  It looks like this list converts html to plain text and the
stars are the result of a broken conversion.  Yet another reason never
to send html to a mailing list.

Broken test cases like that are terribly confusing.  Removing the many
extra star characters and fixing the trailing slash and quoting gives:

  $ echo HELLO | sed "s/\(HELLO\)/She said:\1/"
  She said:HELLO

  $ echo HELLO | sed -r "s/(HELLO)/She said:\1/"
  She said:HELLO

The \(...\) grouping is a BRE (basic regular expression) construct.  When
using ERE (extended regular expression) the parens are not quoted
because ERE syntax uses them directly unquoted.  When changing regular
expression engines from BRE to ERE with -r the ERE syntax should be
used.  It is often easy to use 'grep' and 'grep -E' to double check
differences in regular expression engines.  It is a different tool
that uses the same RE engines and so can provide a second opinion.
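
The grep cross-check described above might look like the following
sketch.  The patterns are illustrative and not from the original
report; the variable names are only there to hold the results:

```shell
# BRE: grouping needs backslash-escaped parentheses.
bre=$(echo HELLO | grep 'H\(EL\)LO')

# ERE: grouping uses bare parentheses.
ere=$(echo HELLO | grep -E 'H(EL)LO')

# BRE with bare parentheses: they match literal '(' and ')' characters,
# so the line HELLO does not match and grep prints nothing.
literal=$(echo HELLO | grep 'H(EL)LO' || true)

printf 'BRE: %s\nERE: %s\nBRE, bare parens: %s\n' \
    "$bre" "$ere" "${literal:-no match}"
```

If a pattern matches under 'grep -E' but not under plain 'grep', the
same pattern most likely needs -r to work in sed.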

The invocation section documents the -r option.

  https://www.gnu.org/software/sed/manual/sed.html#Invoking-sed

  -r
  --regexp-extended
      Use extended regular expressions rather than basic regular
      expressions.  Extended regexps are those that egrep accepts;
      they can be clearer because they usually have less backslashes,
      but are a GNU extension and hence scripts that use them are not
      portable.  See [Extended regular expressions].

The "Extended regular expressions" link points to the extended regular
expression section:

  https://www.gnu.org/software/sed/manual/sed.html#Extended-regexps

  \(abc*\)\1
      becomes ‘(abc*)\1’ when using extended regular
      expressions.  Backreferences must still be escaped when using
      extended regular expressions.
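
As a small sketch of that manual entry in action, assuming GNU sed
(the input string and the bracketed replacement are made up for
illustration):

```shell
# BRE: escaped parentheses for the group, \1 for the backreference.
bre=$(echo abcabc | sed 's/\(abc*\)\1/[\1]/')

# ERE (-r): bare parentheses for the group, but \1 keeps its backslash,
# both inside the regex and in the replacement.
ere=$(echo abcabc | sed -r 's/(abc*)\1/[\1]/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands print [abc]: the group matches the first "abc", the \1
in the pattern matches the second, and the \1 in the replacement
inserts the captured text.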

Hope that helps,
Bob


