Re: Confusing/unclear documentation of Sed back references
From: Bob Proulx
Subject: Re: Confusing/unclear documentation of Sed back references
Date: Wed, 26 Nov 2014 12:29:07 -0700
User-agent: Mutt/1.5.23 (2014-03-12)
Peter Kehl wrote:
> Dear GNU sed maintainers,
I am not one of the sed maintainers. But I found your report
confusing and so am commenting upon it.
> If you use -r, then
> echo HELLO | sed *-r* "s/*\(*HELLO*\)*/She said:\1"
> sed: -e expression #1, char 23: unterminated `s' command
Look at that error. There are multiple problems. The first is that
the closing '/' at the end of the substitute command is missing. That
is what the error message is reporting.
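A quick way to confirm that the missing delimiter alone causes the error, independent of -r, is to run the same expression with and without the closing '/'. This is only a sketch, assuming GNU sed and a POSIX shell; the variable names 'status' and 'fixed' are mine and purely illustrative.

```shell
# Missing closing '/': sed rejects the script and exits non-zero.
if echo HELLO | sed 's/\(HELLO\)/She said:\1' 2>/dev/null; then
    status=accepted
else
    status=rejected
fi
echo "without trailing slash: $status"

# With the closing '/' the same expression is accepted.
fixed=$(echo HELLO | sed 's/\(HELLO\)/She said:\1/')
echo "with trailing slash: $fixed"
```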
You have many extra '*' characters in that command that should not be
there. As I am sure you know the '*' is an RE modifier that causes
the previous item to match zero or more times. It appears to me that
you have sent html to a mailing list. Never send html to mailing
lists. It looks like this list converts html to plain text and the
stars are the result of a broken conversion. Yet another reason never
to send html to a mailing list.
Broken test cases like that are terribly confusing. With the many
extra star characters removed and the trailing slash and quoting
fixed, the commands are:
$ echo HELLO | sed "s/\(HELLO\)/She said:\1/"
She said:HELLO
$ echo HELLO | sed -r "s/(HELLO)/She said:\1/"
She said:HELLO
The \(...\) grouping is a BRE (basic regular expression) construct.
When using ERE (extended regular expression) the parens are not quoted
because ERE syntax uses them directly unquoted. When changing regular
expression engines from BRE to ERE with -r the ERE syntax should be
used. It is often easy to use 'grep' and 'grep -E' to double check
differences in regular expression engines. grep is a different tool
that accepts the same BRE and ERE syntaxes and so can provide a second
opinion.
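For example, the grouping difference can be checked with grep like this. It is a small sketch, assuming GNU grep; the variable names are only for illustration.

```shell
# BRE (plain grep): grouping parens are escaped.
bre=$(echo HELLO | grep '\(HELLO\)')
# ERE (grep -E): grouping parens are written bare.
ere=$(echo HELLO | grep -E '(HELLO)')
echo "BRE: $bre"
echo "ERE: $ere"

# In BRE an unescaped paren is a literal character, so the ERE-style
# pattern looks for the text "(HELLO)" and finds nothing in "HELLO".
if echo HELLO | grep '(HELLO)' >/dev/null; then
    literal=match
else
    literal="no match"
fi
echo "ERE-style pattern under BRE: $literal"
```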
The invocation section documents the -r option.
https://www.gnu.org/software/sed/manual/sed.html#Invoking-sed
-r
--regexp-extended
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that egrep accepts;
they can be clearer because they usually have less backslashes,
but are a GNU extension and hence scripts that use them are not
portable. See [Extended regular expressions].
The "Extended regular expressions" link points to the extended regular
expression section:
https://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
\(abc*\)\1
becomes ‘(abc*)\1’ when using extended regular
expressions. Backreferences must still be escaped when using
extended regular expressions.
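To illustrate that manual example, here is the same backreference in both syntaxes. This is a sketch assuming GNU sed; the input string 'abccabcc' and the replacement 'X' are my own choices, not from the manual.

```shell
# BRE: the group is \( \) and the backreference is \1.
bre=$(echo abccabcc | sed 's/\(abc*\)\1/X/')
# ERE: the group is bare ( ) but the backreference is still \1.
ere=$(echo abccabcc | sed -r 's/(abc*)\1/X/')
echo "BRE: $bre"
echo "ERE: $ere"
```

In both cases the group matches "abcc", the \1 matches the second "abcc", and the whole input is replaced by X.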
Hope that helps,
Bob