bug-gnu-utils

Re: Confusing/unclear documentation of Sed back references


From: Peter Kehl
Subject: Re: Confusing/unclear documentation of Sed back references
Date: Thu, 27 Nov 2014 08:02:00 +1100

Hi Sed maintainers again,

to all of you and Bruce Korb: Thanks. However, the problem is still
there even when I have the third slash /:

in bash:
echo HELLO | sed -r "s/\(HELLO\)/She said:\1/"
sed: -e expression #1, char 24: invalid reference \1 on `s' command's RHS

I don't understand the differences between -r and -E etc. I'm just
questioning whether
https://www.gnu.org/software/sed/manual/sed.html#index-Backreferences_002c-in-regular-expressions-103
(section 3.5) is clear: The replacement can contain <skipped>
references <skipped> of the match which is contained between the nth
\( and its matching \).

Based on that documentation section, one could assume that prefixing
the capturing parentheses with a backslash, \( ... \), still applies in
-r mode. Even someone who has used capturing parentheses (...) with no
backslash in other regex tools could assume that \( ... \) still
applies: there is a lot of variation in the world of regex tools, so
they could expect this to be yet another flavour.

Please update section 3.5 of the manual to state that capturing by
\(...\) doesn't work in -r mode, and the user should use common regex
capturing by (...).
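
To make the difference concrete, here is a minimal sketch of the three
cases discussed in this thread. It assumes GNU sed (for -r); the
variable names are only for illustration.

```shell
# Default (BRE) mode: capture groups are written \( ... \)
bre_out=$(echo HELLO | sed 's/\(HELLO\)/She said:\1/')
echo "$bre_out"

# -r (ERE) mode: capture groups are written ( ... ), the reference is still \1
ere_out=$(echo HELLO | sed -r 's/(HELLO)/She said:\1/')
echo "$ere_out"

# -r mode with \( ... \): the parentheses are literal, so no group exists
# and sed rejects \1 with a non-zero exit status
if echo HELLO | sed -r 's/\(HELLO\)/She said:\1/' 2>/dev/null; then
  bad_status=0
else
  bad_status=1
fi
echo "exit status of the third form: $bad_status"
```

The first two commands print the same line; the third is the form that
produces the "invalid reference \1" error quoted above.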

Bob Proulx:

Those extra stars * were added by the GNU mailing program, since my
original email was in HTML - I had made the relevant parts bold, and
@gnu.org transformed those into stars. Since simple HTML formatting
is commonly supported on forums etc. nowadays, I thought that
@gnu.org would support it, too....

Best regards,
-Peter Kehl

On 27 November 2014 at 06:29, Bob Proulx <address@hidden> wrote:
>
> Peter Kehl wrote:
> > Dear GNU sed maintainers,
>
> I am not one of the sed maintainers.  But I found your report
> confusing and so am commenting upon it.
>
> > If you use -r, then
> > echo HELLO | sed *-r* "s/*\(*HELLO*\)*/She said:\1"
> > sed: -e expression #1, char 23: unterminated `s' command
>
> Look at that error.  There are multiple problems.  First is that the
> trailing slash is missing.  The closing '/' at the end of the
> substitute command is missing.  That is the error message.
>
> You have many extra '*' characters in that command that should not be
> there.  As I am sure you know the '*' is an RE modifier that causes
> the previous item to match zero or more times.  It appears to me that
> you have sent html to a mailing list.  Never send html to mailing
> lists.  It looks like this list converts html to plain text and the
> stars are the result of a broken conversion.  Yet another reason never
> to send html to a mailing list.
>
> Broken test cases like that are terribly confusing.  Removing the many
> extra star characters and fixing the trailing slash and quoting is:
>
>   $ echo HELLO | sed "s/\(HELLO\)/She said:\1/"
>   She said:HELLO
>
>   $ echo HELLO | sed -r "s/(HELLO)/She said:\1/"
>   She said:HELLO
>
> The \(...\) grouping is a BRE (basic regular expression) construct.  When
> using ERE (extended regular expression) the parens are not quoted
> because ERE syntax uses them directly unquoted.  When changing regular
> expression engines from BRE to ERE with -r the ERE syntax should be
> used.  It is often easy to use 'grep' and 'grep -E' to double check
> differences in regular expression engines.  It is a different tool
> using the same RE engines and can provide another input.
>
> The invocation section documents the -r option.
>
>   https://www.gnu.org/software/sed/manual/sed.html#Invoking-sed
>
>   -r
>   --regexp-extended
>       Use extended regular expressions rather than basic regular
>       expressions.  Extended regexps are those that egrep accepts;
>       they can be clearer because they usually have less backslashes,
>       but are a GNU extension and hence scripts that use them are not
>       portable.  See [Extended regular expressions].
>
> The "Extended regular expressions" link points to the extended regular
> expression section:
>
>   https://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
>
>   \(abc*\)\1
>       becomes ‘(abc*)\1’ when using extended regular
>       expressions.  Backreferences must still be escaped when using
>       extended regular expressions.
>
> Hope that helps,
> Bob
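
The grep cross-check suggested in the quoted message can be sketched
roughly as follows. This is only an illustration, assuming a grep that
supports -E and -c; the variable names are made up for the example.

```shell
# BRE (plain grep): \( ... \) is a group, so the pattern matches HELLO
bre_group=$(printf 'HELLO\n' | grep -c '\(HELLO\)')

# ERE (grep -E): ( ... ) is a group, so the pattern matches HELLO
ere_group=$(printf 'HELLO\n' | grep -E -c '(HELLO)')

# ERE (grep -E): \( ... \) means literal parentheses, so HELLO alone
# does not match; grep -c prints 0 and exits non-zero, hence "|| true"
ere_literal=$(printf 'HELLO\n' | grep -E -c '\(HELLO\)') || true

# ERE (grep -E): the same pattern does match input containing real parentheses
ere_parens=$(printf '(HELLO)\n' | grep -E -c '\(HELLO\)')

echo "BRE group: $bre_group, ERE group: $ere_group"
echo "ERE escaped parens on HELLO: $ere_literal, on (HELLO): $ere_parens"
```

A count of 0 for the third case shows why \(HELLO\) captures nothing
under -r: there is no group for \1 to refer to.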
