bug-gnu-utils

Re: Confusing/unclear documentation of Sed back references


From: Bob Proulx
Subject: Re: Confusing/unclear documentation of Sed back references
Date: Wed, 26 Nov 2014 14:38:24 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

Peter Kehl wrote:
> to all of you and Bruce Korb: Thanks. However, the problem is still
> there even when I have the third slash /:

Unfortunately you forgot to change the syntax to extended regular
expression syntax. :-( 

> in bash:
> echo HELLO | sed -r "s/\(HELLO\)/She said:\1/"
> sed: -e expression #1, char 24: invalid reference \1 on `s' command's RHS

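You are using -r but then you are NOT writing the capture group
correctly.  In the above you are using \(...\) and that is the
grouping syntax for BREs (basic regular expressions) and not EREs
(extended regular expressions).  When you specify -r you are
requesting to use EREs.  When you request EREs you must use ERE
syntax: groups are written (...) and \( matches a literal
parenthesis.  Your pattern therefore defines no group at all, which is
why sed complains about \1 on the RHS.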
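
A minimal sketch of the difference, assuming GNU sed (the -r option is
a GNU extension) and using single quotes so the shell leaves the
backslashes alone.  The variable names are only for illustration.

```shell
# BRE (default, no -r): capture groups are written \( ... \)
bre=$(echo HELLO | sed 's/\(HELLO\)/She said:\1/')

# ERE (-r): capture groups are written ( ... ); \1 is unchanged
ere=$(echo HELLO | sed -r 's/(HELLO)/She said:\1/')

# Mixing the two: under -r, \( is a literal parenthesis, so no group
# exists and sed refuses the \1 reference.
if echo HELLO | sed -r 's/\(HELLO\)/She said:\1/' 2>/dev/null; then
    mixed=accepted
else
    mixed=rejected
fi

echo "$bre"
echo "$ere"
echo "$mixed"
```

The first two commands should both produce "She said:HELLO".  The third
is the command from the report and should be rejected.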

> I don't understand the differences between -r and -E etc. I'm just
> questioning whether
> https://www.gnu.org/software/sed/manual/sed.html#index-Backreferences_002c-in-regular-expressions-103
> (section 3.5) is clear: The replacement can contain <skipped>
> references <skipped> of the match which is contained between the nth
> \( and its matching \).

That document is for the default BRE engine.  Everything on that page
is true concerning the syntax of BREs.  You leave that page when you
specify -r to use EREs.  If you don't use -r then everything on that
page is absolutely true.  If you add -r then you must also add
knowledge of what -r changes.

> Based on the above documentation section, one could assume that the
> above prefixing the capturing parenthesis by backslash \( ... \) still
> applies in -r mode. 

Why?  This is a serious question.  Please say a few words about why
you think \(...\) applies?  This is the source of the confusion.  If
we can get to the bottom of this point then we can fix something
fundamental.

Following through the documentation the first thing one should read
when looking at -r is the -r documentation.

  https://www.gnu.org/software/sed/manual/sed.html#Invoking-sed

  -r
  --regexp-extended
      Use extended regular expressions rather than basic regular
      expressions.  Extended regexps are those that egrep accepts;
      they can be clearer because they usually have less backslashes,
      but are a GNU extension and hence scripts that use them are not
      portable.  See [Extended regular expressions].

Documentation is always in the mind of the reader.  What changes would
you suggest in the above to make it clear to you that using -r uses a
different regular expression engine than when not using -r?

The "Extended regular expressions" link points to the extended regular
expression section:

  https://www.gnu.org/software/sed/manual/sed.html#Extended-regexps

  \(abc*\)\1
      becomes ‘(abc*)\1’ when using extended regular
      expressions.  Backreferences must still be escaped when using
      extended regular expressions.
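
The manual's example can be tried directly.  This is only a sketch,
assuming GNU sed; the input string "abccabcc" and the variable names
are made up for illustration.

```shell
# BRE form from the manual: \(abc*\)\1
bre2=$(echo abccabcc | sed -n '/\(abc*\)\1/p')

# ERE form from the manual: (abc*)\1
# The parentheses lose their backslashes; the backreference keeps its own.
ere2=$(echo abccabcc | sed -rn '/(abc*)\1/p')

echo "$bre2"
echo "$ere2"
```

Both commands should print the input line, because the group matches
"abcc" and \1 then matches the second "abcc".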

I think everyone will agree that the documentation on that page is
sparse.  What changes would you suggest in the above?

Compare this to the grep documentation on this same topic.

  https://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html#Basic-vs-Extended

Because extended regular expressions are an extension of basic regular
expressions they are not usually documented in isolation from basic
regular expressions.  EREs are almost always documented in terms of
their differences from BREs.  All of the documentation of BREs applies
*except for the changes* that make EREs different from BREs.  Those
changes are small and it wouldn't make sense to duplicate the entire
BRE docs.  And I think having duplicated documentation on '^', for
example, would be more confusing.
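
To illustrate how small the differences are, here is a rough sketch,
again assuming GNU sed.  The inputs and variable names are invented
for the example.

```shell
# '^' is an anchor in both syntaxes, so its documentation applies to both.
anchor_bre=$(echo abc | sed 's/^a/X/')
anchor_ere=$(echo abc | sed -r 's/^a/X/')

# Interval expressions differ only in the backslashes:
# \{2\} in a BRE, {2} in an ERE.
interval_bre=$(echo aaa | sed 's/a\{2\}/X/')
interval_ere=$(echo aaa | sed -r 's/a{2}/X/')

echo "$anchor_bre $anchor_ere"
echo "$interval_bre $interval_ere"
```

Each pair should give identical results: "Xbc" for the anchor and "Xa"
for the interval.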

> Even if the person has used capturing by parenthesis (..) with no
> backslash with other regex tools, she or he could assume that \(
> ... \) still applies - since there's a lot of variation in the world
> of regex tools, so she can expect this to be yet another flavour.

Exactly!  That is exactly why one using -r should be using ERE syntax.
As documented in the manual.

> Please update section 3.5 of the manual to state that capturing by
> \(...\) doesn't work in -r mode, and the user should use common regex
> capturing by (...).

Hmm...  So you are suggesting that every section of the manual that
mentions regular expressions be split into two sections?  One section
would document the default BRE syntax.  Another split section would
document the -r ERE syntax?  I think that would be tedious to maintain
and laborious to read.

> Those extra stars * were added by GNU mailing program, since my
> original email was in HTML - I had made the relevant parts bold, and
> @gnu.org transformed those into stars. Since simple HTML formatting
> would is commonly supported on forums etc. nowadays, I thought that
> @gnu.org would support it, too....

But mailing lists are not web pages.  In a web forum feel free to do
anything the web forum allows.  But please no html on mailing lists.
If it is email then please use plain text.

Bob


