sed POSIX compatibility regarding '|' in regular expressions
From: Bruno Haible
Subject: sed POSIX compatibility regarding '|' in regular expressions
Date: Thu, 14 Dec 2006 15:10:28 +0100
User-agent: KMail/1.9.1
Hi,
POSIX states in [1], section "Regular Expressions in sed", that sed uses
basic regular expressions, with three minor modifications. The syntax
of basic regular expressions is defined in [2]. According to the text
and to the "RE and Bracket Expression Grammar" section at the end of this
page, POSIX BREs don't support alternation. In addition, "The interpretation
of an ordinary character preceded by a backslash ( '\' ) is undefined", so
the use of '\|' in BREs is a GNU extension.
Bug #1: The --posix option fails to turn off this GNU extension.
$ sed --version
GNU sed Version 4.1.5
...
$ echo 'aaa//bcd' | sed -e 's,\(a\|X\)*//,,'
bcd # ok, the GNU extension
$ echo 'aaa//bcd' | sed --posix -e 's,\(a\|X\)*//,,'
bcd # wrong, should be aaabcd or signal an error
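For scripts that must not depend on the extension, the particular expression in this test case can be written without alternation, since both alternatives are single characters. The following is only a sketch of that rewrite using a bracket expression, which is plain POSIX BRE syntax; the variable name portable_out is made up for illustration, and the trick does not generalize to multi-character alternatives.

```shell
# Portable BRE: the bracket expression [aX] stands in for the
# single-character alternation \(a\|X\), so no '\|' is needed.
portable_out=$(echo 'aaa//bcd' | sed -e 's,[aX]*//,,')
echo "$portable_out"   # prints: bcd
```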
Bug #2: A doc bug. The section "Extended regular expressions" does not
mention that alternations are a difference between basic and extended
regular expressions: In EREs they are written as '|', in BREs they are
written as '\|' (GNU extension) or unavailable (pure POSIX).
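To illustrate the difference, here is a small sketch of the same alternation written both ways. It assumes a sed that accepts -E to select extended regular expressions (recent GNU and BSD seds do; older GNU sed releases spell the option -r). The variable names bre_out and ere_out are hypothetical, and the BRE line relies on the GNU extension, so its result on other seds is not guaranteed.

```shell
# BRE with the GNU extension: alternation is written as '\|' and the
# grouping parentheses are escaped. Not guaranteed outside GNU sed.
bre_out=$(printf '%s\n' abc def xyz | sed -n '/^\(abc\|def\)$/p')

# ERE: alternation is a bare '|' and parentheses group unescaped.
# -E is assumed to be supported by the sed in use.
ere_out=$(printf '%s\n' abc def xyz | sed -E -n '/^(abc|def)$/p')

echo "$ere_out"   # prints abc and def, one per line
```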
Here's a suggested doc change.
--- sed.texi.bak 2006-01-30 08:27:29.000000000 +0100
+++ sed.texi 2006-12-14 00:06:19.000000000 +0100
@@ -2913,7 +2913,7 @@
@cindex Extended regular expressions, syntax
The only difference between basic and extended regular expressions is in
-the behavior of a few characters: @samp{?}, @samp{+}, parentheses,
+the behavior of a few characters: @samp{?}, @samp{+}, @samp{|}, parentheses,
and braces (@samp{@{@}}). While basic regular expressions require
these to be escaped if you want them to behave as special characters,
when using extended regular expressions you must escape them if
@@ -2926,9 +2926,22 @@
becomes @samp{abc\?} when using extended regular expressions. It matches
the literal string @samp{abc?}.
+@item abc\?
+becomes @samp{abc?} when using extended regular expressions. It matches
+either @samp{ab} or @samp{abc}. This construct is a GNU extension for
+basic regular expressions, but standard POSIX for extended regular
+expressions.
+
@item c\+
becomes @samp{c+} when using extended regular expressions. It matches
-one or more @samp{c}s.
+one or more @samp{c}s. This construct is a GNU extension for basic regular
+expressions, but standard POSIX for extended regular expressions.
+
+@item abc\|def
+becomes @samp{abc|def} when using extended regular expressions. It matches
+either @samp{abc} or @samp{def}. This construct, called ``alternation'',
+is a GNU extension for basic regular expressions, but standard POSIX for
+extended regular expressions.
@item a\@{3,\@}
becomes @samp{a@{3,@}} when using extended regular expressions. It
matches
[1] http://www.opengroup.org/susv3/utilities/sed.html
[2] http://www.opengroup.org/susv3/basedefs/xbd_chap09.html