bug-coreutils

expr, csplit, nl changes to conform better to POSIX, traditional Unix


From: Paul Eggert
Subject: expr, csplit, nl changes to conform better to POSIX, traditional Unix
Date: Wed, 12 Apr 2006 01:25:06 -0700
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

I wanted to test the new gnulib regex code, so I stole all the BRE
tests from GNU grep and added them to coreutils, as tests for 'expr'.
This uncovered several glitches, where expr (and csplit and nl) did
not conform to POSIX.

These changes are all minor and (I think) noncontroversial, except for
the business where GNU expr complains if a regular expression starts
with "^".  This diagnostic was added on May 29, 1996, and it has some
nice properties but it also has some annoying ones.  The basic problem
here is that in traditional Unix (a tradition that survives in Solaris
10, for example), "expr a : '^a'" ignored the ^, but I guess some
(now-uncommon?) POSIX implementations treat the ^ as an ordinary
character, not as an anchor.
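
To make the two readings concrete, here is a minimal sketch, assuming a
POSIX shell and whichever expr happens to be installed.  The helper
name match_len is hypothetical, not part of coreutils.  An expr that
takes the leading ^ as a (redundant) anchor should report 1 and then 0;
one that takes it as an ordinary character should report 0 and then 2.

```shell
# Hypothetical helper: print the match length reported by expr's ":" operator.
match_len() {
    # expr exits with status 1 when its result is 0, so ignore the status.
    expr "$1" : "$2" || :
}

# 1 if the leading ^ is an anchor, 0 if it is an ordinary character.
anchor_result=$(match_len a '^a')

# 0 if the leading ^ is an anchor, 2 if it is an ordinary character.
literal_result=$(match_len '^a' '^a')

echo "expr a : '^a'   -> $anchor_result"
echo "expr '^a' : '^a' -> $literal_result"
```

Because ":" matches are already anchored at the start of the string,
reading ^ as an anchor makes it look as if the character were ignored.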

I guess it depends on whether we want GNU expr to be the universal
donor or the universal acceptor.  Anyway, if this warning is still
desirable (even though expr is obsolescent now :-) we should not emit
the warning if POSIXLY_CORRECT is set.  If you prefer that, please let
me know, and I'll put the warning back in for non-POSIXLY_CORRECT
mode.

Here's what I installed:

2006-04-11  Paul Eggert  <address@hidden>

        * NEWS: csplit, nl, expr now conform to POSIX better, and are
        more-compatible with traditional Unix, with respect to regular
        expressions.
        * doc/coreutils.texi (expr invocation): expr exit status is 3 only for
        internal errors now; 2 is also for invalid values in expressions.
        * src/csplit.c (extract_regexp): Set re_syntax_options to a
        value that is compatible with what POSIX requires.
        * src/nl.c (build_type_arg): Likewise.
        * src/expr.c (docolon): Likewise.  Also, don't let anchors match
        newline; this fixes an incompatibility with tradition and with POSIX.
        Don't warn about leading ^.  POSIX says it is unspecified whether
        ^ is a special character, which means that implementations can
        either treat it as special or not, but either way a warning is not
        allowed (unless the regexp is otherwise invalid).  Instead,
        silently treat a leading ^ as an anchor; this is the traditional
        behavior (e.g., Solaris 10).
        (eval4, eval3, eval2): Treat non-numeric args, division by zero,
        and the like as invalid expressions (exit status 2), not as
        failure of 'expr' (exit status 3).  This is more consistent with
        how Solaris behaves.
        * tests/expr/basic (fail-a): Adjust exit status to match new expr
        behavior, for status 2 versus 3.
        (anchor): New test.
        (bre1, bre2, bre3, bre4, bre5, bre6, bre7, bre8, bre9, bre10):
        (bre11, bre12, bre13, bre14, bre15, bre16, bre17, bre18, bre19, bre20):
        (bre21, bre22, bre23, bre24, bre25, bre26, bre27, bre28, bre29, bre30):
        (bre31, bre32, bre33, bre34, bre35, bre36, bre37, bre38, bre39, bre40):
        (bre41, bre42, bre43, bre44, bre45, bre46, bre47, bre48, bre49, bre50):
        (bre51, bre52, bre53, bre54, bre55, bre56, bre57, bre58, bre59, bre60):
        (bre61, bre62): New tests.
        * tests/misc/csplit: Use \{...\} in test RE, to test that we're
        conforming to POSIX.

Index: NEWS
===================================================================
RCS file: /fetish/cu/NEWS,v
retrieving revision 1.367
retrieving revision 1.368
diff -p -u -r1.367 -r1.368
--- NEWS        28 Mar 2006 09:46:38 -0000      1.367
+++ NEWS        12 Apr 2006 07:49:34 -0000      1.368
@@ -15,11 +15,23 @@ GNU coreutils NEWS                      
   basename and dirname now treat // as different from / on platforms
   where the two are distinct.
 
+  csplit and nl now use POSIX syntax for regular expressions, not
+  Emacs syntax.  As a result, character classes like [[:print:]] and
+  interval expressions like A\{1,9\} now have their usual meaning,
+  . no longer matches the null character, and \ must precede the + and
+  ? operators.
+
   df now considers "none" and "proc" file systems to be dummies and
   therefore does not normally display them.  Also, inaccessible file
   systems (which can be caused by shadowed mount points or by chrooted
   bind mounts) are now dummies, too.
 
+  expr no longer complains about leading ^ in a regular expression
+  (the anchor is ignored), or about regular expressions like A** (the
+  second "*" is ignored).  expr now exits with status 2 (not 3) for
+  errors it detects in the expression's values; exit status 3 is now
+  used only for internal errors like arithmetic overflow.
+
   ln now uses different (and we hope clearer) diagnostics when it fails.
   ln -v now acts more like FreeBSD, so it generates output only when
   successful and the output is easier to parse.
Index: doc/coreutils.texi
===================================================================
RCS file: /fetish/cu/doc/coreutils.texi,v
retrieving revision 1.321
retrieving revision 1.322
diff -p -u -r1.321 -r1.322
--- doc/coreutils.texi  9 Apr 2006 07:52:33 -0000       1.321
+++ doc/coreutils.texi  12 Apr 2006 07:47:11 -0000      1.322
@@ -10277,8 +10277,8 @@ Exit status:
 @display
 0 if the expression is neither null nor 0,
 1 if the expression is null or 0,
-2 if the expression is syntactically invalid,
-3 if an error occurred.
+2 if the expression is invalid,
+3 if an internal error occurred (e.g., arithmetic overflow).
 @end display
 
 @menu
Index: src/csplit.c
===================================================================
RCS file: /fetish/cu/src/csplit.c,v
retrieving revision 1.147
retrieving revision 1.149
diff -p -u -r1.147 -r1.149
--- src/csplit.c        11 Apr 2006 00:50:33 -0000      1.147
+++ src/csplit.c        12 Apr 2006 07:37:11 -0000      1.149
@@ -1121,6 +1121,8 @@ extract_regexp (int argnum, bool ignore,
   p->re_compiled.allocated = 0;
   p->re_compiled.fastmap = p->fastmap;
   p->re_compiled.translate = NULL;
+  re_syntax_options =
+    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;
   err = re_compile_pattern (str + 1, len, &p->re_compiled);
   if (err)
     {
Index: src/expr.c
===================================================================
RCS file: /fetish/cu/src/expr.c,v
retrieving revision 1.109
retrieving revision 1.111
diff -p -u -r1.109 -r1.111
--- src/expr.c  11 Apr 2006 00:50:56 -0000      1.109
+++ src/expr.c  12 Apr 2006 07:37:11 -0000      1.111
@@ -50,12 +50,13 @@
 /* Exit statuses.  */
 enum
   {
-    /* Invalid expression: i.e., its form does not conform to the
+    /* Invalid expression: e.g., its form does not conform to the
        grammar for expressions.  Our grammar is an extension of the
        POSIX grammar.  */
     EXPR_INVALID = 2,
 
-    /* Some other error occurred.  */
+    /* An internal error occurred, e.g., arithmetic overflow, storage
+       exhaustion.  */
     EXPR_FAILURE
   };
 
@@ -419,22 +420,16 @@ docolon (VALUE *sv, VALUE *pv)
   tostring (sv);
   tostring (pv);
 
-  if (pv->u.s[0] == '^')
-    {
-      error (0, 0, _("\
-warning: unportable BRE: %s: using `^' as the first character\n\
-of the basic regular expression is not portable; it is being ignored"),
-            quote (pv->u.s));
-    }
-
   re_buffer.buffer = NULL;
   re_buffer.allocated = 0;
   re_buffer.fastmap = fastmap;
   re_buffer.translate = NULL;
-  re_syntax_options = RE_SYNTAX_POSIX_BASIC;
+  re_syntax_options =
+    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;
   errmsg = re_compile_pattern (pv->u.s, strlen (pv->u.s), &re_buffer);
   if (errmsg)
-    error (EXPR_FAILURE, 0, "%s", errmsg);
+    error (EXPR_INVALID, 0, "%s", errmsg);
+  re_buffer.newline_anchor = 0;
 
   matchlen = re_match (&re_buffer, sv->u.s, strlen (sv->u.s), 0, &re_regs);
   if (0 <= matchlen)
@@ -634,13 +629,13 @@ eval4 (bool evaluate)
       if (evaluate)
        {
          if (!toarith (l) || !toarith (r))
-           error (EXPR_FAILURE, 0, _("non-numeric argument"));
+           error (EXPR_INVALID, 0, _("non-numeric argument"));
          if (fxn == multiply)
            val = l->u.i * r->u.i;
          else
            {
              if (r->u.i == 0)
-               error (EXPR_FAILURE, 0, _("division by zero"));
+               error (EXPR_INVALID, 0, _("division by zero"));
              val = fxn == divide ? l->u.i / r->u.i : l->u.i % r->u.i;
            }
        }
@@ -676,7 +671,7 @@ eval3 (bool evaluate)
       if (evaluate)
        {
          if (!toarith (l) || !toarith (r))
-           error (EXPR_FAILURE, 0, _("non-numeric argument"));
+           error (EXPR_INVALID, 0, _("non-numeric argument"));
          val = fxn == plus ? l->u.i + r->u.i : l->u.i - r->u.i;
        }
       freev (l);
@@ -738,7 +733,7 @@ eval2 (bool evaluate)
                {
                  error (0, errno, _("string comparison failed"));
                  error (0, 0, _("Set LC_ALL='C' to work around the problem."));
-                 error (EXPR_FAILURE, 0,
+                 error (EXPR_INVALID, 0,
                         _("The strings compared were %s and %s."),
                         quotearg_n_style (0, locale_quoting_style, l->u.s),
                         quotearg_n_style (1, locale_quoting_style, r->u.s));
Index: src/nl.c
===================================================================
RCS file: /fetish/cu/src/nl.c,v
retrieving revision 1.87
retrieving revision 1.89
diff -p -u -r1.87 -r1.89
--- src/nl.c    11 Apr 2006 00:51:23 -0000      1.87
+++ src/nl.c    12 Apr 2006 07:37:11 -0000      1.89
@@ -253,6 +253,8 @@ build_type_arg (char **typep, struct re_
       regexp->allocated = 0;
       regexp->fastmap = fastmap;
       regexp->translate = NULL;
+      re_syntax_options =
+       RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;
       errmsg = re_compile_pattern (optarg, strlen (optarg), regexp);
       if (errmsg)
        error (EXIT_FAILURE, 0, "%s", errmsg);
Index: tests/expr/basic
===================================================================
RCS file: /fetish/cu/tests/expr/basic,v
retrieving revision 1.13
retrieving revision 1.14
diff -p -u -r1.13 -r1.14
--- tests/expr/basic    27 May 2005 20:32:28 -0000      1.13
+++ tests/expr/basic    12 Apr 2006 07:17:02 -0000      1.14
@@ -57,12 +57,91 @@ my @Tests =
 
      # This erroneously succeeded and output `3' before 2.0.12.
      ['fail-a', '3 + -', {ERR => "$prog: non-numeric argument\n"},
-      {EXIT => 3}],
+      {EXIT => 2}],
 
      # This erroneously succeeded before 5.3.1.
      ['bigcmp', '-- -2417851639229258349412352 \< 2417851639229258349412352',
       {OUT => '1'}, {EXIT => 0}],
 
+     # In 5.94 and earlier, anchors incorrectly matched newlines.
+     ['anchor', "'a\nb' : 'a\$'", {OUT => '0'}, {EXIT => 1}],
+
+     # These tests are taken from grep/tests/bre.tests.
+     ['bre1', '"abc" : "a\\(b\\)c"', {OUT => 'b'}],
+     ['bre2', '"a(" : "a("', {OUT => '2'}],
+     ['bre3', '_ : "a\\("',
+      {ERR => "$prog: Unmatched ( or \\(\n"}, {EXIT => 2}],
+     ['bre4', '_ : "a\\(b"',
+      {ERR => "$prog: Unmatched ( or \\(\n"}, {EXIT => 2}],
+     ['bre5', '"a(b" : "a(b"', {OUT => '3'}],
+     ['bre6', '"a)" : "a)"', {OUT => '2'}],
+     ['bre7', '_ : "a\\)"',
+      {ERR => "$prog: Unmatched ) or \\)\n"}, {EXIT => 2}],
+     ['bre8', '_ : "\\)"',
+      {ERR => "$prog: Unmatched ) or \\)\n"}, {EXIT => 2}],
+     ['bre9', '"ab" : "a\\(\\)b"', {OUT => ''}, {EXIT => 1}],
+     ['bre10', '"a^b" : "a^b"', {OUT => '3'}],
+     ['bre11', '"a\$b" : "a\$b"', {OUT => '3'}],
+     ['bre12', '"" : "\\($\\)\\(^\\)"', {OUT => ''}, {EXIT => 1}],
+     ['bre13', '"b" : "a*\\(^b\$\\)c*"', {OUT => 'b'}],
+     ['bre14', '"X|" : "X\\(|\\)" : "(" "X|" : "X\\(|\\)" ")"', {OUT => '1'}],
+     ['bre15', '"X*" : "X\\(*\\)" : "(" "X*" : "X\\(*\\)" ")"', {OUT => '1'}],
+     ['bre16', '"abc" : "\\(\\)"', {OUT => ''}, {EXIT => 1}],
+     ['bre17', '"{1}a" : "\\(\\{1\\}a\\)"', {OUT => '{1}a'}],
+     ['bre18', '"X*" : "X\\(*\\)" : "^*"', {OUT => '1'}],
+     ['bre19', '"{1}" : "^\\{1\\}"', {OUT => '3'}],
+     ['bre20', '"{" : "{"', {OUT => '1'}],
+     ['bre21', '"abbcbd" : "a\\(b*\\)c\\1d"', {OUT => ''}, {EXIT => 1}],
+     ['bre22', '"abbcbbbd" : "a\\(b*\\)c\\1d"', {OUT => ''}, {EXIT => 1}],
+     ['bre23', '"abc" : "\\(.\\)\\1"', {OUT => ''}, {EXIT => 1}],
+     ['bre24', '"abbccd" : "a\\(\\([bc]\\)\\2\\)*d"', {OUT => 'cc'}],
+     ['bre25', '"abbcbd" : "a\\(\\([bc]\\)\\2\\)*d"',
+      {OUT => ''}, {EXIT => 1}],
+     ['bre26', '"abbbd" : "a\\(\\(b\\)*\\2\\)*d"', {OUT => 'bbb'}],
+     ['bre27', '"aabcd" : "\\(a\\)\\1bcd"', {OUT => 'a'}],
+     ['bre28', '"aabcd" : "\\(a\\)\\1bc*d"', {OUT => 'a'}],
+     ['bre29', '"aabd" : "\\(a\\)\\1bc*d"', {OUT => 'a'}],
+     ['bre30', '"aabcccd" : "\\(a\\)\\1bc*d"', {OUT => 'a'}],
+     ['bre31', '"aabcccd" : "\\(a\\)\\1bc*[ce]d"', {OUT => 'a'}],
+     ['bre32', '"aabcccd" : "\\(a\\)\\1b\\(c\\)*cd\$"', {OUT => 'a'}],
+     ['bre33', '"a*b" : "a\\(*\\)b"', {OUT => '*'}],
+     ['bre34', '"ab" : "a\\(**\\)b"', {OUT => ''}, {EXIT => 1}],
+     ['bre35', '"ab" : "a\\(***\\)b"', {OUT => ''}, {EXIT => 1}],
+     ['bre36', '"*a" : "*a"', {OUT => '2'}],
+     ['bre37', '"a" : "**a"', {OUT => '1'}],
+     ['bre38', '"a" : "***a"', {OUT => '1'}],
+     ['bre39', '"ab" : "a\\{1\\}b"', {OUT => '2'}],
+     ['bre40', '"ab" : "a\\{1,\\}b"', {OUT => '2'}],
+     ['bre41', '"aab" : "a\\{1,2\\}b"', {OUT => '3'}],
+     ['bre42', '_ : "a\\{1"',
+      {ERR => "$prog: Unmatched \\{\n"}, {EXIT => 2}],
+     ['bre43', '_ : "a\\{1a"',
+      {ERR => "$prog: Unmatched \\{\n"}, {EXIT => 2}],
+     ['bre44', '_ : "a\\{1a\\}"',
+      {ERR => "$prog: Invalid content of \\{\\}\n"}, {EXIT => 2}],
+     ['bre45', '"a" : "a\\{,2\\}"', {OUT => '1'}],
+     ['bre46', '"a" : "a\\{,\\}"', {OUT => '1'}],
+     ['bre47', '_ : "a\\{1,x\\}"',
+      {ERR => "$prog: Invalid content of \\{\\}\n"}, {EXIT => 2}],
+     ['bre48', '_ : "a\\{1,x"',
+      {ERR => "$prog: Unmatched \\{\n"}, {EXIT => 2}],
+     ['bre49', '_ : "a\\{32768\\}"',
+      {ERR => "$prog: Invalid content of \\{\\}\n"}, {EXIT => 2}],
+     ['bre50', '_ : "a\\{1,0\\}"',
+      {ERR => "$prog: Invalid content of \\{\\}\n"}, {EXIT => 2}],
+     ['bre51', '"acabc" : ".*ab\\{0,0\\}c"', {OUT => '2'}],
+     ['bre52', '"abcac" : "ab\\{0,1\\}c"', {OUT => '3'}],
+     ['bre53', '"abbcac" : "ab\\{0,3\\}c"', {OUT => '4'}],
+     ['bre54', '"abcac" : ".*ab\\{1,1\\}c"', {OUT => '3'}],
+     ['bre55', '"abcac" : ".*ab\\{1,3\\}c"', {OUT => '3'}],
+     ['bre56', '"abbcabc" : ".*ab\{2,2\}c"', {OUT => '4'}],
+     ['bre57', '"abbcabc" : ".*ab\{2,4\}c"', {OUT => '4'}],
+     ['bre58', '"aa" : "a\\{1\\}\\{1\\}"', {OUT => '1'}],
+     ['bre59', '"aa" : "a*\\{1\\}"', {OUT => '2'}],
+     ['bre60', '"aa" : "a\\{1\\}*"', {OUT => '2'}],
+     ['bre61', '"acd" : "a\\(b\\)?c\\1d"', {OUT => ''}, {EXIT => 1}],
+     ['bre62', '-- "-5" : "-\\{0,1\\}[0-9]*\$"', {OUT => '2'}],
+
      ['fail-b', '9 9', {ERR => "$prog: syntax error\n"},
       {EXIT => 2}],
      ['fail-c', {ERR => "$prog: missing operand\n"
Index: tests/misc/csplit
===================================================================
RCS file: /fetish/cu/tests/misc/csplit,v
retrieving revision 1.7
retrieving revision 1.8
diff -p -u -r1.7 -r1.8
--- tests/misc/csplit   10 Sep 2005 14:06:29 -0000      1.7
+++ tests/misc/csplit   12 Apr 2006 07:17:26 -0000      1.8
@@ -92,7 +92,7 @@ test $fail = 1 && diff err experr 2> /de
 # in 5.3.1.
 rm -f in out exp err experr xx??
 printf 'x%8199s\nx\n%8199s\nx\n' x x > in
-csplit in '/x/' '{*}' > /dev/null || fail=1
+csplit in '/x\{1\}/' '{*}' > /dev/null || fail=1
 cat xx?? | cmp - in || fail=1
 
 (exit $fail); exit $fail



