grep-commit

grep branch, master, updated. v3.1-24-ge552346


From: Paul Eggert
Subject: grep branch, master, updated. v3.1-24-ge552346
Date: Fri, 20 Apr 2018 18:19:40 -0400 (EDT)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  e552346b1a4a1427c0f52aadde1292a7258314a0 (commit)
      from  fbd9f06ce1a3b9203378881e59184d3caa1c3c4c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=e552346b1a4a1427c0f52aadde1292a7258314a0


commit e552346b1a4a1427c0f52aadde1292a7258314a0
Author: Paul Eggert <address@hidden>
Date:   Fri Apr 20 15:19:09 2018 -0700

    doc: mention encoding errors
    
    This attempts to document the encoding-error problem more
    precisely (Bug#30326).
    * doc/grep.in.1, doc/grep.texi: Mention that the behavior of
    patterns like ‘.’ is not specified on encoding errors.
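
The commit distinguishes characters from encoding errors, which are bytes that do not contribute to the representation of any character. As a rough illustration only, the Python sketch below lists the byte ranges that Python's own UTF-8 decoder rejects. The helper name `find_encoding_errors` is hypothetical. Using Python's decoder as a stand-in for a UTF-8 locale is also an assumption: GNU grep has its own multibyte handling and may group invalid bytes differently.

```python
def find_encoding_errors(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of invalid UTF-8 sequences in data.

    Hypothetical helper: it relies on Python's UTF-8 decoder, not on grep.
    """
    errors = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means there are no more errors.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice just decoded.
            start, end = pos + exc.start, pos + exc.end
            errors.append((start, end))
            # Resume decoding just after the rejected bytes.
            pos = end
    return errors


# "café" encoded in UTF-8 contains no encoding errors...
print(find_encoding_errors("café".encode("utf-8")))    # []
# ...but the same word encoded in Latin-1 ends in the lone byte 0xE9,
# which is not valid UTF-8.
print(find_encoding_errors("café".encode("latin-1")))  # [(3, 4)]
# A null byte is an ordinary character, not an encoding error.
print(find_encoding_errors(b"a\x00b"))                 # []
```

In grep's terms, a pattern such as ‘caf.’ may or may not match the Latin-1 bytes when run in a UTF-8 locale, because whether ‘.’ matches the encoding error at offset 3 is exactly what the new documentation leaves unspecified.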

diff --git a/doc/grep.in.1 b/doc/grep.in.1
index 9393b37..ae14e54 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -744,6 +744,7 @@ may be quoted by preceding it with a backslash.
 The period
 .B .\&
 matches any single character.
+It is unspecified whether it matches an encoding error.
 .SS "Character Classes and Bracket Expressions"
 A
 .I "bracket expression"
@@ -752,12 +753,13 @@ is a list of characters enclosed by
 and
 .BR ] .
 It matches any single
-character in that list; if the first character of the list
+character in that list.
+If the first character of the list
 is the caret
 .B ^
 then it matches any character
 .I not
-in the list.
+in the list; it is unspecified whether it matches an encoding error.
 For example, the regular expression
 .B [0123456789]
 matches any single digit.
diff --git a/doc/grep.texi b/doc/grep.texi
index 922d96e..58caa62 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1016,6 +1016,8 @@ interpreted.
 @vindex LC_ALL @r{environment variable}
 @vindex LC_CTYPE @r{environment variable}
 @vindex LANG @r{environment variable}
+@cindex encoding error
+@cindex null character
 These variables specify the locale for the @env{LC_CTYPE} category,
 which determines the type of characters,
 e.g., which characters are whitespace.
@@ -1023,6 +1025,18 @@ This category also determines the character encoding, that is, whether
 text is encoded in UTF-8, ASCII, or some other encoding.  In the
 @samp{C} or @samp{POSIX} locale, all characters are encoded as a
 single byte and every byte is a valid character.
+In more-complex encodings such as UTF-8, a sequence of multiple bytes
+may be needed to represent a character, and some bytes may be encoding
+errors that do not contribute to the representation of any character.
+POSIX does not specify the behavior of @command{grep} when patterns or
+input data contain encoding errors or null characters, so portable
+scripts should avoid such usage.  As an extension to POSIX, GNU
+@command{grep} treats null characters like any other character.
+However, unless the @option{-a} (@option{--binary-files=text}) option
+is used, the presence of null characters in input or of encoding
+errors in output causes GNU @command{grep} to treat the file as binary
+and suppress details about matches.  @xref{File and Directory
+Selection}.
 
 @item LANGUAGE
 @itemx LC_ALL
@@ -1187,16 +1201,16 @@ are regular expressions that match themselves.
 Any meta-character
 with special meaning may be quoted by preceding it with a backslash.
 
-A regular expression may be followed by one of several
-repetition operators:
-
-@table @samp
-
-@item .
 @opindex .
 @cindex dot
 @cindex period
 The period @samp{.} matches any single character.
+It is unspecified whether @samp{.} matches an encoding error.
+
+A regular expression may be followed by one of several
+repetition operators:
+
+@table @samp
 
 @item ?
 @opindex ?
@@ -1267,11 +1281,15 @@ An unmatched @samp{)} matches just itself.
 @cindex character class
 A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
 @samp{]}.
-It matches any single character in that list;
-if the first character of the list is the caret @samp{^},
-then it matches any character @strong{not} in the list.
+It matches any single character in that list.
+If the first character of the list is the caret @samp{^},
+then it matches any character @strong{not} in the list,
+and it is unspecified whether it matches an encoding error.
 For example, the regular expression
-@samp{[0123456789]} matches any single digit.
+@samp{[0123456789]} matches any single digit,
+whereas @samp{[^()]} matches any single character that is not
+an opening or closing parenthesis, and might or might not match an
+encoding error.
 
 @cindex range expression
 Within a bracket expression, a @dfn{range expression} consists of two
@@ -1856,7 +1874,7 @@ On some operating systems that support files with holes---large
 regions of zeros that are not physically present on secondary
 storage---@command{grep} can skip over the holes efficiently without
 needing to read the zeros.  This optimization is not available if the
-@option{-a} (@option{--text}) option is used (@pxref{File and
+@option{-a} (@option{--binary-files=text}) option is used (@pxref{File and
 Directory Selection}), unless the @option{-z} (@option{--null-data})
 option is also used (@pxref{Other Options}).
 

-----------------------------------------------------------------------

Summary of changes:
 doc/grep.in.1 |  6 ++++--
 doc/grep.texi | 40 +++++++++++++++++++++++++++++-----------
 2 files changed, 33 insertions(+), 13 deletions(-)
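
The LC_CTYPE paragraph added to doc/grep.texi states that, unless -a (--binary-files=text) is used, null characters in the input or encoding errors in the output cause GNU grep to treat a file as binary. The Python sketch below is a toy model of that documented rule and nothing more. The function `treat_as_binary` and its parameters are hypothetical, it assumes a UTF-8 locale, and it is not GNU grep's implementation.

```python
def treat_as_binary(input_data: bytes,
                    output_lines: list[bytes],
                    text_option: bool = False) -> bool:
    """Toy model of the documented binary-file rule (assumes UTF-8).

    input_data:   the whole input file
    output_lines: the lines grep would otherwise print
    text_option:  models -a (--binary-files=text)
    """
    if text_option:
        # With -a, the file is always treated as text.
        return False
    if b"\x00" in input_data:
        # A null character anywhere in the input makes the file binary.
        return True
    for line in output_lines:
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            # An encoding error in a line that would be output.
            return True
    return False


data = b"ok\ncaf\xe9\n"
print(treat_as_binary(data, [b"ok"]))        # False: the bad line is not output
print(treat_as_binary(data, [b"caf\xe9"]))   # True: an output line has an encoding error
print(treat_as_binary(b"a\x00b\n", [b"a\x00b"], text_option=True))  # False
```

The model checks the input for nulls but only the output for encoding errors, mirroring the asymmetry in the documented wording: an encoding error on a line that is never printed does not by itself make the file binary.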


hooks/post-receive
-- 
grep


