grep branch, master, updated. v3.1-24-ge552346
From: Paul Eggert
Subject: grep branch, master, updated. v3.1-24-ge552346
Date: Fri, 20 Apr 2018 18:19:40 -0400 (EDT)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".
The branch, master has been updated
via e552346b1a4a1427c0f52aadde1292a7258314a0 (commit)
from fbd9f06ce1a3b9203378881e59184d3caa1c3c4c (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=e552346b1a4a1427c0f52aadde1292a7258314a0
commit e552346b1a4a1427c0f52aadde1292a7258314a0
Author: Paul Eggert <address@hidden>
Date: Fri Apr 20 15:19:09 2018 -0700
doc: mention encoding errors
This attempts to document the encoding-error problem more
precisely (Bug#30326).
* doc/grep.in.1, doc/grep.texi: Mention that the behavior of
patterns like ‘.’ is not specified on encoding errors.
diff --git a/doc/grep.in.1 b/doc/grep.in.1
index 9393b37..ae14e54 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -744,6 +744,7 @@ may be quoted by preceding it with a backslash.
The period
.B .\&
matches any single character.
+It is unspecified whether it matches an encoding error.
.SS "Character Classes and Bracket Expressions"
A
.I "bracket expression"
@@ -752,12 +753,13 @@ is a list of characters enclosed by
and
.BR ] .
It matches any single
-character in that list; if the first character of the list
+character in that list.
+If the first character of the list
is the caret
.B ^
then it matches any character
.I not
-in the list.
+in the list; it is unspecified whether it matches an encoding error.
For example, the regular expression
.B [0123456789]
matches any single digit.
diff --git a/doc/grep.texi b/doc/grep.texi
index 922d96e..58caa62 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1016,6 +1016,8 @@ interpreted.
@vindex LC_ALL @r{environment variable}
@vindex LC_CTYPE @r{environment variable}
@vindex LANG @r{environment variable}
address@hidden encoding error
address@hidden null character
These variables specify the locale for the @env{LC_CTYPE} category,
which determines the type of characters,
e.g., which characters are whitespace.
@@ -1023,6 +1025,18 @@ This category also determines the character encoding,
that is, whether
text is encoded in UTF-8, ASCII, or some other encoding. In the
@samp{C} or @samp{POSIX} locale, all characters are encoded as a
single byte and every byte is a valid character.
+In more-complex encodings such as UTF-8, a sequence of multiple bytes
+may be needed to represent a character, and some bytes may be encoding
+errors that do not contribute to the representation of any character.
+POSIX does not specify the behavior of @command{grep} when patterns or
+input data contain encoding errors or null characters, so portable
+scripts should avoid such usage. As an extension to POSIX, GNU
address@hidden treats null characters like any other character.
+However, unless the @option{-a} (@option{--binary-files=text}) option
+is used, the presence of null characters in input or of encoding
+errors in output causes GNU @command{grep} to treat the file as binary
+and suppress details about matches. @xref{File and Directory
+Selection}.
@item LANGUAGE
@itemx LC_ALL
@@ -1187,16 +1201,16 @@ are regular expressions that match themselves.
Any meta-character
with special meaning may be quoted by preceding it with a backslash.
-A regular expression may be followed by one of several
-repetition operators:
-
address@hidden @samp
-
address@hidden .
@opindex .
@cindex dot
@cindex period
The period @samp{.} matches any single character.
+It is unspecified whether @samp{.} matches an encoding error.
+
+A regular expression may be followed by one of several
+repetition operators:
+
address@hidden @samp
@item ?
@opindex ?
@@ -1267,11 +1281,15 @@ An unmatched @samp{)} matches just itself.
@cindex character class
A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
@samp{]}.
-It matches any single character in that list;
-if the first character of the list is the caret @samp{^},
-then it matches any character @strong{not} in the list.
+It matches any single character in that list.
+If the first character of the list is the caret @samp{^},
+then it matches any character @strong{not} in the list,
+and it is unspecified whether it matches an encoding error.
For example, the regular expression
address@hidden matches any single digit.
address@hidden matches any single digit,
+whereas @samp{[^()]} matches any single character that is not
+an opening or closing parenthesis, and might or might not match an
+encoding error.
@cindex range expression
Within a bracket expression, a @dfn{range expression} consists of two
@@ -1856,7 +1874,7 @@ On some operating systems that support files with
holes---large
regions of zeros that are not physically present on secondary
address@hidden can skip over the holes efficiently without
needing to read the zeros. This optimization is not available if the
address@hidden (@option{--text}) option is used (@pxref{File and
address@hidden (@option{--binary-files=text}) option is used (@pxref{File and
Directory Selection}), unless the @option{-z} (@option{--null-data})
option is also used (@pxref{Other Options}).
-----------------------------------------------------------------------
Summary of changes:
doc/grep.in.1 | 6 ++++--
doc/grep.texi | 40 +++++++++++++++++++++++++++++-----------
2 files changed, 33 insertions(+), 13 deletions(-)
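
The encoding-error notion documented in this commit can be illustrated
outside grep. The following is a minimal Python sketch, not part of the
commit: the helper name is_encoding_error is hypothetical, and Latin-1
is used only as a stand-in for a single-byte encoding in which every
byte is a valid character (it is not the C or POSIX locale itself).

```python
def is_encoding_error(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if `data` cannot be decoded in `encoding`.

    Hypothetical helper for illustration only; grep does not use it.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


# In UTF-8, one character may need a sequence of several bytes.
e_acute = "é".encode("utf-8")
assert len(e_acute) == 2

# A lone continuation byte (0x80) does not contribute to any character,
# so the byte string below contains an encoding error in UTF-8.
stray = b"a\x80b"
assert is_encoding_error(stray)

# Assumption: Latin-1 stands in for a single-byte encoding where every
# byte is a valid character, so the same bytes decode without error.
assert not is_encoding_error(stray, "latin-1")
```

Whether a pattern such as ‘.’ or ‘[^()]’ matches the stray byte in the
UTF-8 case is exactly what the new documentation leaves unspecified.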
hooks/post-receive
--
grep