Changes to grep/manual/grep.html,v
From: Jim Meyering
Subject: Changes to grep/manual/grep.html,v
Date: Sun, 27 Sep 2020 23:36:50 -0400 (EDT)
CVSROOT: /webcvs/grep
Module name: grep
Changes by: Jim Meyering <meyering> 20/09/27 23:36:49
Index: grep.html
===================================================================
RCS file: /webcvs/grep/grep/manual/grep.html,v
retrieving revision 1.30
retrieving revision 1.31
diff -u -b -r1.30 -r1.31
--- grep.html 2 Jan 2020 23:18:43 -0000 1.30
+++ grep.html 28 Sep 2020 03:36:48 -0000 1.31
@@ -14,10 +14,10 @@
<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-<title>GNU Grep 3.4</title>
+<title>GNU Grep 3.5</title>
-<meta name="description" content="GNU Grep 3.4">
-<meta name="keywords" content="GNU Grep 3.4">
+<meta name="description" content="GNU Grep 3.5">
+<meta name="keywords" content="GNU Grep 3.5">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
@@ -58,7 +58,7 @@
</head>
<body lang="en">
-<h1 class="settitle" align="center">GNU Grep 3.4</h1>
+<h1 class="settitle" align="center">GNU Grep 3.5</h1>
@@ -96,6 +96,8 @@
<li><a name="toc-Anchoring-1" href="#Anchoring">3.4 Anchoring</a></li>
<li><a name="toc-Back_002dreferences-and-Subexpressions-1" href="#Back_002dreferences-and-Subexpressions">3.5 Back-references and Subexpressions</a></li>
<li><a name="toc-Basic-vs-Extended-Regular-Expressions" href="#Basic-vs-Extended">3.6 Basic vs Extended Regular Expressions</a></li>
+ <li><a name="toc-Character-Encoding-1" href="#Character-Encoding">3.7 Character Encoding</a></li>
+ <li><a name="toc-Matching-Non_002dASCII-and-Non_002dprintable-Characters" href="#Matching-Non_002dASCII">3.8 Matching Non-ASCII and Non-printable Characters</a></li>
</ul></li>
<li><a name="toc-Usage-1" href="#Usage">4 Usage</a></li>
<li><a name="toc-Performance-1" href="#Performance">5 Performance</a></li>
@@ -123,7 +125,7 @@
<p><code>grep</code> prints lines that contain a match for one or more
patterns.
</p>
-<p>This manual is for version 3.4 of GNU Grep.
+<p>This manual is for version 3.5 of GNU Grep.
</p>
<p>This manual is for <code>grep</code>, a pattern matching engine.
</p>
@@ -941,8 +943,11 @@
<dt><samp>--line-buffered</samp></dt>
<dd><a name="index-_002d_002dline_002dbuffered"></a>
<a name="index-line-buffering"></a>
-<p>Use line buffering on output.
-This can cause a performance penalty.
+<p>Use line buffering for standard output, regardless of output device.
+By default, standard output is line buffered for interactive devices,
+and is fully buffered otherwise. With full buffering, the output
+buffer is flushed when full; with line buffering, the buffer is also
+flushed after every output line. The buffer size is system dependent.
</p>
</dd>
<dt><samp>-U</samp></dt>
@@ -1237,21 +1242,8 @@
<p>These variables specify the locale for the <code>LC_CTYPE</code> category,
which determines the type of characters,
e.g., which characters are whitespace.
-This category also determines the character encoding, that is, whether
-text is encoded in UTF-8, ASCII, or some other encoding. In the
-‘<samp>C</samp>’ or ‘<samp>POSIX</samp>’ locale, all characters are encoded as a
-single byte and every byte is a valid character.
-In more-complex encodings such as UTF-8, a sequence of multiple bytes
-may be needed to represent a character, and some bytes may be encoding
-errors that do not contribute to the representation of any character.
-POSIX does not specify the behavior of <code>grep</code> when patterns or
-input data contain encoding errors or null characters, so portable
-scripts should avoid such usage. As an extension to POSIX, GNU
-<code>grep</code> treats null characters like any other character.
-However, unless the <samp>-a</samp> (<samp>--binary-files=text</samp>) option
-is used, the presence of null characters in input or of encoding
-errors in output causes GNU <code>grep</code> to treat the file as binary
-and suppress details about matches. See <a href="#File-and-Directory-Selection">File and Directory Selection</a>.
+This category also determines the character encoding.
+See <a href="#Character-Encoding">Character Encoding</a>.
</p>
</dd>
<dt><code>LANGUAGE</code></dt>
@@ -1314,9 +1306,6 @@
<p>Normally the exit status is 0 if a line is selected, 1 if no lines
were selected, and 2 if an error occurred. However, if the
-<samp>-L</samp> or <samp>--files-without-match</samp> is used, the exit status
-is 0 if a file is listed, 1 if no files were listed, and 2 if an error
-occurred. Also, if the
<samp>-q</samp> or <samp>--quiet</samp> or <samp>--silent</samp> option is used
and a line is selected, the exit status is 0 even if an error
occurred. Other <code>grep</code> implementations may exit with status
@@ -1434,6 +1423,10 @@
</td></tr>
<tr><td align="left" valign="top">• <a href="#Basic-vs-Extended" accesskey="6">Basic vs Extended</a>:</td><td> </td><td align="left" valign="top">
</td></tr>
+<tr><td align="left" valign="top">• <a href="#Character-Encoding" accesskey="7">Character Encoding</a>:</td><td> </td><td align="left" valign="top">
+</td></tr>
+<tr><td align="left" valign="top">• <a href="#Matching-Non_002dASCII" accesskey="8">Matching Non-ASCII</a>:</td><td> </td><td align="left" valign="top">
+</td></tr>
</table>
<hr>
@@ -1827,7 +1820,7 @@
<a name="Basic-vs-Extended"></a>
<div class="header">
<p>
-Previous: <a href="#Back_002dreferences-and-Subexpressions" accesskey="p" rel="prev">Back-references and Subexpressions</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
+Next: <a href="#Character-Encoding" accesskey="n" rel="next">Character Encoding</a>, Previous: <a href="#Back_002dreferences-and-Subexpressions" accesskey="p" rel="prev">Back-references and Subexpressions</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Basic-vs-Extended-Regular-Expressions"></a>
<h3 class="section">3.6 Basic vs Extended Regular Expressions</h3>
@@ -1853,7 +1846,83 @@
POSIX allows this behavior as an extension, but portable scripts
should avoid it.
</p>
-
+<hr>
+<a name="Character-Encoding"></a>
+<div class="header">
+<p>
+Next: <a href="#Matching-Non_002dASCII" accesskey="n" rel="next">Matching Non-ASCII</a>, Previous: <a href="#Basic-vs-Extended" accesskey="p" rel="prev">Basic vs Extended</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
+</div>
+<a name="Character-Encoding-1"></a>
+<h3 class="section">3.7 Character Encoding</h3>
+<a name="index-character-encoding"></a>
+
+<p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
+patterns and data, that is, whether text is encoded in UTF-8, ASCII,
+or some other encoding. See <a href="#Environment-Variables">Environment Variables</a>.
+</p>
+<p>In the ‘<samp>C</samp>’ or ‘<samp>POSIX</samp>’ locale, every character is encoded as
+a single byte and every byte is a valid character. In more-complex
+encodings such as UTF-8, a sequence of multiple bytes may be needed to
+represent a character, and some bytes may be encoding errors that do
+not contribute to the representation of any character. POSIX does not
+specify the behavior of <code>grep</code> when patterns or input data
+contain encoding errors or null characters, so portable scripts should
+avoid such usage. As an extension to POSIX, GNU <code>grep</code> treats
+null characters like any other character. However, unless the
+<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used, the
+presence of null characters in input or of encoding errors in output
+causes GNU <code>grep</code> to treat the file as binary and suppress
+details about matches. See <a href="#File-and-Directory-Selection">File and Directory Selection</a>.
+</p>
+<p>Regardless of locale, the 103 characters in the POSIX Portable
+Character Set (a subset of ASCII) are always encoded as a single byte,
+and the 128 ASCII characters have their usual single-byte encodings on
+all but oddball platforms.
+</p>
+<hr>
+<a name="Matching-Non_002dASCII"></a>
+<div class="header">
+<p>
+Previous: <a href="#Character-Encoding" accesskey="p" rel="prev">Character Encoding</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> [<a href="#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="#Index" title="Index" rel="index">Index</a>]</p>
+</div>
+<a name="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></a>
+<h3 class="section">3.8 Matching Non-ASCII and Non-printable Characters</h3>
+<a name="index-non_002dASCII-matching"></a>
+<a name="index-non_002dprintable-matching"></a>
+
+<p>In a regular expression, non-ASCII and non-printable characters other
+than newline are not special, and represent themselves. For example,
+in a locale using UTF-8 the command ‘<samp>grep 'Λ	ω'</samp>’ (where the
+white space between ‘<samp>Λ</samp>’ and the ‘<samp>ω</samp>’ is a tab character)
+searches for ‘<samp>Λ</samp>’ (Unicode character U+039B GREEK CAPITAL LETTER
+LAMBDA), followed by a tab (U+0009 TAB), followed by ‘<samp>ω</samp>’ (U+03C9
+GREEK SMALL LETTER OMEGA).
+</p>
+<p>Suppose you want to limit your pattern to only printable characters
+(or even only printable ASCII characters) to keep your script readable
+or portable, but you also want to match specific non-ASCII or non-null
+non-printable characters. If you are using the <samp>-P</samp>
+(<samp>--perl-regexp</samp>) option, PCREs give you several ways to do
+this. Otherwise, if you are using Bash, the GNU project’s shell, you
+can represent these characters via ANSI-C quoting. For example, the
+Bash commands ‘<samp>grep $'Λ\tω'</samp>’ and ‘<samp>grep $'\u039B\t\u03C9'</samp>’
+both search for the same three-character string ‘<samp>Λ	ω</samp>’
+mentioned earlier. However, because Bash translates ANSI-C quoting
+before <code>grep</code> sees the pattern, this technique should not be
+used to match printable ASCII characters; for example, ‘<samp>grep
+$'\u005E'</samp>’ is equivalent to ‘<samp>grep '^'</samp>’ and matches any line, not
+just lines containing the character ‘<samp>^</samp>’ (U+005E CIRCUMFLEX
+ACCENT).
+</p>
+<p>Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
+shell scripts written in ASCII should use other methods to match
+specific non-ASCII characters. For example, in a UTF-8 locale the
+command ‘<samp>grep "$(printf '\316\233\t\317\211\n')"</samp>’ is a portable
+albeit hard-to-read alternative to Bash’s ‘<samp>grep $'Λ\tω'</samp>’.
+However, none of these techniques will let you put a null character
+directly into a command-line pattern; null characters can appear only
+in a pattern specified via the <samp>-f</samp> (<samp>--file</samp>) option.
+</p>
<hr>
<a name="Usage"></a>
<div class="header">
@@ -2037,7 +2106,8 @@
<samp>-a</samp> or ‘<samp>--binary-files=text</samp>’ option.
To eliminate the
“Binary file matches” messages, use the <samp>-I</samp> or
-‘<samp>--binary-files=without-match</samp>’ option.
+‘<samp>--binary-files=without-match</samp>’ option,
+or the <samp>-s</samp> or <samp>--no-messages</samp> option.
</p>
</li><li> Why doesn’t ‘<samp>grep -lv</samp>’ print
non-matching file names?
@@ -2254,9 +2324,15 @@
back-references is at times unclear. Furthermore, many regular
expression implementations have back-reference bugs that can cause
programs to return incorrect answers or even crash, and fixing these
-bugs has often been low-priority—for example, as of 2019 the GNU C
-library bug database contained back-reference bugs 52, 10844, 11053,
-and 25322, with little sign of forthcoming fixes. Luckily,
+bugs has often been low-priority: for example, as of 2020 the
+<a href="https://sourceware.org/bugzilla/">GNU C library bug database</a>
+contained back-reference bugs
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=52">52</a>,
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=10844">10844</a>,
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=11053">11053</a>,
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=24269">24269</a>
+and <a href="https://sourceware.org/bugzilla/show_bug.cgi?id=25322">25322</a>,
+with little sign of forthcoming fixes. Luckily,
back-references are rarely useful and it should be little trouble to
avoid them in practical applications.
</p>
@@ -2693,7 +2769,7 @@
of the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
-<a href="https://www.gnu.org/copyleft/">https://www.gnu.org/copyleft/</a>.
+<a href="https://www.gnu.org/licenses/">https://www.gnu.org/licenses/</a>.
</p>
<p>Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
@@ -2994,6 +3070,7 @@
<tr><td></td><td valign="top"><a href="#index-changing-name-of-standard-input">changing name of standard input</a>:</td><td> </td><td valign="top"><a href="#Output-Line-Prefix-Control">Output Line Prefix Control</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-character-class">character class</a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-character-classes">character classes</a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-character-encoding">character encoding</a>:</td><td> </td><td valign="top"><a href="#Character-Encoding">Character Encoding</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-character-type">character type</a>:</td><td> </td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-classes-of-characters">classes of characters</a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-cntrl-character-class"><code>cntrl <span class="roman">character class</span></code></a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
@@ -3102,6 +3179,8 @@
<tr><td></td><td valign="top"><a href="#index-ne-GREP_005fCOLORS-capability"><code>ne GREP_COLORS <span class="roman">capability</span></code></a>:</td><td> </td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-NLS">NLS</a>:</td><td> </td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-no-filename-prefix">no filename prefix</a>:</td><td> </td><td valign="top"><a href="#Output-Line-Prefix-Control">Output Line Prefix Control</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-non_002dASCII-matching">non-ASCII matching</a>:</td><td> </td><td valign="top"><a href="#Matching-Non_002dASCII">Matching Non-ASCII</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-non_002dprintable-matching">non-printable matching</a>:</td><td> </td><td valign="top"><a href="#Matching-Non_002dASCII">Matching Non-ASCII</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-null-character">null character</a>:</td><td> </td><td valign="top"><a href="#Environment-Variables">Environment Variables</a></td></tr>
<tr><td></td><td valign="top"><a href="#index-numeric-characters">numeric characters</a>:</td><td> </td><td valign="top"><a href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket Expressions</a></td></tr>
<tr><td colspan="4"> <hr></td></tr>
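
The rewritten --line-buffered entry contrasts line buffering with full buffering. As an analogy only (this uses Python's io module, not the C stdio buffering grep actually relies on), the sketch below counts how many times a buffered text stream hands data down to the underlying raw stream. CountingRaw and raw_writes_before_flush are hypothetical names made up for this illustration, and the 8192-byte buffer size is an arbitrary assumption; the manual itself says the real size is system dependent.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that records every write it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.writes = 0
        self.data = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.writes += 1
        self.data += b
        return len(b)


def raw_writes_before_flush(line_buffering: bool, lines: int = 3) -> int:
    """Write `lines` short lines, then report how many raw writes
    happened before any explicit flush."""
    raw = CountingRaw()
    out = io.TextIOWrapper(io.BufferedWriter(raw, buffer_size=8192),
                           encoding="utf-8", line_buffering=line_buffering)
    for i in range(lines):
        out.write(f"line {i}\n")
    count = raw.writes
    out.flush()  # with full buffering, data also goes out once the buffer fills
    return count


if __name__ == "__main__":
    print(raw_writes_before_flush(line_buffering=True))   # one raw write per line
    print(raw_writes_before_flush(line_buffering=False))  # nothing before the flush
```

With line buffering every newline-terminated write reaches the raw stream immediately; with full buffering the short lines sit in the buffer until an explicit flush or until the buffer fills.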
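
The new Character Encoding section states that in UTF-8 one character may need several bytes, that some bytes are encoding errors, and that a null byte in the input or an encoding error in output makes grep treat a file as binary unless -a is given. The Python sketch below illustrates those three points under the assumption of a UTF-8 locale. utf8_length, has_encoding_error and treated_as_binary are hypothetical helpers, and treated_as_binary is a deliberately simplified model of the stated rule; GNU grep's actual detection logic may differ in detail.

```python
from __future__ import annotations


def utf8_length(ch: str) -> int:
    """Bytes needed to encode the single character ch in UTF-8."""
    return len(ch.encode("utf-8"))


def has_encoding_error(data: bytes) -> bool:
    """True if data is not valid UTF-8, i.e. some byte does not
    contribute to the representation of any character."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def treated_as_binary(data: bytes, output_lines: list[bytes]) -> bool:
    """Hypothetical, simplified model of the manual's rule (without -a):
    a null byte anywhere in the input, or an encoding error in a line
    that would be output, makes the file count as binary."""
    if b"\0" in data:
        return True
    return any(has_encoding_error(line) for line in output_lines)


if __name__ == "__main__":
    print(utf8_length("A"))                # 1: ASCII letter, one byte
    print(utf8_length("\u039B"))           # 2: GREEK CAPITAL LETTER LAMBDA
    print(has_encoding_error(b"\x80"))     # True: stray continuation byte
    print(treated_as_binary(b"ok\n\xff\n", [b"ok"]))  # False: bad byte never output
```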
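
The statement that the 103 characters of the POSIX Portable Character Set are always encoded as single bytes can be spot-checked. This sketch assumes the set consists of the 94 graphic ASCII characters plus NUL, alert, backspace, tab, newline, vertical tab, form feed, carriage return and space, and tests two ASCII-compatible encodings that Python ships (UTF-8 and EUC-JP). It is a sanity check on those two encodings, not a proof for every platform; all_single_byte is a hypothetical helper.

```python
# Assumed membership of the POSIX Portable Character Set: nine
# non-graphic characters (NUL, alert, backspace, tab, newline,
# vertical tab, form feed, carriage return, space) ...
NON_GRAPHIC = "\0\a\b\t\n\v\f\r "
# ... plus the 94 graphic ASCII characters '!' (0x21) through '~' (0x7E).
GRAPHIC = "".join(chr(code) for code in range(0x21, 0x7F))
PORTABLE = NON_GRAPHIC + GRAPHIC


def all_single_byte(chars: str, encoding: str) -> bool:
    """True if every character in chars encodes to exactly one byte."""
    return all(len(ch.encode(encoding)) == 1 for ch in chars)


if __name__ == "__main__":
    print(len(PORTABLE))                            # 103
    for enc in ("utf-8", "euc_jp"):
        print(enc, all_single_byte(PORTABLE, enc))  # True for both
    print(all_single_byte("\u039B", "utf-8"))       # False: Lambda needs 2 bytes
```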
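
The Matching Non-ASCII section depends on the shell, not grep, translating escapes before grep sees the pattern. Python string literals are translated the same way, which makes the section's claims easy to check. The sketch below uses Python's re module only as a stand-in for grep (their regex syntaxes differ in general): it confirms that the printf octal escapes yield exactly the UTF-8 bytes of Λ, tab, ω, and that \u005E becomes a plain ^ anchor that matches every line. lines_matching is a hypothetical helper.

```python
from __future__ import annotations

import re

# Like Bash's $'\u039B\t\u03C9', an ordinary Python string literal is
# translated before the regex engine sees it: Lambda, TAB, omega.
PATTERN = "\u039B\t\u03C9"

# Octal escapes from the manual's printf example; the trailing \n in
# that printf format is dropped by $(...) command substitution.
PRINTF_BYTES = b"\316\233\t\317\211"


def lines_matching(regex: str, lines: list[str]) -> list[str]:
    """Hypothetical grep-like helper: the lines containing a match."""
    return [line for line in lines if re.search(regex, line)]


if __name__ == "__main__":
    print(PRINTF_BYTES == PATTERN.encode("utf-8"))  # True
    sample = ["plain text", "caret ^ here", ""]
    # "\u005E" is just "^", so it acts as an anchor and matches every line.
    print(lines_matching("\u005E", sample))
    # Escaping it matches only a literal circumflex.
    print(lines_matching(re.escape("\u005E"), sample))  # ['caret ^ here']
```

Here re.escape stands in for quoting the circumflex in the pattern itself (for example with a backslash) when a literal ^ is wanted.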