Changes to grep/manual/grep.html,v


From: Jim Meyering
Subject: Changes to grep/manual/grep.html,v
Date: Sun, 27 Sep 2020 23:36:50 -0400 (EDT)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 20/09/27 23:36:49

Index: grep.html
===================================================================
RCS file: /webcvs/grep/grep/manual/grep.html,v
retrieving revision 1.30
retrieving revision 1.31
diff -u -b -r1.30 -r1.31
--- grep.html   2 Jan 2020 23:18:43 -0000       1.30
+++ grep.html   28 Sep 2020 03:36:48 -0000      1.31
@@ -14,10 +14,10 @@
 <!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-<title>GNU Grep 3.4</title>
+<title>GNU Grep 3.5</title>
 
-<meta name="description" content="GNU Grep 3.4">
-<meta name="keywords" content="GNU Grep 3.4">
+<meta name="description" content="GNU Grep 3.5">
+<meta name="keywords" content="GNU Grep 3.5">
 <meta name="resource-type" content="document">
 <meta name="distribution" content="global">
 <meta name="Generator" content="makeinfo">
@@ -58,7 +58,7 @@
 </head>
 
 <body lang="en">
-<h1 class="settitle" align="center">GNU Grep 3.4</h1>
+<h1 class="settitle" align="center">GNU Grep 3.5</h1>
 
 
 
@@ -96,6 +96,8 @@
     <li><a name="toc-Anchoring-1" href="#Anchoring">3.4 Anchoring</a></li>
     <li><a name="toc-Back_002dreferences-and-Subexpressions-1" 
href="#Back_002dreferences-and-Subexpressions">3.5 Back-references and 
Subexpressions</a></li>
     <li><a name="toc-Basic-vs-Extended-Regular-Expressions" 
href="#Basic-vs-Extended">3.6 Basic vs Extended Regular Expressions</a></li>
+    <li><a name="toc-Character-Encoding-1" href="#Character-Encoding">3.7 
Character Encoding</a></li>
+    <li><a name="toc-Matching-Non_002dASCII-and-Non_002dprintable-Characters" 
href="#Matching-Non_002dASCII">3.8 Matching Non-ASCII and Non-printable 
Characters</a></li>
   </ul></li>
   <li><a name="toc-Usage-1" href="#Usage">4 Usage</a></li>
   <li><a name="toc-Performance-1" href="#Performance">5 Performance</a></li>
@@ -123,7 +125,7 @@
 
 <p><code>grep</code> prints lines that contain a match for one or more patterns.
 </p>
-<p>This manual is for version 3.4 of GNU Grep.
+<p>This manual is for version 3.5 of GNU Grep.
 </p>
 <p>This manual is for <code>grep</code>, a pattern matching engine.
 </p>
@@ -941,8 +943,11 @@
 <dt><samp>--line-buffered</samp></dt>
 <dd><a name="index-_002d_002dline_002dbuffered"></a>
 <a name="index-line-buffering"></a>
-<p>Use line buffering on output.
-This can cause a performance penalty.
+<p>Use line buffering for standard output, regardless of output device.
+By default, standard output is line buffered for interactive devices,
+and is fully buffered otherwise.  With full buffering, the output
+buffer is flushed when full; with line buffering, the buffer is also
+flushed after every output line.  The buffer size is system dependent.
 </p>
 </dd>
 <dt><samp>-U</samp></dt>
@@ -1237,21 +1242,8 @@
 <p>These variables specify the locale for the <code>LC_CTYPE</code> category,
 which determines the type of characters,
 e.g., which characters are whitespace.
-This category also determines the character encoding, that is, whether
-text is encoded in UTF-8, ASCII, or some other encoding.  In the
-&lsquo;<samp>C</samp>&rsquo; or &lsquo;<samp>POSIX</samp>&rsquo; locale, all characters are encoded as a
-single byte and every byte is a valid character.
-In more-complex encodings such as UTF-8, a sequence of multiple bytes
-may be needed to represent a character, and some bytes may be encoding
-errors that do not contribute to the representation of any character.
-POSIX does not specify the behavior of <code>grep</code> when patterns or
-input data contain encoding errors or null characters, so portable
-scripts should avoid such usage.  As an extension to POSIX, GNU
-<code>grep</code> treats null characters like any other character.
-However, unless the <samp>-a</samp> (<samp>--binary-files=text</samp>) option
-is used, the presence of null characters in input or of encoding
-errors in output causes GNU <code>grep</code> to treat the file as binary
-and suppress details about matches.  See <a href="#File-and-Directory-Selection">File and Directory Selection</a>.
+This category also determines the character encoding.
+See <a href="#Character-Encoding">Character Encoding</a>.
 </p>
 </dd>
 <dt><code>LANGUAGE</code></dt>
@@ -1314,9 +1306,6 @@
 
 <p>Normally the exit status is 0 if a line is selected, 1 if no lines
 were selected, and 2 if an error occurred.  However, if the
-<samp>-L</samp> or <samp>--files-without-match</samp> is used, the exit status
-is 0 if a file is listed, 1 if no files were listed, and 2 if an error
-occurred.  Also, if the
 <samp>-q</samp> or <samp>--quiet</samp> or <samp>--silent</samp> option is used
 and a line is selected, the exit status is 0 even if an error
 occurred.  Other <code>grep</code> implementations may exit with status
@@ -1434,6 +1423,10 @@
 </td></tr>
 <tr><td align="left" valign="top">&bull; <a href="#Basic-vs-Extended" 
accesskey="6">Basic vs Extended</a>:</td><td>&nbsp;&nbsp;</td><td align="left" 
valign="top">
 </td></tr>
+<tr><td align="left" valign="top">&bull; <a href="#Character-Encoding" 
accesskey="7">Character Encoding</a>:</td><td>&nbsp;&nbsp;</td><td align="left" 
valign="top">
+</td></tr>
+<tr><td align="left" valign="top">&bull; <a href="#Matching-Non_002dASCII" 
accesskey="8">Matching Non-ASCII</a>:</td><td>&nbsp;&nbsp;</td><td align="left" 
valign="top">
+</td></tr>
 </table>
 
 <hr>
@@ -1827,7 +1820,7 @@
 <a name="Basic-vs-Extended"></a>
 <div class="header">
 <p>
-Previous: <a href="#Back_002dreferences-and-Subexpressions" accesskey="p" 
rel="prev">Back-references and Subexpressions</a>, Up: <a 
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> 
&nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
+Next: <a href="#Character-Encoding" accesskey="n" rel="next">Character 
Encoding</a>, Previous: <a href="#Back_002dreferences-and-Subexpressions" 
accesskey="p" rel="prev">Back-references and Subexpressions</a>, Up: <a 
href="#Regular-Expressions" accesskey="u" rel="up">Regular Expressions</a> 
&nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
 </div>
 <a name="Basic-vs-Extended-Regular-Expressions"></a>
 <h3 class="section">3.6 Basic vs Extended Regular Expressions</h3>
@@ -1853,7 +1846,83 @@
 POSIX allows this behavior as an extension, but portable scripts
 should avoid it.
 </p>
-
+<hr>
+<a name="Character-Encoding"></a>
+<div class="header">
+<p>
+Next: <a href="#Matching-Non_002dASCII" accesskey="n" rel="next">Matching 
Non-ASCII</a>, Previous: <a href="#Basic-vs-Extended" accesskey="p" 
rel="prev">Basic vs Extended</a>, Up: <a href="#Regular-Expressions" 
accesskey="u" rel="up">Regular Expressions</a> &nbsp; [<a href="#SEC_Contents" 
title="Table of contents" rel="contents">Contents</a>][<a href="#Index" 
title="Index" rel="index">Index</a>]</p>
+</div>
+<a name="Character-Encoding-1"></a>
+<h3 class="section">3.7 Character Encoding</h3>
+<a name="index-character-encoding"></a>
+
+<p>The <code>LC_CTYPE</code> locale specifies the encoding of characters in
+patterns and data, that is, whether text is encoded in UTF-8, ASCII,
+or some other encoding.  See <a href="#Environment-Variables">Environment Variables</a>.
+</p>
+<p>In the &lsquo;<samp>C</samp>&rsquo; or &lsquo;<samp>POSIX</samp>&rsquo; locale, every character is encoded as
+a single byte and every byte is a valid character.  In more-complex
+encodings such as UTF-8, a sequence of multiple bytes may be needed to
+represent a character, and some bytes may be encoding errors that do
+not contribute to the representation of any character.  POSIX does not
+specify the behavior of <code>grep</code> when patterns or input data
+contain encoding errors or null characters, so portable scripts should
+avoid such usage.  As an extension to POSIX, GNU <code>grep</code> treats
+null characters like any other character.  However, unless the
+<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used, the
+presence of null characters in input or of encoding errors in output
+causes GNU <code>grep</code> to treat the file as binary and suppress
+details about matches.  See <a href="#File-and-Directory-Selection">File and Directory Selection</a>.
+</p>
+<p>Regardless of locale, the 103 characters in the POSIX Portable
+Character Set (a subset of ASCII) are always encoded as a single byte,
+and the 128 ASCII characters have their usual single-byte encodings on
+all but oddball platforms.
+</p>
+<hr>
+<a name="Matching-Non_002dASCII"></a>
+<div class="header">
+<p>
+Previous: <a href="#Character-Encoding" accesskey="p" rel="prev">Character 
Encoding</a>, Up: <a href="#Regular-Expressions" accesskey="u" rel="up">Regular 
Expressions</a> &nbsp; [<a href="#SEC_Contents" title="Table of contents" 
rel="contents">Contents</a>][<a href="#Index" title="Index" 
rel="index">Index</a>]</p>
+</div>
+<a name="Matching-Non_002dASCII-and-Non_002dprintable-Characters"></a>
+<h3 class="section">3.8 Matching Non-ASCII and Non-printable Characters</h3>
+<a name="index-non_002dASCII-matching"></a>
+<a name="index-non_002dprintable-matching"></a>
+
+<p>In a regular expression, non-ASCII and non-printable characters other
+than newline are not special, and represent themselves.  For example,
+in a locale using UTF-8 the command &lsquo;<samp>grep 'Λ&nbsp;ω'</samp>&rsquo; (where the
+white space between &lsquo;<samp>Λ</samp>&rsquo; and the &lsquo;<samp>ω</samp>&rsquo; is a tab character)
+searches for &lsquo;<samp>Λ</samp>&rsquo; (Unicode character U+039B GREEK CAPITAL LETTER
+LAMBDA), followed by a tab (U+0009 TAB), followed by &lsquo;<samp>ω</samp>&rsquo; (U+03C9
+GREEK SMALL LETTER OMEGA).
+</p>
+<p>Suppose you want to limit your pattern to only printable characters
+(or even only printable ASCII characters) to keep your script readable
+or portable, but you also want to match specific non-ASCII or non-null
+non-printable characters.  If you are using the <samp>-P</samp>
+(<samp>--perl-regexp</samp>) option, PCREs give you several ways to do
+this.  Otherwise, if you are using Bash, the GNU project&rsquo;s shell, you
+can represent these characters via ANSI-C quoting.  For example, the
+Bash commands &lsquo;<samp>grep $'Λ\tω'</samp>&rsquo; and &lsquo;<samp>grep $'\u039B\t\u03C9'</samp>&rsquo;
+both search for the same three-character string &lsquo;<samp>Λ&nbsp;ω</samp>&rsquo;
+mentioned earlier.  However, because Bash translates ANSI-C quoting
+before <code>grep</code> sees the pattern, this technique should not be
+used to match printable ASCII characters; for example, &lsquo;<samp>grep
+$'\u005E'</samp>&rsquo; is equivalent to &lsquo;<samp>grep '^'</samp>&rsquo; and matches any line, not
+just lines containing the character &lsquo;<samp>^</samp>&rsquo; (U+005E CIRCUMFLEX
+ACCENT).
+</p>
+<p>Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
+shell scripts written in ASCII should use other methods to match
+specific non-ASCII characters.  For example, in a UTF-8 locale the
+command &lsquo;<samp>grep &quot;$(printf '\316\233\t\317\211\n')&quot;</samp>&rsquo; is a portable
+albeit hard-to-read alternative to Bash&rsquo;s &lsquo;<samp>grep $'Λ\tω'</samp>&rsquo;.
+However, none of these techniques will let you put a null character
+directly into a command-line pattern; null characters can appear only
+in a pattern specified via the <samp>-f</samp> (<samp>--file</samp>) option.
+</p>
 <hr>
 <a name="Usage"></a>
 <div class="header">
@@ -2037,7 +2106,8 @@
 <samp>-a</samp> or &lsquo;<samp>--binary-files=text</samp>&rsquo; option.
 To eliminate the
 &ldquo;Binary file matches&rdquo; messages, use the <samp>-I</samp> or
-&lsquo;<samp>--binary-files=without-match</samp>&rsquo; option.
+&lsquo;<samp>--binary-files=without-match</samp>&rsquo; option,
+or the <samp>-s</samp> or <samp>--no-messages</samp> option.
 </p>
 </li><li> Why doesn&rsquo;t &lsquo;<samp>grep -lv</samp>&rsquo; print non-matching file names?
 
@@ -2254,9 +2324,15 @@
 back-references is at times unclear.  Furthermore, many regular
 expression implementations have back-reference bugs that can cause
 programs to return incorrect answers or even crash, and fixing these
-bugs has often been low-priority&mdash;for example, as of 2019 the GNU C
-library bug database contained back-reference bugs 52, 10844, 11053,
-and 25322, with little sign of forthcoming fixes.  Luckily,
+bugs has often been low-priority: for example, as of 2020 the
+<a href="https://sourceware.org/bugzilla/">GNU C library bug database</a>
+contained back-reference bugs
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=52">52</a>,
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=10844">10844</a>,
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=11053">11053</a>,
+<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=24269">24269</a>
+and <a href="https://sourceware.org/bugzilla/show_bug.cgi?id=25322">25322</a>,
+with little sign of forthcoming fixes.  Luckily,
 back-references are rarely useful and it should be little trouble to
 avoid them in practical applications.
 </p>
@@ -2693,7 +2769,7 @@
 of the GNU Free Documentation License from time to time.  Such new
 versions will be similar in spirit to the present version, but may
 differ in detail to address new problems or concerns.  See
-<a href="https://www.gnu.org/copyleft/">https://www.gnu.org/copyleft/</a>.
+<a href="https://www.gnu.org/licenses/">https://www.gnu.org/licenses/</a>.
 </p>
 <p>Each version of the License is given a distinguishing version number.
 If the Document specifies that a particular numbered version of this
@@ -2994,6 +3070,7 @@
 <tr><td></td><td valign="top"><a 
href="#index-changing-name-of-standard-input">changing name of standard 
input</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Output-Line-Prefix-Control">Output Line Prefix Control</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-character-class">character 
class</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket 
Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-character-classes">character 
classes</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket 
Expressions</a></td></tr>
+<tr><td></td><td valign="top"><a href="#index-character-encoding">character 
encoding</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Character-Encoding">Character Encoding</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-character-type">character 
type</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Environment-Variables">Environment Variables</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-classes-of-characters">classes 
of characters</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket 
Expressions</a></td></tr>
 <tr><td></td><td valign="top"><a 
href="#index-cntrl-character-class"><code>cntrl <span class="roman">character 
class</span></code></a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket 
Expressions</a></td></tr>
@@ -3102,6 +3179,8 @@
 <tr><td></td><td valign="top"><a 
href="#index-ne-GREP_005fCOLORS-capability"><code>ne GREP_COLORS <span 
class="roman">capability</span></code></a>:</td><td>&nbsp;</td><td 
valign="top"><a href="#Environment-Variables">Environment 
Variables</a></td></tr>
 <tr><td></td><td valign="top"><a 
href="#index-NLS">NLS</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Environment-Variables">Environment Variables</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-no-filename-prefix">no filename 
prefix</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Output-Line-Prefix-Control">Output Line Prefix Control</a></td></tr>
+<tr><td></td><td valign="top"><a 
href="#index-non_002dASCII-matching">non-ASCII 
matching</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Matching-Non_002dASCII">Matching Non-ASCII</a></td></tr>
+<tr><td></td><td valign="top"><a 
href="#index-non_002dprintable-matching">non-printable 
matching</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Matching-Non_002dASCII">Matching Non-ASCII</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-null-character">null 
character</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Environment-Variables">Environment Variables</a></td></tr>
 <tr><td></td><td valign="top"><a href="#index-numeric-characters">numeric 
characters</a>:</td><td>&nbsp;</td><td valign="top"><a 
href="#Character-Classes-and-Bracket-Expressions">Character Classes and Bracket 
Expressions</a></td></tr>
 <tr><td colspan="4"> <hr></td></tr>
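
Illustration (not part of the commit): the portable technique described in
the new section 3.8 hunk can be sketched roughly as follows. The pattern is
built from the UTF-8 bytes of U+039B, tab, and U+03C9 with printf, so the
script itself stays printable ASCII. The variable names and the sample input
are assumptions made up for this sketch, and -F is added only to keep the
example independent of regular-expression syntax.

```shell
#!/bin/sh
# Build the three-character pattern (LAMBDA, tab, OMEGA) from octal escapes.
# Command substitution strips any trailing newline, so none is needed here.
pattern=$(printf '\316\233\t\317\211')

# Hypothetical sample input: one line containing the pattern, one without it.
sample=$(printf 'x \316\233\t\317\211 y\nno match here\n')

# -F: treat the pattern as a fixed string; -c: print the number of matching lines.
count=$(printf '%s\n' "$sample" | grep -c -F -e "$pattern")
echo "$count"
```

The script should print the count of matching lines for the two-line sample.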

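
Illustration (not part of the commit): the caveat about printable ASCII
characters in the section 3.8 hunk applies to any quoting that the shell
expands before grep runs. A minimal sketch, assuming a made-up two-line
input: the bare pattern '^' is an anchor and selects every line, whereas
-F makes grep look for the circumflex character itself.

```shell
#!/bin/sh
# Hypothetical input: only the first line contains a literal circumflex.
input=$(printf 'a^b\nplain\n')

# As a regular expression, '^' anchors at line start and matches every line.
any_count=$(printf '%s\n' "$input" | grep -c -e '^')

# With -F the pattern is a fixed string, so only lines containing '^' match.
literal_count=$(printf '%s\n' "$input" | grep -c -F -e '^')

echo "$any_count $literal_count"
```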

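
Illustration (not part of the commit): the exit-status sentence kept as
context in the hunk at line 1314 (0 if a line is selected, 1 if none, 2 on
error) can be checked with a small script. The helper exit_status and the
temporary file are hypothetical names introduced for this sketch; -s is used
only to silence the message about the missing file.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print its exit status.
# Using "if" keeps the script safe under "set -e".
exit_status() {
    if "$@"; then echo 0; else echo $?; fi
}

tmp=$(mktemp)
printf 'alpha\nbeta\n' > "$tmp"

found=$(exit_status grep -q -e alpha "$tmp")       # a line is selected
missing=$(exit_status grep -q -e gamma "$tmp")     # no line is selected
failed=$(exit_status grep -q -s -e alpha "$tmp.does-not-exist")  # unreadable file

rm -f "$tmp"
echo "$found $missing $failed"
```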
