grep branch, master, updated. v3.0-3-ge1ca01b

grep-commit

From:	Paul Eggert
Subject:	grep branch, master, updated. v3.0-3-ge1ca01b
Date:	Mon, 13 Feb 2017 19:38:44 -0500 (EST)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  e1ca01be48cb64e5eaa6b5b29910e7eea1719f91 (commit)
      from  96e100ad23ec85bf602064298bf86b22cb358525 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=e1ca01be48cb64e5eaa6b5b29910e7eea1719f91


commit e1ca01be48cb64e5eaa6b5b29910e7eea1719f91
Author: Paul Eggert <address@hidden>
Date:   Mon Feb 13 16:37:43 2017 -0800

    Update TODO and doc
    
    * TODO: Bring up-to-date and fix formatting glitches.
    * doc/grep.in.1, doc/grep.texi: Fix minor glitches.
    The above patches should address the same problems that recent
    Debian doc patches address, albeit in a different way.

diff --git a/TODO b/TODO
index 55a8262..cb953b3 100644
--- a/TODO
+++ b/TODO
@@ -1,3 +1,5 @@
+Things to do for GNU grep
+
   Copyright (C) 1992, 1997-2002, 2004-2017 Free Software Foundation, Inc.
 
   Copying and distribution of this file, with or without modification,
@@ -10,58 +12,57 @@ Short term work
 
 See where we are with UTF-8 performance.
 
-Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
-70-man_apostrophe.patch.  Go through patches in Savannah.
+Merge Debian patches that seem relevant.
+
+Go through patches in Savannah.
 
-Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
 Fix --directories=read.
 
 Write better Texinfo documentation for grep.  The manual page would be a
 good place to start, but Info documents are also supposed to contain a
 tutorial and examples.
 
-Some test in tests/spencer2.tests should have failed!  Need to filter out
+Some tests in tests/spencer2.tests should have failed!  Need to filter out
 some bugs in dfa.[ch]/regex.[ch].
 
 Multithreading?
 
-GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e.
-size_t/ptrdiff_t).
+GNU grep originally did 32-bit arithmetic.  Although it has moved to
+64-bit on 64-bit platforms by using types like ptrdiff_t and size_t,
+this conversion has not been entirely systematic and should be checked.
 
-Lazy dynamic linking of libpcre.
+Lazy dynamic linking of libpcre.  See Debian’s 03-397262-dlopen-pcre.patch.
 
-Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
-binary. Is there a possibility of doing even better by automatically
+Check FreeBSD’s integration of zgrep (-Z) and bzgrep (-J) in one
+binary.  Is there a possibility of doing even better by automatically
 checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
 0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?  Once what to do with
 libpcre is decided, do the same for libz and libbz2.
 
 
-==================
+===================
 Matching algorithms
-==================
+===================
 
-Check <http://tony.abou-assaleh.net/greps.html>.  Take a look at these
-and consider opportunities for merging or cloning:
+Take a look at these and consider opportunities for merging or cloning:
 
-   -- ja-grep's mlb2 patch (Japanese grep)
-  <ftp://ftp.freebsd.org/pub/FreeBSD/ports/distfiles/grep-2.4.2-mlb2.patch.gz>
+   -- http://osrd.org/projects/grep/global-regular-expression-print-tools-grep-variants
+   -- ja-grep’s mlb2 patch (Japanese grep)
+      <http://distcache.freebsd.org/ports-distfiles/grep-2.4.2-mlb2.patch.gz>
    -- lgrep (from lv, a Powerful Multilingual File Viewer / Grep)
-      <http://www.ff.iij4u.or.jp/~nrt/lv/>;
-   -- cgrep (Context grep) <http://plg.uwaterloo.ca/~ftp/mt/cgrep/>
+      <http://www.mt.cs.keio.ac.jp/person/narita/lv/>;
+   -- cgrep (Context grep) <https://awgn.github.io/cgrep/>
       seems like nice work;
-   -- sgrep (Struct grep) <http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>;
-   -- agrep (Approximate grep) <http://www.tgries.de/agrep/>,
+   -- sgrep (Struct grep) <https://www.cs.helsinki.fi/u/jjaakkol/sgrep.html>;
+   -- agrep (Approximate grep) <https://www.tgries.de/agrep/>,
       from glimpse;
    -- nr-grep (Nondeterministic reverse grep)
-      <http://www.dcc.uchile.cl/~gnavarro/software/>;
+      <https://www.dcc.uchile.cl/~gnavarro/software/>;
    -- ggrep (Grouse grep) <http://www.grouse.com.au/ggrep/>;
-   -- grep.py (Python grep) <http://www.vdesmedt.com/~vds2212/grep.html>;
-   -- freegrep <http://www.vocito.com/downloads/software/grep/>;
+   -- freegrep <https://github.com/howardjp/freegrep>;
 
-Check some new algorithms for matching; talk to Karl Berry and Nelson.
-Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
-claim that his algorithm is faster than Boyer-More. Worth checking.
+Check some new algorithms for matching.  See, for example, Faro &
+Lecroq (cited in kwset.c).
 
 Fix the DFA matcher to never use exponential space.  (Fortunately, these
 cases are rare.)
@@ -71,15 +72,20 @@ cases are rare.)
 Standards: POSIX and Unicode
 ============================
 
-For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
-and [. .] constructs is limited. This is difficult because it requires
-locale-dependent details of the character set and collating sequence,
-but POSIX does not standardize any method for accessing this information!
+For POSIX compliance issues, see POSIX 1003.1.
+
+Current support for the POSIX [= =] and [. .] constructs is limited to
+platforms whose regular expression matchers are sufficiently
+compatible with the GNU C library so that the --without-included-regex
+option of ‘configure’ is in effect.  Extend this support to non-glibc
+platforms, where --with-included-regex is in effect, by modifying the
+included version of the regex code to defer to the native version when
+handling [= =] and [. .].
 
 For Unicode, interesting things to check include the Unicode Standard
 <http://www.unicode.org/standard/standard.html> and the Unicode Technical
 Standard #18 (<http://www.unicode.org/reports/tr18/> “Unicode Regular
-Expressions”).  Talk to Bruno Haible who's maintaining GNU libunistring.
+Expressions”).  Talk to Bruno Haible who’s maintaining GNU libunistring.
 See also Unicode Standard Annex #15 (<http://www.unicode.org/reports/tr15/>
 “Unicode Normalization Forms”), already implemented by GNU libunistring.
 
@@ -91,135 +97,133 @@ POSIX and --ignore-case
 -----------------------
 
 For this issue, interesting things to check in POSIX include the
-Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
+Open Group Base Specifications, Chapter “Regular Expressions”, in
 particular Section “Regular Expression General Requirements” and its
-paragraph about caseless matching (note that this may not have been
-fully thought through and that this text may be self-contradicting
+paragraph about caseless matching (this may not have been fully
+thought through, and this text may be self-contradicting
 [specifically: “of either data or patterns” versus all the rest]).
+See:
 
-In particular, consider the following with POSIX's approach to case
-folding in mind. Assume a non-Turkic locale with a character
+http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_02
+
+In particular, consider the following with POSIX’s approach to case
+folding in mind.  Assume a non-Turkic locale with a character
 repertoire reduced to the following various forms of “LATIN LETTER I”:
 
-0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
-0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
-0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;\
-  LATIN CAPITAL LETTER I DOT;;;0069;
-0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049
+  0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
+  0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
+  0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;\
+    LATIN CAPITAL LETTER I DOT;;;0069;
+  0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049
 
-First note the differing UTF-8 octet lengths of U+0049 (0x49) and
-U+0069 (0x69) versus U+0130 (0xC4 0xB0) and U+0131 (0xC4 0xB1). This
-implies that whole UTF-8 strings cannot be case-converted in place,
-using the same memory buffer, and that the needed octet-size of the
-new buffer cannot merely be guessed (although there's a simple upper
-bound of six times the size of the input, as the longest UTF-8
-encoding of any character is six bytes).
+The UTF-8 octet lengths of U+0049 (0x49) and U+0069 (0x69) differ
+from those of U+0130 (0xC4 0xB0) and U+0131 (0xC4 0xB1).  This implies
+that whole UTF-8 strings cannot be case-converted in place, using the
+same memory buffer, and that the needed octet-size of the new buffer
+cannot merely be guessed (although there’s a simple upper bound of
+four times the size of the input, as the longest UTF-8 encoding of any
+character is four bytes).
 
 We have
 
-lc(I) = i, uc(I) = I
-lc(i) = i, uc(i) = I
-lc(İ) = i, uc(İ) = İ
-lc(ı) = ı, uc(ı) = I
+  lc(I) = i, uc(I) = I
+  lc(i) = i, uc(i) = I
+  lc(İ) = i, uc(İ) = İ
+  lc(ı) = ı, uc(ı) = I
 
 where lc() and uc() denote lower-case and upper-case conversions.
 
-There are several candidate --ignore-case logics (including the one
-mandated by POSIX):
-
-Using the
+There are several candidate --ignore-case logics.  Using the
 
-if (lc(input_wchar) == lc(pattern_wchar))
+  if (lc(input_wchar) == lc(pattern_wchar))
 
 logic leads to the following matches:
 
-  \in  I  i  İ  ı
-pat\   ----------
-"I" |  Y  Y  Y  n
-"i" |  Y  Y  Y  n
-"İ" |  Y  Y  Y  n
-"ı" |  n  n  n  Y
+    \in  I  i  İ  ı
+  pat\   ----------
+   I  |  Y  Y  Y  n
+   i  |  Y  Y  Y  n
+   İ  |  Y  Y  Y  n
+   ı  |  n  n  n  Y
 
 There is a lack of symmetry between CAPITAL and SMALL LETTERs with
-this.
+this.  Using the
 
-Using the
+  if (uc(input_wchar) == uc(pattern_wchar))
 
-if (uc(input_wchar) == uc(pattern_wchar))
+logic (which is what GNU grep currently does, although this is not
+documented or guaranteed for the future) leads to the following
+matches:
 
-logic leads to the following matches:
-
-  \in  I  i  İ  ı
-pat\   ----------
-"I" |  Y  Y  n  Y
-"i" |  Y  Y  n  Y
-"İ" |  n  n  Y  n
-"ı" |  Y  Y  n  Y
+    \in  I  i  İ  ı
+  pat\   ----------
+   I  |  Y  Y  n  Y
+   i  |  Y  Y  n  Y
+   İ  |  n  n  Y  n
+   ı  |  Y  Y  n  Y
 
 There is a lack of symmetry between CAPITAL and SMALL LETTERs with
 this.
 
 Using the
 
-if (   lc(input_wchar) == lc(pattern_wchar)
-|| uc(input_wchar) == uc(pattern_wchar))
+  if (lc(input_wchar) == lc(pattern_wchar)
+      || uc(input_wchar) == uc(pattern_wchar))
 
 logic leads to the following matches:
 
-  \in  I  i  İ  ı
-pat\   ----------
-"I" |  Y  Y  Y  Y
-"i" |  Y  Y  Y  Y
-"İ" |  Y  Y  Y  n
-"ı" |  Y  Y  n  Y
+    \in  I  i  İ  ı
+  pat\   ----------
+   I  |  Y  Y  Y  Y
+   i  |  Y  Y  Y  Y
+   İ  |  Y  Y  Y  n
+   ı  |  Y  Y  n  Y
 
-There is some elegance and symmetry with this. But there are
-potentially two conversions to be made per input character. If the
+There is some elegance and symmetry with this.  But there are
+potentially two conversions to be made per input character.  If the
 pattern is pre-converted, two copies of it need to be kept and used in
 a mutually coherent fashion.
 
 Using the
 
-if (      input_wchar  == pattern_wchar
-|| lc(input_wchar) == pattern_wchar
-|| uc(input_wchar) == pattern_wchar)
+  if (input_wchar  == pattern_wchar
+      || lc(input_wchar) == pattern_wchar
+      || uc(input_wchar) == pattern_wchar)
 
-logic (as mandated by POSIX) leads to the following matches:
+logic (a plausible interpretation of POSIX) leads to the following
+matches:
 
-  \in  I  i  İ  ı
-pat\   ----------
-"I" |  Y  Y  n  Y
-"i" |  Y  Y  Y  n
-"İ" |  n  n  Y  n
-"ı" |  n  n  n  Y
+    \in  I  i  İ  ı
+  pat\   ----------
+   I  |  Y  Y  n  Y
+   i  |  Y  Y  Y  n
+   İ  |  n  n  Y  n
+   ı  |  n  n  n  Y
 
-There is a different CAPITAL/SMALL symmetry with this. But there's
-also a loss of pattern/input symmetry that's unique to it. Also there
+There is a different CAPITAL/SMALL symmetry with this.  But there’s
+also a loss of pattern/input symmetry that’s unique to it.  Also there
 are potentially two conversions to be made per input character.
 
 Using the
 
-if (lc(uc(input_wchar)) == lc(uc(pattern_wchar)))
-
+  if (lc(uc(input_wchar)) == lc(uc(pattern_wchar)))
 
 logic leads to the following matches:
 
-  \in  I  i  İ  ı
-pat\   ----------
-"I" |  Y  Y  Y  Y
-"i" |  Y  Y  Y  Y
-"İ" |  Y  Y  Y  Y
-"ı" |  Y  Y  Y  Y
+    \in  I  i  İ  ı
+  pat\   ----------
+   I  |  Y  Y  Y  Y
+   i  |  Y  Y  Y  Y
+   İ  |  Y  Y  Y  Y
+   ı  |  Y  Y  Y  Y
 
-This shows total symmetry and transitivity
-(at least in this example analysis).
-There are two conversions to be made per input character,
-but support could be added for having
-a single straight mapping performing
-a composition of the two conversions.
+This shows total symmetry and transitivity (at least in this example
+analysis).  There are two conversions to be made per input character,
+but support could be added for having a single straight mapping
+performing a composition of the two conversions.
 
-Any optimization in the implementation of each logic
-must not change its basic semantic.
+Any optimization in the implementation of each logic must not change
+its basic semantics.
 
 
 Unicode and --ignore-case
@@ -227,112 +231,109 @@ Unicode and --ignore-case
 
 For this issue, interesting things to check in Unicode include:
 
-  -- The Unicode Standard, Chapter 3
-     (<http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf>
-     “Conformance”), Section 3.13 (“Default Case Operations”) and the
-     toCasefold() case conversion operation.
+  - The Unicode Standard, Chapter 3
+    (<http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf>
+    “Conformance”), Section 3.13 (“Default Case Algorithms”) and the
+    toCasefold() case conversion operation.
 
-  -- The Unicode Standard, Chapter 4
-     (<http://www.unicode.org/versions/Unicode5.2.0/ch04.pdf>
-     âCharacter Propertiesâ), Section 4.2 (âCaseâNormativeâ) and
-     the <http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt>
-     SpecialCasing.txt and
-     <http://www.unicode.org/Public/UNIDATA/CaseFolding.txt>
-     CaseFolding.txt files from the
-     <http://www.unicode.org/Public/UNIDATA/UCD.html> Unicode
-     Character Database .
+  - The Unicode Standard, Chapter 4
+    (<http://www.unicode.org/versions/Unicode9.0.0/ch04.pdf>
+    “Character Properties”), Section 4.2 (“Case”) and
+    the <http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt>
+    SpecialCasing.txt and
+    <http://www.unicode.org/Public/UNIDATA/CaseFolding.txt>
+    CaseFolding.txt files.
 
-The <http://www.unicode.org/standard/standard.html> Unicode Standard,
-Chapter 5 (“<http://www.unicode.org/versions/Unicode5.2.0/ch05.pdf>
-Implementation Guidelines ”), Section 5.18 (“Case Mappings”),
-Subsection “Caseless Matching”.
+  - The Unicode Standard, Chapter 5
+    (<http://www.unicode.org/versions/Unicode9.0.0/ch05.pdf>
+    “Implementation Guidelines”), Section 5.18 (“Case Mappings”),
+    Subsection “Caseless Matching”.
 
-The Unicode <http://www.unicode.org/charts/case/> case charts.
+  - The Unicode case charts <http://www.unicode.org/charts/case/>.
 
 Unicode uses the
 
-if (toCasefold(input_wchar_string) == toCasefold(pattern_wchar_string))
+  if (toCasefold(input_wchar_string) == toCasefold(pattern_wchar_string))
 
-logic for caseless matching. Let's consider the “LATIN LETTER I”
-example mentioned above. In a non-Turkic locale, simple case folding
-yields
+logic for caseless matching.  Consider the “LATIN LETTER I” example
+mentioned above.  In a non-Turkic locale, simple case folding yields
 
-toCasefold_simple(U+0049) = U+0069
-toCasefold_simple(U+0069) = U+0069
-toCasefold_simple(U+0130) = U+0130
-toCasefold_simple(U+0131) = U+0131
+  toCasefold_simple(U+0049) = U+0069
+  toCasefold_simple(U+0069) = U+0069
+  toCasefold_simple(U+0130) = U+0130
+  toCasefold_simple(U+0131) = U+0131
 
 which leads to the following matches:
 
-  \in  I  i  İ  ı
-pat\   ----------
-"I" |  Y  Y  n  n
-"i" |  Y  Y  n  n
-"İ" |  n  n  Y  n
-"ı" |  n  n  n  Y
+    \in  I  i  İ  ı
+  pat\   ----------
+   I  |  Y  Y  n  n
+   i  |  Y  Y  n  n
+   İ  |  n  n  Y  n
+   ı  |  n  n  n  Y
 
 This is different from anything so far!
 
 In a non-Turkic locale, full case folding yields
 
-toCasefold_full(U+0049) = U+0069
-toCasefold_full(U+0069) = U+0069
-toCasefold_full(U+0130) = <U+0069, U+0307>
-toCasefold_full(U+0131) = U+0131
+  toCasefold_full(U+0049) = U+0069
+  toCasefold_full(U+0069) = U+0069
+  toCasefold_full(U+0130) = <U+0069, U+0307>
+  toCasefold_full(U+0131) = U+0131
 
 with
 
-0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
+  0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
 
 which leads to the following matches:
 
-  \in  I  i  İ  ı
-pat\   ----------
-"I" |  Y  Y  *  n
-"i" |  Y  Y  *  n
-"İ" |  n  n  Y  n
-"ı" |  n  n  n  Y
+    \in  I  i  İ  ı
+  pat\   ----------
+   I  |  Y  Y  *  n
+   i  |  Y  Y  *  n
+   İ  |  n  n  Y  n
+   ı  |  n  n  n  Y
 
 This is just sad!
 
-Note that having toCasefold(U+0131), simple or full, map to itself
-instead of U+0069 is in contradiction with the rules of Section 5.18
-of the Unicode Standard since toUpperCase(U+0131) is U+0049. Same
-thing for toCasefold_simple(U+0130) since toLowerCase(U+0131) is
-U+0069. The justification for the weird toCasefold_full(U+0130)
-mapping is unknown; it doesn't even make sense to add a dot (U+0307)
-to a letter that already has one (U+0069). It would have been so
-simple to put them all in the same equivalence class!
+Having toCasefold(U+0131), simple or full, map to itself instead of
+U+0069 is in contradiction with the rules of Section 5.18 of the
+Unicode Standard since toUpperCase(U+0131) is U+0049.  Same thing for
+toCasefold_simple(U+0130) since toLowerCase(U+0130) is U+0069.  The
+justification for the weird toCasefold_full(U+0130) mapping is
+unknown; it doesn’t even make sense to add a dot (U+0307) to a letter
+that already has one (U+0069).  It would have been so simple to put
+them all in the same equivalence class!
 
-Otherwise, also consider the following problem with Unicode's approach
-on case folding in mind. Assume that we want to perform
+Otherwise, also consider the following problem, with Unicode’s approach
+to case folding in mind.  Assume that we want to perform
 
-echo 'AßBC | grep -i 'Sb'
+  echo 'AßBC' | grep -i 'Sb'
 
 which corresponds to
 
-input:    U+0041 U+00DF U+0042 U+0043 U+000A
-pattern:  U+0053 U+0062
+  input:    U+0041 U+00DF U+0042 U+0043 U+000A
+  pattern:  U+0053 U+0062
 
-Following “CaseFolding-4.1.0.txt”, applying the toCasefold()
-transformation to these yields
+Following CaseFolding.txt, applying the toCasefold() transformation to
+these yields
 
-input:    U+0061 U+0073 U+0073 U+0062 U+0063 U+000A
-pattern:                U+0073 U+0062
+  input:    U+0061 U+0073 U+0073 U+0062 U+0063 U+000A
+  pattern:                U+0073 U+0062
 
-so, according to this approach, the input should match the pattern. As
-long as the original input line is to be reported to the user as a
-whole, there is no problem (from the user's point-of-view;
+so, according to this approach, the input should match the pattern.
+As long as the original input line is to be reported to the user as a
+whole, there is no problem (from the user’s point-of-view;
 implementation is complicated by this).
 
 However, consider both these GNU extensions:
 
-echo 'AßBC' | grep -i --only-matching 'Sb'
-echo 'AßBC' | grep -i --color=always  'Sb'
+  echo 'AßBC' | grep -i --only-matching 'Sb'
+  echo 'AßBC' | grep -i --color=always  'Sb'
 
 What is to be reported in these cases, since the match begins in the
-*middle* of the original input character 'ß'?
+*middle* of the original input character “ß”?
 
-Note that Unicode's toCasefold() cannot be implemented in terms of
-POSIX' towctrans() since that can only return a single wint_t value
-per input wint_t value.
+Unicode’s toCasefold() cannot be implemented in terms of POSIX’s
+towctrans() since that can only return a single wint_t value per input
+wint_t value.
diff --git a/doc/grep.in.1 b/doc/grep.in.1
index ed6382b..367501f 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -34,11 +34,16 @@ grep, egrep, fgrep \- print lines matching a pattern
 .br
 .B grep
 .RI [ OPTIONS ]
-.RB [ \-e
+.B \-e
 .I PATTERN
-|
+\&.\|.\|.\&
+.RI [ FILE .\|.\|.]
+.br
+.B grep
+.RI [ OPTIONS ]
 .B \-f
-.IR FILE ]
+.I FILE
+\&.\|.\|.\&
 .RI [ FILE .\|.\|.]
 .
 .SH DESCRIPTION
@@ -129,9 +134,8 @@ option, search for all patterns given.
 The empty file contains zero patterns, and therefore matches nothing.
 .TP
 .BR \-i ", " \-\^\-ignore\-case
-Ignore case distinctions in both the
-.I PATTERN
-and the input files.
+Ignore case distinctions, so that characters that differ only in case
+match each other.
 .TP
 .BR \-v ", " \-\^\-invert\-match
 Invert the sense of matching, to select non-matching lines.
@@ -305,10 +309,10 @@ on a Unix machine.
 This option has no effect unless
 .B \-b
 option is also used;
-it has no effect on platforms other than \s-1MS-DOS\s0 and \s-1MS\s0-Windows.
+it has no effect on platforms other than MS-DOS and MS-Windows.
 .TP
 .BR \-Z ", " \-\^\-null
-Output a zero byte (the \s-1ASCII\s0
+Output a zero byte (the ASCII
 .B NUL
 character) instead of the character that normally follows a file name.
 For example,
@@ -565,7 +569,7 @@ This can cause a performance penalty.
 .TP
 .BR \-U ", " \-\^\-binary
 Treat the file(s) as binary.
-By default, under \s-1MS-DOS\s0 and \s-1MS\s0-Windows,
+By default, under MS-DOS and MS-Windows,
 .BR grep
 guesses whether a file is text or binary as described for the
 .B \-\^\-binary\-files
@@ -585,7 +589,7 @@ matching mechanism verbatim; if the file is a text file with CR/LF
 pairs at the end of each line, this will cause some regular
 expressions to fail.
 This option has no effect on platforms
-other than \s-1MS-DOS\s0 and \s-1MS\s0-Windows.
+other than MS-DOS and MS-Windows.
 .TP
 .BR \-z ", " \-\^\-null\-data
 Treat input and output data as sequences of lines, each terminated by
@@ -606,8 +610,8 @@ expressions, by using various operators to combine smaller expressions.
 .B grep
 understands three different versions of regular expression syntax:
 \*(lqbasic\*(rq (BRE), \*(lqextended\*(rq (ERE) and \*(lqperl\*(rq (PCRE).
-In
-.RB "\s-1GNU\s0\ " grep ,
+In GNU
+.B grep
 there is no difference in available functionality between basic and
 extended syntaxes.
 In other implementations, basic regular expressions are less powerful.
@@ -685,7 +689,8 @@ and
 For example,
 .B [[:alnum:]]
 means the character class of numbers and
-letters in the current locale. In the C locale and \s-1ASCII\s0
+letters in the current locale.
+In the C locale and ASCII
 character set encoding, this is the same as
 .BR [0\-9A\-Za\-z] .
 (Note that the brackets in these class names are part of the symbolic
@@ -757,7 +762,7 @@ or more times.
 The preceding item is matched at most
 .I m
 times.
-This is a \s-1GNU\s0 extension.
+This is a GNU extension.
 .TP
 .BI { n , m }
 The preceding item is matched at least
@@ -834,7 +839,7 @@ category.
 The C locale is used if none of these environment variables are set,
 if the locale catalog is not installed, or if
 .B grep
-was not compiled with national language support (\s-1NLS\s0).
+was not compiled with national language support (NLS).
 The shell command
 .B "locale \-a"
 lists locales that are currently available.
@@ -1102,13 +1107,13 @@ The default C locale uses American English messages.
 .B POSIXLY_CORRECT
 If set,
 .B grep
-behaves as \s-1POSIX\s0 requires; otherwise,
+behaves as POSIX requires; otherwise,
 .B grep
-behaves more like other \s-1GNU\s0 programs.
-\s-1POSIX\s0 requires that options that follow file names must be
+behaves more like other GNU programs.
+POSIX requires that options that follow file names must be
 treated as file names; by default, such options are permuted to the
 front of the operand list and are treated as options.
-Also, \s-1POSIX\s0 requires that unrecognized options be diagnosed as
+Also, POSIX requires that unrecognized options be diagnosed as
 \*(lqillegal\*(rq, but since they are not really against the law the default
 is to diagnose them as \*(lqinvalid\*(rq.
 .B POSIXLY_CORRECT
@@ -1132,7 +1137,7 @@ to be an option, even if it appears to be one.
 A shell can put this variable in the environment for each command it runs,
 specifying which operands are the results of file name wildcard
 expansion and therefore should not be treated as options.
-This behavior is available only with the \s-1GNU\s0 C library, and only
+This behavior is available only with the GNU C library, and only
 when
 .B POSIXLY_CORRECT
 is not set.
@@ -1149,7 +1154,7 @@ is used and a line is selected, the exit status is 0 even if an error
 occurred.
 .
 .SH COPYRIGHT
-Copyright 1998-2000, 2002, 2005-2017 Free Software Foundation, Inc.
+Copyright 1998\(en2000, 2002, 2005\(en2017 Free Software Foundation, Inc.
 .PP
 This is free software;
 see the source for copying conditions.
@@ -1187,7 +1192,7 @@ read(2),
 pcre(3), pcresyntax(3), pcrepattern(3),
 terminfo(5),
 glob(7), regex(7).
-.SS "\s-1POSIX\s0 Programmer's Manual Page"
+.SS "POSIX Programmer's Manual Page"
 grep(1p).
 .SS "Full Documentation"
 A
diff --git a/doc/grep.texi b/doc/grep.texi
index 2d3ee78..7c051e0 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1152,7 +1152,7 @@ Regular expressions are constructed analogously to arithmetic expressions,
 by using various operators to combine smaller expressions.
 @command{grep} understands
 three different versions of regular expression syntax:
-``basic'' (BRE), ``extended'' (ERE) and ``perl'' (PCRE).
+basic (BRE), extended (ERE), and Perl-compatible (PCRE).
 In GNU @command{grep},
 there is no difference in available functionality between the basic and
 extended syntaxes.
@@ -1831,7 +1831,7 @@ Back-references are very slow, and may require exponential time.
 GNU @command{grep} is licensed under the GNU GPL, which makes it @dfn{free
 software}.
 
-The ``free'' in ``free software'' refers to liberty, not price. As
+The ``free'' in ``free software'' refers to liberty, not price.  As
 some GNU project advocates like to point out, think of ``free speech''
 rather than ``free beer''.  In short, you have the right (freedom) to
 run and change @command{grep} and distribute it to other people, and---if you

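The TODO's FreeBSD item asks whether grep could recognize compressed input by checking the magic bytes itself: 0x1F 0x8B for gzip, 0x1F 0x9D for compress, and 0x42 0x5A 0x68 for bzip2. Below is a minimal sketch of that check alone. It assumes the first few bytes of the file are already in a buffer. The names `sniff_compression` and `enum compression_kind` are hypothetical; nothing like them exists in grep. Decompression itself, via libz or libbz2, is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a file's leading bytes, using the
   magic numbers listed in the TODO item about zgrep and bzgrep.  */
enum compression_kind
{
  COMPRESSION_NONE,     /* no recognized magic number */
  COMPRESSION_GZIP,     /* 0x1F 0x8B */
  COMPRESSION_COMPRESS, /* 0x1F 0x9D (Unix compress, .Z files) */
  COMPRESSION_BZIP2     /* 0x42 0x5A 0x68, i.e. "BZh" */
};

/* Return the kind of compressed data that BUF (N bytes long) appears
   to start with.  Only the magic bytes are examined.  */
enum compression_kind
sniff_compression (unsigned char const *buf, size_t n)
{
  assert (buf != NULL || n == 0);

  static unsigned char const gzip_magic[] = { 0x1F, 0x8B };
  static unsigned char const compress_magic[] = { 0x1F, 0x9D };
  static unsigned char const bzip2_magic[] = { 0x42, 0x5A, 0x68 };

  /* Each length test comes first, so memcmp never reads past N.  */
  if (n >= sizeof gzip_magic
      && memcmp (buf, gzip_magic, sizeof gzip_magic) == 0)
    return COMPRESSION_GZIP;
  if (n >= sizeof compress_magic
      && memcmp (buf, compress_magic, sizeof compress_magic) == 0)
    return COMPRESSION_COMPRESS;
  if (n >= sizeof bzip2_magic
      && memcmp (buf, bzip2_magic, sizeof bzip2_magic) == 0)
    return COMPRESSION_BZIP2;
  return COMPRESSION_NONE;
}
```

A magic-number match is only a hint. A text file that happens to begin with "BZh" would be misclassified, so a real implementation would presumably treat the input as uncompressed if decompression then fails.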
-----------------------------------------------------------------------

Summary of changes:
 TODO          | 361 +++++++++++++++++++++++++++++-----------------------------
 doc/grep.in.1 |  49 ++++----
 doc/grep.texi |   4 +-
 3 files changed, 210 insertions(+), 204 deletions(-)
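The TODO's remark about UTF-8 octet lengths can be checked with a few lines of C. This sketch assumes the RFC 3629 form of UTF-8, which allows code points up to U+10FFFF and at most four bytes per character. `utf8_length` and `utf8_string_length` are illustrative helpers, not grep or gnulib functions. For example, uc(ı) = I turns a two-byte character into a one-byte one, so the length of a converted string depends on its content.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode code point CP in UTF-8 as
   restricted by RFC 3629 (U+0000..U+10FFFF).  To keep the sketch
   short, surrogate code points are not rejected.  */
int
utf8_length (uint32_t cp)
{
  assert (cp <= 0x10FFFF);
  if (cp < 0x80)
    return 1;
  if (cp < 0x800)
    return 2;
  if (cp < 0x10000)
    return 3;
  return 4;
}

/* Total UTF-8 length, in bytes, of the N code points in S.  */
size_t
utf8_string_length (uint32_t const *s, size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += utf8_length (s[i]);
  return total;
}
```

A one-to-one case conversion replaces each character, which occupies at least one byte, by a character of at most four bytes. Four times the input size therefore bounds the output in that case.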

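The match tables in the "POSIX and --ignore-case" section of the TODO can be regenerated mechanically. This sketch hard-codes lc() and uc() for the four-letter repertoire exactly as the TODO's table gives them. It does not consult the C library's towlower or towupper, whose results depend on the locale. It also adds Unicode simple case folding for comparison. All function names are hypothetical. Each `match_*` function takes the input character first and the pattern character second, corresponding to the tables' columns and rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* The reduced repertoire from the TODO: U+0049 I, U+0069 i,
   U+0130 (I with dot above) and U+0131 (dotless i).  */
static uint32_t const letters[4] = { 0x0049, 0x0069, 0x0130, 0x0131 };

/* lc() and uc() for that repertoire in a non-Turkic locale, as given
   in the TODO's table; any other code point maps to itself.  */
uint32_t
lc (uint32_t c)
{
  return c == 0x0049 || c == 0x0130 ? 0x0069 : c;
}

uint32_t
uc (uint32_t c)
{
  return c == 0x0069 || c == 0x0131 ? 0x0049 : c;
}

/* Unicode simple case folding (CaseFolding.txt status C and S) for
   the same letters: only U+0049 folds, to U+0069.  */
uint32_t
fold_simple (uint32_t c)
{
  return c == 0x0049 ? 0x0069 : c;
}

/* The candidate --ignore-case logics, in the TODO's order.  */
typedef bool (*match_fn) (uint32_t in, uint32_t pat);

bool match_lc (uint32_t in, uint32_t pat)
{ return lc (in) == lc (pat); }

bool match_uc (uint32_t in, uint32_t pat)
{ return uc (in) == uc (pat); }

bool match_lc_or_uc (uint32_t in, uint32_t pat)
{ return lc (in) == lc (pat) || uc (in) == uc (pat); }

/* The "plausible interpretation of POSIX": only the input is
   converted, and it is compared with the unconverted pattern.  */
bool match_posix (uint32_t in, uint32_t pat)
{ return in == pat || lc (in) == pat || uc (in) == pat; }

bool match_lc_uc (uint32_t in, uint32_t pat)
{ return lc (uc (in)) == lc (uc (pat)); }

bool match_fold_simple (uint32_t in, uint32_t pat)
{ return fold_simple (in) == fold_simple (pat); }

/* Print the table for MATCH: one row per pattern character, one
   column per input character, "Y" for a match and "n" otherwise.  */
void
print_table (char const *name, match_fn match)
{
  /* UTF-8 spellings of letters[], for display only.  */
  static char const *const spelling[4] = { "I", "i", "\xC4\xB0", "\xC4\xB1" };

  printf ("%s\n  pat\\in", name);
  for (int c = 0; c < 4; c++)
    printf ("  %s", spelling[c]);
  putchar ('\n');
  for (int r = 0; r < 4; r++)
    {
      printf ("     %s |", spelling[r]);
      for (int c = 0; c < 4; c++)
        printf ("  %c", match (letters[c], letters[r]) ? 'Y' : 'n');
      putchar ('\n');
    }
}
```

For example, calling `print_table ("uc", match_uc)` from a `main` function should print the same Y/n pattern as the TODO's uc() table. Of these relations, `match_posix` is the only one that is not symmetric in its two arguments.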

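The 'AßBC' example in the "Unicode and --ignore-case" section can be made concrete. This sketch implements only the two full case foldings that example needs: ASCII A-Z to a-z, and U+00DF to "ss". It does not implement all of CaseFolding.txt. For every folded code point, it records the index of the input character that produced it. The names `fold_full`, `find_folded` and `starts_mid_character` are hypothetical. The trailing U+000A from the TODO's example is omitted, since it plays no part in the match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fold the N code points of IN into OUT using a hypothetical, tiny
   subset of Unicode full case folding: A-Z fold to a-z, U+00DF LATIN
   SMALL LETTER SHARP S folds to U+0073 U+0073 ("ss"), and all other
   code points are copied unchanged.  ORIGIN[k] receives the index in
   IN of the character that produced OUT[k].  OUT and ORIGIN hold
   OUT_CAP entries; 2 * N always suffices for this mapping.  Return
   the folded length.  */
size_t
fold_full (uint32_t const *in, size_t n, uint32_t *out, size_t *origin,
           size_t out_cap)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t c = in[i];
      size_t flen = c == 0x00DF ? 2 : 1;  /* sharp s expands to two */
      assert (out_cap - len >= flen);     /* caller supplied enough room */
      for (size_t k = 0; k < flen; k++)
        {
          if (c == 0x00DF)
            out[len] = 0x0073;            /* 's' */
          else if (0x41 <= c && c <= 0x5A)
            out[len] = c + 0x20;          /* ASCII A-Z to a-z */
          else
            out[len] = c;
          origin[len] = i;
          len++;
        }
    }
  return len;
}

/* Return the index of the first occurrence of PAT (M code points) in
   TEXT (N code points), or -1 if there is none.  Naive search.  */
ptrdiff_t
find_folded (uint32_t const *text, size_t n, uint32_t const *pat, size_t m)
{
  for (size_t i = 0; m <= n && i <= n - m; i++)
    {
      size_t k = 0;
      while (k < m && text[i + k] == pat[k])
        k++;
      if (k == m)
        return (ptrdiff_t) i;
    }
  return -1;
}

/* True if folded position POS lies inside the folding of a single
   input character, rather than at its start.  */
bool
starts_mid_character (size_t const *origin, size_t pos)
{
  return pos > 0 && origin[pos - 1] == origin[pos];
}
```

For the input A ß B C and the pattern S b, the folded sequences are a s s b c and s b. The match starts at folded index 2, which came from input index 1 (the ß). Folded index 1 also came from the ß, so the match begins in the middle of that character. This is the situation that the TODO's --only-matching and --color=always question is about.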
hooks/post-receive
-- 
grep