grep-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Changes to html_node/Performance.html


From: Jim Meyering
Subject: Changes to html_node/Performance.html
Date: Sun, 30 Dec 2018 01:24:25 -0500 (EST)

CVSROOT:        /webcvs/grep
Module name:    grep
Changes by:     Jim Meyering <meyering> 18/12/30 01:24:22

Index: html_node/Performance.html
===================================================================
RCS file: html_node/Performance.html
diff -N html_node/Performance.html
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ html_node/Performance.html  30 Dec 2018 06:24:22 -0000      1.1
@@ -0,0 +1,158 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"http://www.w3.org/TR/html4/loose.dtd";>
+<html>
+<!-- This manual is for grep, a pattern matching engine.
+
+Copyright (C) 1999-2002, 2005, 2008-2018 Free Software Foundation,
+Inc.
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
+Texts.  A copy of the license is included in the section entitled
+"GNU Free Documentation License". -->
+<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<title>Performance (GNU Grep 3.3)</title>
+
+<meta name="description" content="Performance (GNU Grep 3.3)">
+<meta name="keywords" content="Performance (GNU Grep 3.3)">
+<meta name="resource-type" content="document">
+<meta name="distribution" content="global">
+<meta name="Generator" content="makeinfo">
+<link href="index.html#Top" rel="start" title="Top">
+<link href="Index.html#Index" rel="index" title="Index">
+<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
+<link href="index.html#Top" rel="up" title="Top">
+<link href="Reporting-Bugs.html#Reporting-Bugs" rel="next" title="Reporting 
Bugs">
+<link href="Usage.html#Usage" rel="prev" title="Usage">
+<style type="text/css">
+<!--
+a.summary-letter {text-decoration: none}
+blockquote.indentedblock {margin-right: 0em}
+blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
+blockquote.smallquotation {font-size: smaller}
+div.display {margin-left: 3.2em}
+div.example {margin-left: 3.2em}
+div.lisp {margin-left: 3.2em}
+div.smalldisplay {margin-left: 3.2em}
+div.smallexample {margin-left: 3.2em}
+div.smalllisp {margin-left: 3.2em}
+kbd {font-style: oblique}
+pre.display {font-family: inherit}
+pre.format {font-family: inherit}
+pre.menu-comment {font-family: serif}
+pre.menu-preformatted {font-family: serif}
+pre.smalldisplay {font-family: inherit; font-size: smaller}
+pre.smallexample {font-size: smaller}
+pre.smallformat {font-family: inherit; font-size: smaller}
+pre.smalllisp {font-size: smaller}
+span.nolinebreak {white-space: nowrap}
+span.roman {font-family: initial; font-weight: normal}
+span.sansserif {font-family: sans-serif; font-weight: normal}
+ul.no-bullet {list-style: none}
+-->
+</style>
+<link rel="stylesheet" type="text/css" href="/software/gnulib/manual.css">
+
+
+</head>
+
+<body lang="en">
+<a name="Performance"></a>
+<div class="header">
+<p>
+Next: <a href="Reporting-Bugs.html#Reporting-Bugs" accesskey="n" 
rel="next">Reporting Bugs</a>, Previous: <a href="Usage.html#Usage" 
accesskey="p" rel="prev">Usage</a>, Up: <a href="index.html#Top" accesskey="u" 
rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of 
contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" 
rel="index">Index</a>]</p>
+</div>
+<hr>
+<a name="Performance-1"></a>
+<h2 class="chapter">5 Performance</h2>
+
+<a name="index-performance"></a>
+<p>Typically <code>grep</code> is an efficient way to search text.  However,
+it can be quite slow in some cases, and it can search large files
+where even minor performance tweaking can help significantly.
+Although the algorithm used by <code>grep</code> is an implementation
+detail that can change from release to release, understanding its
+basic strengths and weaknesses can help you improve its performance.
+</p>
+<p>The <code>grep</code> command operates partly via a set of automata that
+are designed for efficiency, and partly via a slower matcher that
+takes over when the fast matchers run into unusual features like
+back-references.  When feasible, the Boyer&ndash;Moore fast string
+searching algorithm is used to match a single fixed pattern, and the
+Aho&ndash;Corasick algorithm is used to match multiple fixed patterns.
+</p>
+<a name="index-locales"></a>
+<p>Generally speaking <code>grep</code> operates more efficiently in
+single-byte locales, since it can avoid the special processing needed
+for multi-byte characters.  If your patterns will work just as well
+that way, setting <code>LC_ALL</code> to a single-byte locale can help
+performance considerably.  Setting &lsquo;<samp>LC_ALL='C'</samp>&rsquo; can be
+particularly efficient, as <code>grep</code> is tuned for that locale.
+</p>
+<a name="index-case-insensitive-search-1"></a>
+<p>Outside the &lsquo;<samp>C</samp>&rsquo; locale, case-insensitive search, 
and search for
+bracket expressions like &lsquo;<samp>[a-z]</samp>&rsquo; and 
&lsquo;<samp>[[=a=]b]</samp>&rsquo;, can be
+surprisingly inefficient due to difficulties in fast portable access to
+concepts like multi-character collating elements.
+</p>
+<a name="index-back_002dreferences"></a>
+<p>A back-reference such as &lsquo;<samp>\1</samp>&rsquo; can hurt performance 
significantly
+in some cases, since back-references cannot in general be implemented
+via a finite state automaton, and instead trigger a backtracking
+algorithm that can be quite inefficient.  For example, although the
+pattern &lsquo;<samp>^(.*)\1{14}(.*)\2{13}$</samp>&rsquo; matches only lines 
whose
+lengths can be written as a sum <em>15x + 14y</em> for nonnegative
+integers <em>x</em> and <em>y</em>, the pattern matcher does not perform
+linear Diophantine analysis and instead backtracks through all
+possible matching strings, using an algorithm that is exponential in
+the worst case.
+</p>
+<a name="index-holes-in-files"></a>
+<p>On some operating systems that support files with holes&mdash;large
+regions of zeros that are not physically present on secondary
+storage&mdash;<code>grep</code> can skip over the holes efficiently without
+needing to read the zeros.  This optimization is not available if the
+<samp>-a</samp> (<samp>--binary-files=text</samp>) option is used (see <a 
href="File-and-Directory-Selection.html#File-and-Directory-Selection">File and 
Directory Selection</a>), unless the <samp>-z</samp> (<samp>--null-data</samp>)
+option is also used (see <a href="Other-Options.html#Other-Options">Other 
Options</a>).
+</p>
+<p>For more about the algorithms used by <code>grep</code> and about
+related string matching algorithms, see:
+</p>
+<ul>
+<li> Aho AV. Algorithms for finding patterns in strings.
+In: van Leeuwen J. <em>Handbook of Theoretical Computer Science</em>, vol. A.
+New York: Elsevier; 1990. p. 255&ndash;300.
+This surveys classic string matching algorithms, some of which are
+used by <code>grep</code>.
+
+</li><li> Aho AV, Corasick MJ. Efficient string matching: an aid to 
bibliographic search.
+<em>CACM</em>. 1975;18(6):333&ndash;40.
+<a 
href="https://dx.doi.org/10.1145/360825.360855";>https://dx.doi.org/10.1145/360825.360855</a>.
+This introduces the Aho&ndash;Corasick algorithm.
+
+</li><li> Boyer RS, Moore JS. A fast string searching algorithm.
+<em>CACM</em>. 1977;20(10):762&ndash;72.
+<a 
href="https://dx.doi.org/10.1145/359842.359859";>https://dx.doi.org/10.1145/359842.359859</a>.
+This introduces the Boyer&ndash;Moore algorithm.
+
+</li><li> Faro S, Lecroq T. The exact online string matching problem: a review
+of the most recent results.
+<em>ACM Comput Surv</em>. 2013;45(2):13.
+<a 
href="https://dx.doi.org/10.1145/2431211.2431212";>https://dx.doi.org/10.1145/2431211.2431212</a>.
+This surveys string matching algorithms that might help improve the
+performance of <code>grep</code> in the future.
+</li></ul>
+
+<hr>
+<div class="header">
+<p>
+Next: <a href="Reporting-Bugs.html#Reporting-Bugs" accesskey="n" 
rel="next">Reporting Bugs</a>, Previous: <a href="Usage.html#Usage" 
accesskey="p" rel="prev">Usage</a>, Up: <a href="index.html#Top" accesskey="u" 
rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of 
contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" 
rel="index">Index</a>]</p>
+</div>
+
+
+
+</body>
+</html>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]