From 5acb1dc0dffbf8a8e9db87bc6caf9fa7c3dc170e Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Mon, 8 Mar 2010 17:14:51 +0100
Subject: [PATCH] more work on TODO
* TODO: More work on the first section. Use clearer section headers.
---
TODO | 99 +++++++++++++++++++++++++++++++----------------------------------
1 files changed, 47 insertions(+), 52 deletions(-)
diff --git a/TODO b/TODO
index 62e302e..2cfd0ce 100644
--- a/TODO
+++ b/TODO
@@ -4,58 +4,52 @@
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
-Get sane performance with UTF-8 locales.
+===============
+Short term work
+===============
-Improve the test infrastructure.
+See where we are with UTF-8 performance.
-Other small patches which wait for a test case.
+Merge Debian patches 55-bigfile.patch, 69-mbtowc.patch and
+70-man_apostrophe.patch. Go through patches in Savannah.
-Some _minimal_ cleanup of the grep(), grepdir(), recursion (the "main
-loop") and fix --directories=read
+Cleanup of the grep(), grepdir(), recursion (the "main loop") to use fts.
+Fix --directories=read.
Write better Texinfo documentation for grep. The manual page would be a
good place to start, but Info documents are also supposed to contain a
tutorial and examples.
-Fix the DFA matcher to never use exponential space. (Fortunately, these
-cases are rare.)
-
-Improve the performance of the regex backtracking matcher. This matcher
-is agonizingly slow, and is responsible for grep sometimes being slower
-than Unix grep when backreferences are used.
+Some test in tests/spencer2.tests should have failed! Need to filter out
+some bugs in dfa.[ch]/regex.[ch].
-Some test in tests/spencer2.tests should have failed!
-Need to filter out some bugs in dfa.[ch]/regex.[ch].
+Multithreading?
-Threads for grep?
-
-GNU grep does 32-bit arithmetic, it needs to move to 64-bit.
+GNU grep does 32-bit arithmetic, it needs to move to 64-bit (i.e.
+size_t/ptrdiff_t).
Clean up, too many #ifdefs!
-Check some new algorithms for matching; talk to Karl Berry and Nelson.
-Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
-claim that his algorithm is faster than Boyer-More. Worth checking.
-
-Lazy dynamic linking of libpcre, libz, and libbz2?
+Lazy dynamic linking of libpcre.
Check FreeBSD's integration of zgrep (-Z) and bzgrep (-J) in one
binary. Is there a possibility of doing even better by automatically
checking the magic of binary files ourselves (0x1F 0x8B for gzip, 0x1F
-0x9D for compress, and 0x42 0x5A 0x68 for bzip2)?
+0x9D for compress, and 0x42 0x5A 0x68 for bzip2)? Once what to do with
+libpcre is decided, do the same for libz and libbz2.
-##
+
+==================
+Matching algorithms
+==================
-Check .
-Take a look at these and consider opportunities
-for merging or cloning:
+Check . Take a look at these
+and consider opportunities for merging or cloning:
-- ja-grep's mlb2 patch (Japanese grep)
-- lgrep (from lv, a Powerful Multilingual File Viewer / Grep)
;
- -- pcregrep (from Perl-Compatible Regular Expressions library)
- ;
-- cgrep (Context grep)
seems like nice work;
-- sgrep (Struct grep) ;
@@ -65,25 +59,38 @@ for merging or cloning:
;
-- ggrep (Grouse grep) ;
-- grep.py (Python grep) ;
- -- freegrep (a BSD-licensed grep for those who can't stand the GNU GPL)
- ;
+ -- freegrep ;
-##
+Check some new algorithms for matching; talk to Karl Berry and Nelson.
+Sunday's "Quick Search" Algorithm (CACM 33, 1990-08-08 pp. 132-142)
+claim that his algorithm is faster than Boyer-Moore. Worth checking.
-POSIX Compliance: see p10003.x
+Fix the DFA matcher to never use exponential space. (Fortunately, these
+cases are rare.)
-In general, interesting things to check in POSIX/OpenGroup include:
+
+============================
+Standards: POSIX and Unicode
+============================
-Provide support for the POSIX [= =] and [. .] constructs. This is
-difficult because it requires locale-dependent details of the
-character set and collating sequence, but POSIX does not standardize
-any method for accessing this information!
+For POSIX compliance, see p10003.x. Current support for the POSIX [= =]
+and [. .] constructs is limited. This is difficult because it requires
+locale-dependent details of the character set and collating sequence,
+but POSIX does not standardize any method for accessing this information!
-Moving away from GNU regex API for POSIX regex API.
+For Unicode, interesting things to check include the Unicode Standard
+ and the Unicode Technical
+Standard #18 ( “Unicode Regular
+Expressions”). Talk to Bruno Haible who's maintaining GNU libunistring.
+See also Unicode Standard Annex #15 (
+“Unicode Normalization Forms”), already implemented by GNU libunistring.
-##
+In particular, --ignore-case needs to be evaluated against the standards.
+We may want to deviate from POSIX if Unicode provides better or clearer
+semantics.
POSIX and --ignore-case
+-----------------------
For this issue, interesting things to check in POSIX include the
Volume “Base Definitions (XBD)”, Chapter “Regular Expressions” and in
@@ -215,21 +222,9 @@ a composition of the two conversions.
Any optimization in the implementation of each logic
must not change its basic semantic.
-##
-
-In general, interesting things to check in Unicode include:
-
-The Unicode Standard.
-
-Unicode Technical Standard #18 (
-“Unicode Regular Expressions”).
-
-Unicode Standard Annex #15 (
-“Unicode Normalization Forms”).
-
-##
Unicode and --ignore-case
+-------------------------
For this issue, interesting things to check in Unicode include:
--
1.6.6