From: Jim Meyering
Subject: Re: [PATCH v2] dfa: optimize UTF-8 period
Date: Tue, 20 Apr 2010 15:49:35 +0200
Paolo Bonzini wrote:
> On 04/20/2010 12:06 PM, Jim Meyering wrote:
>> printf '\n'|LC_ALL=en_US.utf8 src/grep -zl .
>> printf '\0'|LC_ALL=en_US.utf8 src/grep -l .
>>
>> They should fail.
>
> By the way, I disagree that the first should fail. With -z the record
> separator ("newline") character is \0, so \n is just like any other
> character. The second should fail with
>
> printf '\0'|LC_ALL=en_US.utf8 POSIXLY_CORRECT=1 src/grep -l .
Good points. Thanks.
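The record-delimiter rule Paolo describes can be modeled at the byte level. What follows is a rough, hypothetical sketch in shell, not grep's DFA code. The function name dot_matches and its decimal-delimiter argument are invented for illustration. It ignores multibyte characters entirely, and it assumes the semantics argued above: "." never matches NUL, and a newline fails to match only because it ends the record. It reads stdin through od and exits 0 if some record contains a byte that "." could match.

```shell
#!/bin/sh
# Hypothetical model of the rule discussed above; NOT grep's implementation.
# Usage: dot_matches DELIM < input
#   DELIM is the record delimiter as a decimal byte value:
#   10 for the default newline, 0 for -z.
# Exit status: 0 if "." would match in some record, 1 otherwise.
# Assumption: single-byte characters only; multibyte UTF-8 is ignored.
dot_matches ()
{
  od -A n -v -t u1 | awk -v delim="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # The delimiter only ends a record and is never part of one.
        if ($i == delim) continue
        # "." must never match NUL, whatever the delimiter.
        if ($i == 0) continue
        # Any other byte, including a newline under -z, matches ".".
        found = 1
      }
    }
    END { exit !found }'
}

# The three cases from the thread:
printf '\n' | dot_matches 10; echo "newline, default delimiter: $?"  # 1, like grep -l .
printf '\n' | dot_matches 0;  echo "newline, NUL delimiter: $?"      # 0, like grep -zl .
printf '\0' | dot_matches 10; echo "NUL, default delimiter: $?"      # 1, like grep -l .
```

Treating the delimiter and NUL as two separate exclusions is what makes the -z case fall out naturally: once the delimiter is NUL, a newline is just another byte.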
Here's a revised test (still failing, of course):
From 5490e0283796cd4604a86a781644ef87de95526f Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 20 Apr 2010 11:34:57 +0200
Subject: [PATCH] tests: ensure that "." does not match NUL
* tests/dot-vs-NUL-and-NL: New file.
* tests/Makefile.am (TESTS): Add it.
---
tests/Makefile.am | 1 +
tests/dot-vs-NUL-and-NL | 31 +++++++++++++++++++++++++++++++
2 files changed, 32 insertions(+), 0 deletions(-)
create mode 100644 tests/dot-vs-NUL-and-NL
diff --git a/tests/Makefile.am b/tests/Makefile.am
index c2cc82c..b81e9ee 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -37,6 +37,7 @@ TESTS = \
case-fold-char-type \
char-class-multibyte \
dfaexec-multibyte \
+ dot-vs-NUL-and-NL \
empty \
ere.sh \
euc-mb \
diff --git a/tests/dot-vs-NUL-and-NL b/tests/dot-vs-NUL-and-NL
new file mode 100644
index 0000000..d7927a8
--- /dev/null
+++ b/tests/dot-vs-NUL-and-NL
@@ -0,0 +1,31 @@
+#!/bin/sh
+# Ensure that the match-any "." pattern does not match "\0", and
+# does match "\n" with -z.
+: ${srcdir=.}
+. "$srcdir/init.sh"; path_prepend_ ../src
+
+require_en_utf8_locale_
+
+printf '\n' > nl || framework_failure_
+printf '\0' > nul || framework_failure_
+fail=0
+
+for loc in en_US.UTF-8 C; do
+
+ # "." must not match "\0"
+ LC_ALL=$loc POSIXLY_CORRECT=1 grep -l . nul > out 2>&1
+ # Expect no match and no output.
+ test $? = 1 || fail=1
+ compare out /dev/null || fail=1
+
+ # In general, "." must not match "\n".
+ LC_ALL=$loc grep -l . nl > out
+ test $? = 1 || fail=1
+ compare out /dev/null || fail=1
+
+ # However, "." *does* match "\n" when "\0" is the input record delimiter.
+ LC_ALL=$loc grep -zl . nl > out || fail=1
+
+done
+
+Exit $fail
--
1.7.1.rc2.265.g8743f
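The new test relies on helpers from the tree's tests/init.sh (compare, framework_failure_ and Exit, as well as path_prepend_ and require_en_utf8_locale_). For readers without that file, the rough stand-ins below show the contract the script appears to expect from three of them. They are assumptions written for illustration, not the real definitions; in particular, exit status 99 for a set-up failure is an assumed convention borrowed from Automake's "hard error" status.

```shell
#!/bin/sh
# Hypothetical stand-ins for three tests/init.sh helpers; NOT the real code.

# Abort the whole test when its own set-up fails (99 is an assumed code).
framework_failure_ () { echo "$0: set-up failure: $*" >&2; exit 99; }

# Succeed if two files are byte-for-byte identical; otherwise print a diff
# and fail, so callers can write "compare out /dev/null || fail=1".
compare ()
{
  cmp -s "$1" "$2" && return 0
  diff "$1" "$2"
  return 1
}

# Exit with status $1, setting $? first (a common portability idiom).
Exit () { set +e; (exit "$1"); exit "$1"; }
```

With these, "compare out /dev/null" succeeds exactly when grep wrote nothing to out, which is how the test asserts "no output".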