bug#38656: [PATCH] grep: do not match invalid UTF-8
From: Paul Eggert
Subject: bug#38656: [PATCH] grep: do not match invalid UTF-8
Date: Tue, 17 Dec 2019 22:05:19 -0800
Update Gnulib to latest. Also:
* src/dfasearch.c (EGexecute): Use ptrdiff_t, not size_t,
to match new Gnulib API.
* tests/Makefile.am (TESTS): Add dfa-invalid-utf8.
* tests/dfa-invalid-utf8: New file.
---
NEWS | 5 ++++-
gnulib | 2 +-
src/dfasearch.c | 2 +-
tests/Makefile.am | 1 +
tests/dfa-invalid-utf8 | 29 +++++++++++++++++++++++++++++
5 files changed, 36 insertions(+), 3 deletions(-)
create mode 100755 tests/dfa-invalid-utf8
diff --git a/NEWS b/NEWS
index b106e2f..b6ff57c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,10 @@ GNU grep NEWS -*- outline -*-
** Bug fixes
- grep -Fw can no longer false match in non-UTF8 multibyte locales
+ '.' no longer matches some invalid byte sequences in UTF-8 locales.
+ [bug introduced in grep 2.7]
+
+ grep -Fw can no longer false match in non-UTF-8 multibyte locales
For example, this command would erroneously print its input line:
echo ab | LC_CTYPE=ja_JP.eucjp grep -Fw b
[Bug#38223 introduced in grep 2.28]
diff --git a/gnulib b/gnulib
index b7bf9f4..1219c34 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit b7bf9f4361c8d78ccfda7a30ff31f7a406ea972e
+Subproject commit 1219c343014ede881069bab554408b40e5455d9c
diff --git a/src/dfasearch.c b/src/dfasearch.c
index 6c95d8c..153281d 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -234,7 +234,7 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
if (!start_ptr)
{
char const *next_beg, *dfa_beg = beg;
- size_t count = 0;
+ ptrdiff_t count = 0;
bool exact_kwset_match = false;
bool backref = false;
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 82aebbf..dee6f46 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -86,6 +86,7 @@ TESTS = \
dfa-coverage \
dfa-heap-overrun \
dfa-infloop \
+ dfa-invalid-utf8 \
dfaexec-multibyte \
empty \
empty-line \
diff --git a/tests/dfa-invalid-utf8 b/tests/dfa-invalid-utf8
new file mode 100755
index 0000000..1748043
--- /dev/null
+++ b/tests/dfa-invalid-utf8
@@ -0,0 +1,29 @@
+#! /bin/sh
+# Test whether "grep '.'" matches invalid UTF-8 byte sequences.
+#
+# Copyright 2019 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+require_compiled_in_MB_support
+
+fail=0
+
+printf 'a\360\202\202\254b\n' >in1 || framework_failure_
+LC_ALL=en_US.UTF-8 grep 'a.b' in1 > out1 2> err1
+test $? -eq 1 || fail=1
+compare /dev/null out1 || fail=1
+compare /dev/null err1 || fail=1
+
+printf 'a\360\202\202\254ba\360\202\202\254b\n' >in2 ||
+ framework_failure_
+LC_ALL=en_US.UTF-8 grep -E '(a.b)\1' in2 > out2 2> err2
+test $? -eq 1 || fail=1
+compare /dev/null out2 || fail=1
+compare /dev/null err2 || fail=1
+
+Exit $fail
--
2.17.1
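
For readers wondering why the bytes \360\202\202\254 in tests/dfa-invalid-utf8 count as invalid: they have the shape of a four-byte UTF-8 sequence, but their payload bits decode to U+20AC, which fits in three bytes, so the sequence is an overlong encoding. The sketch below is only an illustration and is not part of the patch; the helper name decode4 is made up here, and the script assumes a POSIX shell whose arithmetic expansion accepts octal and hexadecimal constants.

```sh
#!/bin/sh
# Hypothetical helper (not from grep or Gnulib): combine the payload bits
# of a four-byte UTF-8 sequence, given as four integer byte values.
decode4 () {
  echo $(( (($1 & 0x07) << 18) | (($2 & 0x3F) << 12) | (($3 & 0x3F) << 6) | ($4 & 0x3F) ))
}

# The bytes from the test input, in octal as in the test's printf call.
cp=$(decode4 0360 0202 0202 0254)

# A four-byte sequence is only well-formed for code points of at least
# U+10000 (65536); anything smaller is an overlong encoding.
if [ "$cp" -lt 65536 ]; then
  verdict=overlong
else
  verdict=ok
fi

printf 'U+%04X %s\n' "$cp" "$verdict"
```

With the test's bytes this reports U+20AC as overlong, which is why 'a.b' is expected to find no match in a UTF-8 locale.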