bug#38656: [PATCH] grep: do not match invalid UTF-8


From: Paul Eggert
Subject: bug#38656: [PATCH] grep: do not match invalid UTF-8
Date: Tue, 17 Dec 2019 22:05:19 -0800

Update Gnulib to latest.  Also:
* src/dfasearch.c (EGexecute): Use ptrdiff_t, not size_t,
to match new Gnulib API.
* tests/Makefile.am (TESTS): Add dfa-invalid-utf8.
* tests/dfa-invalid-utf8: New file.
---
 NEWS                   |  5 ++++-
 gnulib                 |  2 +-
 src/dfasearch.c        |  2 +-
 tests/Makefile.am      |  1 +
 tests/dfa-invalid-utf8 | 29 +++++++++++++++++++++++++++++
 5 files changed, 36 insertions(+), 3 deletions(-)
 create mode 100755 tests/dfa-invalid-utf8

diff --git a/NEWS b/NEWS
index b106e2f..b6ff57c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,10 @@ GNU grep NEWS                                    -*- outline -*-
 
 ** Bug fixes
 
-  grep -Fw can no longer false match in non-UTF8 multibyte locales
+  '.' no longer matches some invalid byte sequences in UTF-8 locales.
+  [bug introduced in grep 2.7]
+
+  grep -Fw can no longer false match in non-UTF-8 multibyte locales
   For example, this command would erroneously print its input line:
     echo ab | LC_CTYPE=ja_JP.eucjp grep -Fw b
   [Bug#38223 introduced in grep 2.28]
diff --git a/gnulib b/gnulib
index b7bf9f4..1219c34 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit b7bf9f4361c8d78ccfda7a30ff31f7a406ea972e
+Subproject commit 1219c343014ede881069bab554408b40e5455d9c
diff --git a/src/dfasearch.c b/src/dfasearch.c
index 6c95d8c..153281d 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -234,7 +234,7 @@ EGexecute (void *vdc, char const *buf, size_t size, size_t *match_size,
       if (!start_ptr)
         {
           char const *next_beg, *dfa_beg = beg;
-          size_t count = 0;
+          ptrdiff_t count = 0;
           bool exact_kwset_match = false;
           bool backref = false;
 
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 82aebbf..dee6f46 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -86,6 +86,7 @@ TESTS =                                               \
   dfa-coverage                                 \
   dfa-heap-overrun                             \
   dfa-infloop                                  \
+  dfa-invalid-utf8                             \
   dfaexec-multibyte                            \
   empty                                                \
   empty-line                                   \
diff --git a/tests/dfa-invalid-utf8 b/tests/dfa-invalid-utf8
new file mode 100755
index 0000000..1748043
--- /dev/null
+++ b/tests/dfa-invalid-utf8
@@ -0,0 +1,29 @@
+#! /bin/sh
+# Test whether "grep '.'" matches invalid UTF-8 byte sequences.
+#
+# Copyright 2019 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_en_utf8_locale_
+require_compiled_in_MB_support
+
+fail=0
+
+printf 'a\360\202\202\254b\n' >in1 || framework_failure_
+LC_ALL=en_US.UTF-8 grep 'a.b' in1 > out1 2> err1
+test $? -eq 1 || fail=1
+compare /dev/null out1 || fail=1
+compare /dev/null err1 || fail=1
+
+printf 'a\360\202\202\254ba\360\202\202\254b\n' >in2 ||
+  framework_failure_
+LC_ALL=en_US.UTF-8 grep -E '(a.b)\1' in2 > out2 2> err2
+test $? -eq 1 || fail=1
+compare /dev/null out2 || fail=1
+compare /dev/null err2 || fail=1
+
+Exit $fail
-- 
2.17.1
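
A note on the test input, offered as a minimal sketch and not as part of the patch: the bytes \360\202\202\254 are the overlong four-byte encoding of U+20AC, whose shortest UTF-8 form is the three bytes \342\202\254. UTF-8 forbids overlong forms, which is why '.' should not match them. The helper names decode4 and is_overlong4 below are hypothetical, introduced only for illustration, and the check covers only this one class of invalid sequence.

```sh
#!/bin/sh
# Hypothetical helpers for illustration only; not part of grep or its tests.

# Print the code point that four octal byte values would decode to
# if read as a four-byte UTF-8 sequence (3 + 6 + 6 + 6 payload bits).
decode4 () {
  printf 'U+%04X\n' \
    $(( (0$1 & 7) << 18 | (0$2 & 63) << 12 | (0$3 & 63) << 6 | (0$4 & 63) ))
}

# Succeed if the decoded value is below U+10000 (65536): such a value fits
# in fewer bytes, so the four-byte form is overlong and therefore invalid.
is_overlong4 () {
  [ $(( (0$1 & 7) << 18 | (0$2 & 63) << 12 | (0$3 & 63) << 6 | (0$4 & 63) )) -lt 65536 ]
}

decode4 360 202 202 254                        # the bytes used in the test
is_overlong4 360 202 202 254 && echo overlong
decode4 360 237 230 200                        # a well-formed four-byte sequence
is_overlong4 360 237 230 200 || echo 'not overlong'
```

Run under a POSIX shell, this should print U+20AC and "overlong" for the test bytes, then U+1F600 and "not overlong" for the well-formed sequence.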





