grep-commit

grep branch, master, updated. v3.3-85-gde6f36d


From: Paul Eggert
Subject: grep branch, master, updated. v3.3-85-gde6f36d
Date: Wed, 9 Sep 2020 15:44:41 -0400 (EDT)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  de6f36d9b6d702b14ac4ee58dfbcab740c7ca749 (commit)
      from  1021a92aa915ac500b2be267dde6acf342b86038 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=de6f36d9b6d702b14ac4ee58dfbcab740c7ca749


commit de6f36d9b6d702b14ac4ee58dfbcab740c7ca749
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Wed Sep 9 12:43:11 2020 -0700

    grep: fix -w bug in UTF-8 locales
    
    Problem reported by Mayo Fark (Bug#43225).
    * src/searchutils.c (wordchar_prev): In a UTF-8 locale, do not
    assume that an encoding-error byte cannot be part of a word
    constituent, as this assumption is incorrect for the last byte
    of a multibyte word constituent.
    * tests/word-delim-multibyte: Add a test for the bug.

diff --git a/NEWS b/NEWS
index acd95dd..28c7835 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,10 @@ GNU grep NEWS                                    -*- outline -*-
 
 ** Bug fixes
 
+  In UTF-8 locales, grep -w no longer ignores a multibyte word
+  constituent just before what would otherwise be a word match.
+  [Bug#43225 introduced in grep 2.28]
+
   A performance regression with many duplicate patterns has been fixed.
   [Bug#43040 introduced in grep 3.4]
 
diff --git a/src/searchutils.c b/src/searchutils.c
index 84c319c..c4bb802 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -195,7 +195,7 @@ wordchar_prev (char const *buf, char const *cur, char const *end)
     return 0;
   unsigned char b = *--cur;
   if (! localeinfo.multibyte
-      || (localeinfo.using_utf8 && localeinfo.sbclen[b] != -2))
+      || (localeinfo.using_utf8 && localeinfo.sbclen[b] == 1))
     return sbwordchar[b];
   char const *p = buf;
   cur -= mb_goback (&p, NULL, cur, end);
diff --git a/tests/word-delim-multibyte b/tests/word-delim-multibyte
index 7d2c433..31190ad 100755
--- a/tests/word-delim-multibyte
+++ b/tests/word-delim-multibyte
@@ -34,4 +34,12 @@ for locale in C en_US.UTF-8; do
   compare /dev/null err || fail=1
 done
 
+# Bug#43225
+printf 'a \303\255cone b\n' >in
+for flag in '' -i; do
+  returns_ 1 env LC_ALL=en_US.UTF-8 grep -w $flag cone in >out 2>err || fail=1
+  compare /dev/null out || fail=1
+  compare /dev/null err || fail=1
+done
+
 Exit $fail

-----------------------------------------------------------------------

Summary of changes:
 NEWS                       | 4 ++++
 src/searchutils.c          | 2 +-
 tests/word-delim-multibyte | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)
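
Illustration (not part of the commit): a minimal C sketch of why the
one-token change in wordchar_prev matters. In the test input
'a \303\255cone b', the bytes \303\255 encode the letter U+00ED, so the
byte just before "cone" is \255 (0xAD), a UTF-8 continuation byte. The
helper utf8_sbclen below is a hypothetical stand-in for
localeinfo.sbclen in a UTF-8 locale; the value convention it uses
(1 for a single-byte character, -2 for a lead byte, -1 for a byte that
is an encoding error on its own) is an assumption about that table, and
old_shortcut and new_shortcut are names invented here for the two forms
of the condition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for localeinfo.sbclen[b] in a UTF-8 locale.
   Assumed convention: 1 = complete single-byte character,
   -2 = lead byte of a longer sequence, -1 = encoding error when the
   byte stands alone (this includes continuation bytes 0x80..0xBF).  */
static int
utf8_sbclen (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if (0xC2 <= b && b <= 0xF4)
    return -2;
  return -1;
}

/* Old condition: use the single-byte table unless B is a lead byte.
   A trailing continuation byte therefore takes the shortcut.  */
static bool
old_shortcut (unsigned char b)
{
  return utf8_sbclen (b) != -2;
}

/* New condition: use the single-byte table only when B really is a
   complete single-byte character.  */
static bool
new_shortcut (unsigned char b)
{
  return utf8_sbclen (b) == 1;
}
```

Under these assumptions, old_shortcut (0xAD) is true, so the old code
would classify the byte in isolation through sbwordchar and miss the
multibyte word constituent it terminates. new_shortcut (0xAD) is false,
so control falls through to the mb_goback path, which backs up to the
start of the character before deciding. ASCII bytes such as 'c' take
the shortcut in both versions.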


hooks/post-receive
-- 
grep


