bug#60697: GNU grep mishandles \b near encoding errors

bug-grep

From:	Jim Meyering
Subject:	bug#60697: GNU grep mishandles \b near encoding errors
Date:	Wed, 11 Jan 2023 22:03:52 -0800

On Mon, Jan 9, 2023 at 10:16 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> Here's a shell session illustrating the problem on Fedora 37, which has
> GNU grep 3.7. The same bug is still in bleeding-edge GNU grep.
>
>    $ export LC_ALL=en_US.utf8
>    $ printf '\300\n' | grep '\b'
>    grep: (standard input): binary file matches
>    $ printf '\300\n' | grep -P '\b'
>    $
>
> Plain grep finds a word boundary in the input even though the input
> contains no words (just an encoding error). 'grep -P' does the right thing.
>
> The underlying issue is in the glibc regex code so the fix should be in
> glibc / Gnulib, but I thought I'd report it here before I forgot it.

Thanks! While this would definitely be nice to fix before the release
(in the next week or so), it's enough of a corner case that I wouldn't
feel bad releasing without a fix.

For the record, this problem first arose in grep-2.19.
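
A minimal sketch for rechecking the report on a given system, assuming a
POSIX shell and that the en_US.utf8 locale is installed. The helper name
word_boundary_status is hypothetical (not part of grep); it only maps
grep's exit status to a label.

```shell
#!/bin/sh
# Hypothetical helper: report whether grep finds a word boundary (\b)
# in the data on standard input. Extra arguments (e.g. -P) are passed
# through to grep unchanged.
word_boundary_status() {
    grep -q "$@" -e '\b'
    case $? in
        0) echo "match" ;;      # grep selected at least one line
        1) echo "no match" ;;   # grep selected nothing
        *) echo "error" ;;      # e.g. grep built without -P support
    esac
}

# Locale used in the report; assumed to be available on this system.
export LC_ALL=en_US.utf8

# Octal 300 (0xC0) can never start a valid UTF-8 sequence, so this
# input line holds an encoding error and no word characters.
printf 'plain grep: '; printf '\300\n' | word_boundary_status
printf 'grep -P:    '; printf '\300\n' | word_boundary_status -P
```

Going by the session quoted above, an affected grep would be expected to
print "match" for plain grep and "no match" for grep -P; a build where
both lines agree presumably has the underlying regex fix.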
