Re: [PATCH 4/4] dfa: do not match invalid UTF-8

bug-gnulib

From:	Bruno Haible
Subject:	Re: [PATCH 4/4] dfa: do not match invalid UTF-8
Date:	Wed, 18 Dec 2019 09:48:01 +0100
User-agent:	KMail/5.1.3 (Linux/4.4.0-166-generic; KDE/5.18.0; x86_64; ; )

Hi Paul,

> (add_utf8_anychar): Match only valid UTF-8 byte sequences
> instead of allowing overlong encodings or surrogate halves.

Do I understand correctly that, as a consequence of this change,
'grep' with a regex of '^.*$' will no longer match lines that contain
an invalid UTF-8 byte sequence?

If so:
  - Is this effect on 'grep' intended? (The workaround would then be to
    use the "C" locale.)
  - Is it consistent with the behaviour of regex and kwset, which 'grep'
    also uses, depending on the arguments (as far as I understand)?
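
For reference, here is a minimal sketch of what "valid UTF-8 byte
sequence" means in the quoted ChangeLog entry, assuming the
well-formedness table of RFC 3629 (no overlong encodings, no surrogate
halves, nothing above U+10FFFF). The names utf8_valid_len and in_range
are hypothetical and are not taken from dfa.c; the patch itself builds
DFA tokens rather than calling a validator like this one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from dfa.c): true if LO <= B <= HI.  */
static bool
in_range (unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Hypothetical helper (not from dfa.c): return the length (1..4) of the
   well-formed UTF-8 sequence at the start of S, of which N bytes are
   available, or 0 if S does not start with one.  Byte ranges follow
   RFC 3629.  */
size_t
utf8_valid_len (const unsigned char *s, size_t n)
{
  if (n == 0)
    return 0;

  unsigned char b0 = s[0];

  if (b0 <= 0x7F)                       /* ASCII */
    return 1;

  if (in_range (b0, 0xC2, 0xDF))        /* 2 bytes; C0 and C1 are overlong */
    return n >= 2 && in_range (s[1], 0x80, 0xBF) ? 2 : 0;

  if (in_range (b0, 0xE0, 0xEF))        /* 3 bytes */
    {
      /* E0 80..9F would be overlong; ED A0..BF would be a surrogate.  */
      unsigned char lo = b0 == 0xE0 ? 0xA0 : 0x80;
      unsigned char hi = b0 == 0xED ? 0x9F : 0xBF;
      return (n >= 3 && in_range (s[1], lo, hi)
              && in_range (s[2], 0x80, 0xBF)) ? 3 : 0;
    }

  if (in_range (b0, 0xF0, 0xF4))        /* 4 bytes */
    {
      /* F0 80..8F would be overlong; F4 90..BF would exceed U+10FFFF.  */
      unsigned char lo = b0 == 0xF0 ? 0x90 : 0x80;
      unsigned char hi = b0 == 0xF4 ? 0x8F : 0xBF;
      return (n >= 4 && in_range (s[1], lo, hi)
              && in_range (s[2], 0x80, 0xBF)
              && in_range (s[3], 0x80, 0xBF)) ? 4 : 0;
    }

  return 0;     /* 80..C1 and F5..FF never start a valid sequence */
}
```

Under this definition, a line containing, say, the bytes C0 AF (an
overlong encoding) or ED A0 80 (a surrogate half) has a position where
no valid character starts, which is the case the questions above are
about.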

Bruno

Current Thread

[PATCH 1/4] dfa: tune via xzalloc, Paul Eggert, 2019/12/18
- [PATCH 2/4] fts: tune via calloc, Paul Eggert, 2019/12/18
- [PATCH 3/4] dfa: simplify charclass by assuming C99, Paul Eggert, 2019/12/18
  - Re: [PATCH 3/4] dfa: simplify charclass by assuming C99, Bruno Haible, 2019/12/18
- [PATCH 4/4] dfa: do not match invalid UTF-8, Paul Eggert, 2019/12/18
  - Re: [PATCH 4/4] dfa: do not match invalid UTF-8, Bruno Haible <=
    - Re: [PATCH 4/4] dfa: do not match invalid UTF-8, Paul Eggert, 2019/12/18
