Re: [PATCH 4/4] dfa: do not match invalid UTF-8
From: Bruno Haible
Subject: Re: [PATCH 4/4] dfa: do not match invalid UTF-8
Date: Wed, 18 Dec 2019 09:48:01 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-166-generic; KDE/5.18.0; x86_64; ; )
Hi Paul,
> (add_utf8_anychar): Match only valid UTF-8 byte sequences
> instead of allowing overlong encodings or surrogate halves.
Do I understand it correctly that, as a consequence of this change,
'grep' with a regex of '^.*$' will no longer match lines which contain
an invalid UTF-8 byte sequence?
If so:
- Is this effect on 'grep' intended? (And the workaround is to use the
"C" locale.)
- Is it consistent with the behaviour of regex and kwset, which 'grep'
also uses, depending on the arguments (as far as I understand)?
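
To make concrete which byte sequences are at stake, here is a minimal
sketch of a validity check following the well-formedness rules of
RFC 3629. It is not the code from the patch or from dfa.c; the names
utf8_seq_len and utf8_seq_len_demo are hypothetical and exist only for
illustration. It rejects exactly the two classes named in the ChangeLog
entry (overlong encodings and surrogate halves), plus code points beyond
U+10FFFF and stray bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of gnulib: return the length (1..4) of
   the valid UTF-8 sequence at the start of S[0..N-1], or 0 if the bytes
   there do not begin a valid sequence.  */
size_t
utf8_seq_len (const unsigned char *s, size_t n)
{
  if (n == 0)
    return 0;

  unsigned char c = s[0];
  size_t len;
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */

  if (c <= 0x7F)
    return 1;                           /* ASCII */
  else if (0xC2 <= c && c <= 0xDF)
    len = 2;                            /* C0 and C1 would be overlong */
  else if (0xE0 <= c && c <= 0xEF)
    {
      len = 3;
      if (c == 0xE0)
        lo = 0xA0;                      /* smaller 2nd byte: overlong */
      else if (c == 0xED)
        hi = 0x9F;                      /* larger: surrogate U+D800..U+DFFF */
    }
  else if (0xF0 <= c && c <= 0xF4)
    {
      len = 4;
      if (c == 0xF0)
        lo = 0x90;                      /* smaller 2nd byte: overlong */
      else if (c == 0xF4)
        hi = 0x8F;                      /* larger: beyond U+10FFFF */
    }
  else
    return 0;                           /* 80..C1 or F5..FF as lead byte */

  /* The length check comes first, so s[1] is read only when it exists.  */
  if (n < len || s[1] < lo || hi < s[1])
    return 0;
  for (size_t i = 2; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)          /* remaining bytes must be 10xxxxxx */
      return 0;
  return len;
}

/* Illustration only: sequences the old, looser matching would have
   tolerated are rejected here.  */
void
utf8_seq_len_demo (void)
{
  static const unsigned char e_acute[] = { 0xC3, 0xA9 };         /* U+00E9 */
  static const unsigned char overlong[] = { 0xC0, 0xAF };        /* '/' in 2 bytes */
  static const unsigned char surrogate[] = { 0xED, 0xA0, 0x80 }; /* U+D800 */

  assert (utf8_seq_len (e_acute, sizeof e_acute) == 2);
  assert (utf8_seq_len (overlong, sizeof overlong) == 0);
  assert (utf8_seq_len (surrogate, sizeof surrogate) == 0);
}
```

Under this definition, a line containing a byte such as 0xFF, or a
sequence like C0 AF, has no decomposition into valid characters, which
is why the question about '^.*$' arises.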
Bruno