Re: Proposed alternative encoding for stray UTF-8 bytes in strings
From: felix . winkelmann
Subject: Re: Proposed alternative encoding for stray UTF-8 bytes in strings
Date: Tue, 28 Nov 2023 13:23:08 +0100
> Yes, this is precisely my point - 'one or more'. The string-length with
> invalid embedded sequences is not guaranteed to be consistent, which seems
> like a problem. Doing a decode to ensure all points are valid - even if in
> the undefined sequences - seems to be a good idea to prevent secondary issues.
The validation is done in "utf8->string". Once a string from some other,
unknown source has been created as an internal string object, any
subsequent modifications will use valid UTF-8 sequences, unless you
explicitly inject U+DCxx characters (the latter should probably be
disallowed).
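
As an illustration only, here is a minimal sketch of the same idea in
Python rather than Scheme. It assumes that Python's "surrogateescape"
error handler is a fair analogue of the proposed U+DCxx encoding; it is
not the implementation behind "utf8->string", and the helper names
decode_lenient and has_stray_bytes are hypothetical. Each stray byte
becomes exactly one code point at decode time, so the string length is
consistent afterwards.

```python
def decode_lenient(raw: bytes) -> str:
    """Decode UTF-8, mapping each invalid byte 0xNN to the code point U+DCNN.

    Assumption: Python's built-in "surrogateescape" handler stands in for
    the encoding discussed in this thread.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def has_stray_bytes(text: str) -> bool:
    """Report whether the string carries any escaped byte (U+DC80..U+DCFF)."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in text)


# "café", a space, one stray 0xFF byte, then "!"
raw = b"caf\xc3\xa9 \xff!"
text = decode_lenient(raw)

# The stray byte occupies exactly one position in the decoded string.
assert len(text) == 7
assert text[5] == "\udcff"

# Encoding with the same handler restores the original bytes.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

A check such as has_stray_bytes is one way a string constructor could
refuse explicitly injected U+DCxx characters.
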
felix