Re: Proposed alternative encoding for stray UTF-8 bytes in strings

chicken-hackers

From:	felix . winkelmann
Subject:	Re: Proposed alternative encoding for stray UTF-8 bytes in strings
Date:	Tue, 28 Nov 2023 13:23:08 +0100

> Yes, this is precisely my point - 'one or more'. The string-length with 
> invalid embedded sequences is not guaranteed to be consistent, which seems 
> like a problem. Doing a decode to ensure all points are valid - even if in 
> the undefined sequences - seems to be a good idea to prevent secondary issues.

The validation is done in "utf8->string". Once a string from some other,
unknown source has been created as an internal string object, any subsequent
modifications will use valid UTF-8 sequences, unless you explicitly inject
U+DCxx characters (the latter should probably be disallowed).
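
To illustrate validating once at the boundary, here is a minimal sketch in
Python rather than Scheme. It is only an analogy: it assumes the U+DCxx
scheme discussed in this thread behaves like Python's "surrogateescape"
error handler, which maps each undecodable byte 0x80..0xFF to
U+DC80..U+DCFF. The helper names decode_external and has_escaped_bytes are
hypothetical, and none of this is CHICKEN's actual implementation of
"utf8->string".

```python
def decode_external(raw: bytes) -> str:
    # Validate once, at the boundary: every stray byte becomes exactly one
    # U+DCxx code point, so the string length is well defined afterwards.
    return raw.decode("utf-8", errors="surrogateescape")


def has_escaped_bytes(s: str) -> bool:
    # True if the string carries any escaped stray byte (U+DC80..U+DCFF).
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in s)


raw = b"ab\xffcd"              # 0xFF can never appear in valid UTF-8
s = decode_external(raw)

assert len(s) == 5             # one code point per stray byte
assert s[2] == "\udcff"
assert has_escaped_bytes(s)

# The original bytes are recoverable when encoding with the same handler.
assert s.encode("utf-8", errors="surrogateescape") == raw

# A strict encoder refuses the escaped code point, which is one way to
# notice strings that still carry stray bytes.
try:
    s.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
assert not strict_ok
```

Under these assumptions, a check like has_escaped_bytes is also where
explicit injection of U+DCxx characters could be rejected.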


felix

Current Thread

Proposed alternative encoding for stray UTF-8 bytes in strings, John Cowan, 2023/11/24
- Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/27
  - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, elf, 2023/11/27
  - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/27
    - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, elf, 2023/11/27
    - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann <=
