Re: Proposed alternative encoding for stray UTF-8 bytes in strings

chicken-hackers

From:	elf
Subject:	Re: Proposed alternative encoding for stray UTF-8 bytes in strings
Date:	Mon, 27 Nov 2023 15:58:23 +0200
User-agent:	K-9 Mail for Android

Yes, this is precisely my point: 'one or more'. The string-length of a string
with invalid embedded sequences is not guaranteed to be consistent, which seems
like a problem. Doing a decode to ensure all code points are valid, even if they
fall in the undefined sequences, seems like a good idea to prevent secondary
issues.
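
The 'one or more' ambiguity can be seen with Python's standard UTF-8 codec. This
is only an illustration and not how CHICKEN handles stray bytes: the same four
bytes decode to strings of different lengths depending on the error policy.
"replace" substitutes one U+FFFD for each maximal ill-formed subpart, while
"surrogateescape" maps every stray byte to its own lone surrogate.

```python
# Same bytes, two decoding policies for the invalid part (Python stdlib only).
raw = b"a\xe2\x82b"  # 'a', a truncated 3-byte UTF-8 sequence, 'b'

# "replace": one U+FFFD per maximal ill-formed subpart
replaced = raw.decode("utf-8", errors="replace")

# "surrogateescape": one lone surrogate (U+DC80 + byte) per stray byte
escaped = raw.decode("utf-8", errors="surrogateescape")

print(len(replaced), len(escaped))
```

Here the truncated sequence \xe2\x82 becomes one character under "replace" but
two under "surrogateescape", so the lengths are 3 and 4.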

I take your point that string-copy would not be affected, though. Thank you.

-elf

On 27 November 2023 15:41:59 GMT+02:00, felix.winkelmann@bevuta.com wrote:
>> Question: if there is no translation at all, won't the invalid chars cause 
>> issues with things like string-length and string-copy procs? That is, since 
>> the number of octets can't be correctly translated to a number of glyphs, 
>> there will be some unpleasant side effects.
>
>Converting an octet sequence to a string involves a decoding step to compute
>the length. Any invalid embedded UTF-8 sequence is taken as one or more
>"illegal" code points, counting for one or more characters in the final
>string length. Note that the length of the "backing store" bytevector for the
>string is retained together with the number of code points that the string
>holds (the former is stored in the header of the string's bytevector buffer,
>the latter in a slot of the string).
>
>felix
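
Felix's description of the decoding step can be illustrated with a small Python
sketch. It is not CHICKEN's implementation: the names count_code_points and
BackedString are hypothetical, and the sketch assumes each byte that is not part
of a well-formed UTF-8 sequence counts as exactly one "illegal" code point (the
message only says "one or more"). The count is computed once when the string is
built and cached next to the byte buffer, so the length query never re-decodes.

```python
# Hypothetical sketch, not CHICKEN's code: count code points in a UTF-8 buffer,
# treating every byte outside a well-formed sequence as one "illegal" code point.


def _seq_len(lead: int) -> int:
    """Expected sequence length for a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def _well_formed(buf: bytes, i: int, n: int) -> bool:
    """True if buf[i:i+n] is well-formed UTF-8 (no overlongs, surrogates, > U+10FFFF)."""
    if i + n > len(buf):
        return False  # truncated at end of buffer
    if any(not 0x80 <= b <= 0xBF for b in buf[i + 1:i + n]):
        return False  # bad continuation byte
    lead = buf[i]
    second = buf[i + 1] if n > 1 else 0
    if lead == 0xE0 and second < 0xA0:
        return False  # overlong 3-byte form
    if lead == 0xED and second > 0x9F:
        return False  # UTF-16 surrogate range
    if lead == 0xF0 and second < 0x90:
        return False  # overlong 4-byte form
    if lead == 0xF4 and second > 0x8F:
        return False  # beyond U+10FFFF
    return True


def count_code_points(buf: bytes) -> int:
    count = 0
    i = 0
    while i < len(buf):
        n = _seq_len(buf[i])
        if n and _well_formed(buf, i, n):
            i += n  # one valid code point
        else:
            i += 1  # stray byte: one "illegal" code point
        count += 1
    return count


class BackedString:
    """Hypothetical string object: the byte buffer (whose length CHICKEN keeps in
    the bytevector header; len() stands in for it here) plus a cached count."""

    def __init__(self, buf: bytes):
        self.buf = bytes(buf)
        self.count = count_code_points(self.buf)  # decode once, up front

    def byte_length(self) -> int:
        return len(self.buf)

    def string_length(self) -> int:
        return self.count  # O(1): no re-decoding


if __name__ == "__main__":
    s = BackedString(b"a\xe2\x82b")
    print(s.byte_length(), s.string_length())
```

With this per-byte policy, b"a\xe2\x82b" has a byte length of 4 and a string
length of 4. In this sketch a copy of the buffer could reuse the cached count
without decoding again.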

Current Thread

Proposed alternative encoding for stray UTF-8 bytes in strings, John Cowan, 2023/11/24
- Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/27
  - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, elf, 2023/11/27
  - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/27
    - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, elf <=
    - Re: Proposed alternative encoding for stray UTF-8 bytes in strings, felix . winkelmann, 2023/11/28
