guile-user

Re: guile can't find a chinese named file


From: Marko Rauhamaa
Subject: Re: guile can't find a chinese named file
Date: Thu, 16 Feb 2017 14:14:41 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

David Kastrup <address@hidden>:

> Marko Rauhamaa <address@hidden> writes:
>> And the point of bringing concatenation into the discussion was that
>> remapping byte sequences to byte sequences breaks concatenation
>> additivity:
>>
>>    U(x) + U(y) = U(x + y)
>
> But Emacs' implementation doesn't in any respect "break concatenation
> additivity".
>
> If you split an arbitrary byte stream (including material invalid as
> UTF-8) at an arbitrary point (including in the middle of a UTF-8
> character), decode the resulting pieces as UTF-8 (as one of several
> "reversible" encodings Emacs can interpret), concatenate the resulting
> Emacs strings and reencode the result as UTF-8 (since you actually
> need to provide a byte sequence to open(2) or similar), you will
> retain the original byte stream. No ifs and buts.
>
> The _decoded_ concatenated string might differ from decoding the
> unsplit byte string: it might contain "byte 0xc2, byte 0x80"
> (represented as 0xc1 0x82 0xc0 0x80) at the concatenation point rather
> than "character 0x80" (represented as 0xc2 0x80). But the moment you
> use this concatenation of half-sequences as a file name, it gets
> reencoded into the bytes 0xc2 and 0x80 and works just fine.
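Kastrup's round-trip claim can be checked with an analogous mechanism outside Emacs. The sketch below assumes Python 3 and uses its `surrogateescape` error handler (PEP 383), which, like Emacs' raw-byte characters, maps each undecodable byte to a reserved code point and back again on encoding. It is a stand-in for illustration only: Python escapes bytes to U+DC80..U+DCFF, not to Emacs' 0xc0/0xc1-prefixed internal form. The helpers `decode`, `encode` and `roundtrips_at_every_split` are hypothetical names.

```python
# Stand-in for Emacs' raw-byte characters: Python's "surrogateescape"
# error handler (PEP 383). This is NOT Emacs' internal representation.

def decode(b: bytes) -> str:
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    return b.decode("utf-8", errors="surrogateescape")

def encode(s: str) -> bytes:
    # Lone surrogates U+DC80..U+DCFF are turned back into the original bytes.
    return s.encode("utf-8", errors="surrogateescape")

def roundtrips_at_every_split(data: bytes) -> bool:
    """Split `data` at every offset, decode both halves separately,
    concatenate the decoded strings and re-encode; check the bytes survive."""
    return all(
        encode(decode(data[:i]) + decode(data[i:])) == data
        for i in range(len(data) + 1)
    )

# Mixed input: valid UTF-8 (U+0080, then two CJK characters) plus invalid bytes.
sample = "\x80文件".encode("utf-8") + b"\xff\xc2"

whole = decode(b"\xc2\x80")                  # one character, U+0080
pieces = decode(b"\xc2") + decode(b"\x80")   # two escaped bytes

print(roundtrips_at_every_split(sample))     # True
print(whole == pieces)                       # False: decoded strings differ
print(encode(whole) == encode(pieces) == b"\xc2\x80")  # True: same bytes
```

The first check mirrors the "no ifs and buts" round trip; the other two show the caveat that the decoded concatenation can differ from decoding the unsplit bytes while still re-encoding to the same 0xc2 0x80.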

That is already a lot, maybe even enough.

(On the other side of the equation, expressing a filename in Unicode may
not produce a unique code point sequence: "é", for example, can be
written as U+00E9 or as U+0065 U+0301. See <URL:
http://unicode.org/faq/normalization.html>)
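The normalization point can be illustrated in the same spirit. Assuming Python 3 and its standard `unicodedata` module, the visible name "é" has two canonically equivalent spellings that differ both as code points and as UTF-8 bytes, so a file system that compares names byte by byte would see two distinct names. The character is chosen purely for illustration.

```python
import unicodedata

# "é" as one precomposed code point vs. "e" followed by a combining acute accent.
nfc = unicodedata.normalize("NFC", "\u00e9")   # U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")   # U+0065 U+0301

print(nfc == nfd)                                # False: different code points
print(nfc.encode("utf-8"), nfd.encode("utf-8"))  # b'\xc3\xa9' b'e\xcc\x81'
print(unicodedata.normalize("NFC", nfd) == nfc)  # True: canonically equivalent
```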


Marko


