From: Robert Pluim
Subject: Re: master d57bb0c: Treat passed strings as raw-text when percent-escaping in epg
Date: Thu, 12 Dec 2019 16:19:46 +0100
>>>>> On Thu, 12 Dec 2019 08:58:33 -0500, Stefan Monnier <address@hidden> said:
Stefan> Hi Robert,
>> The strings contained in gpg keys can contain UTF-8 data, but can also
>> use percent-escapes to encode non-ASCII chars. When converting those
>> escapes, use 'raw-text' coding system rather than 'string-to-unibyte',
>> since the latter signals an error for non-ASCII characters.
Stefan> I don't quite understand: "can contain UTF-8 data" seems odd here since
Stefan> you're calling `encode-coding-string` whose input argument is a sequence
Stefan> of characters whereas "UTF-8 data" can only be found in sequences of
Stefan> bytes.
Stefan> Did you mean "can contain non-ASCII characters"?
It should say "can contain non-ASCII characters encoded using UTF-8",
which means they end up in a multibyte string in Emacs.
Stefan> The other problem with the above description is the "raw-text" since
Stefan> it's far from clear what it means (personally I really have no idea
Stefan> what is "raw text" and the way Emacs understands "raw text" is more or
Stefan> less "EOL-separated lines of bytes" which does not seem to match your
Stefan> description since string-to-unibyte doesn't signal errors when
Stefan> encountering bytes).
It's replacing the use of string-to-unibyte on a multibyte string
containing non-ASCII characters (which signals an error) with
encode-coding-string using 'raw-text (which produces a unibyte string
of bytes). My other choices were 'binary or 'no-conversion, which do
the same but have even less meaningful names.
Stefan> Looking at the code, I see that the only caller of
Stefan> `epg--decode-percent-escape` seems to be
Stefan> `epg--decode-percent-escape-utf-8` which decodes the bytes returned by
Stefan> `epg--decode-percent-escape` using `utf-8` so I think it makes more
Stefan> sense to encode using `utf-8` than `raw-text`, WDYT?
No. The string that is passed to epg--decode-percent-escape can
contain non-ASCII characters encoded as UTF-8, plus percent-escaped
representations of non-ASCII characters. To convert those
percent-escaped characters correctly, the string has to be treated as
a unibyte array of bytes, then converted back to multibyte by decoding
it with utf-8.
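For readers outside Emacs, the same three-step idea (string to bytes, percent-escapes to bytes, bytes back to characters with UTF-8) can be sketched in Python. This is only an analogy, not epg's code: the function names are hypothetical, Python's `str.encode("utf-8")` stands in for the encode-coding-string step (assuming that, for ordinary Unicode characters, the raw-text encoding yields their UTF-8 bytes), and the `%XX` pattern is an assumed escape syntax. The naive variant shows why escapes must not be decoded one character at a time.

```python
import re

# Assumed escape syntax: '%' followed by two hex digits.
_ESCAPE_BYTES = re.compile(rb"%([0-9A-Fa-f]{2})")
_ESCAPE_STR = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_percent_escape_utf8(s: str) -> str:
    """Hypothetical analogue of the approach described above."""
    # Step 1: turn the character string into bytes; literal non-ASCII
    # characters become their UTF-8 byte sequences.
    raw = s.encode("utf-8")
    # Step 2: replace each %XX escape with the single byte it denotes,
    # so escaped bytes and literal bytes sit in one byte sequence.
    unescaped = _ESCAPE_BYTES.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    # Step 3: decode the complete byte sequence back to characters.
    return unescaped.decode("utf-8")


def naive_decode_per_char(s: str) -> str:
    """Wrong on purpose: maps each %XX to one character, so a multi-byte
    UTF-8 sequence turns into several unrelated characters."""
    return _ESCAPE_STR.sub(lambda m: chr(int(m.group(1), 16)), s)


if __name__ == "__main__":
    mixed = "Jürgen %C3%A9t%C3%A9"      # literal 'ü' plus escaped 'é'
    print(decode_percent_escape_utf8(mixed))
    print(naive_decode_per_char("%C3%A9"))
```

Working on bytes is what lets the two escaped bytes `%C3%A9` combine into a single `é` while the literal `ü` survives unchanged; the per-character variant produces `Ã©` instead.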
Robert