Re: More confusion about multibyte vs unibyte strings
From: Eric Abrahamsen
Subject: Re: More confusion about multibyte vs unibyte strings
Date: Thu, 05 May 2022 17:45:36 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:
>> From: Eric Abrahamsen <eric@ericabrahamsen.net>
>> Date: Thu, 05 May 2022 11:44:41 -0700
>>
>> > Why does it "mess things up", and what exactly is the nature of the
>> > mess-up? A pure-ASCII string can be either unibyte or multibyte, and
>> > that shouldn't change a thing.
>>
>> If the string is not ASCII, we need to encode it before sending to the
>> server, and tell the server what encoding we used. Microsoft Exchange
>> servers can't handle any encoding other than ascii.
>
> What do you mean by "ascii encoding" in this context?
>
> When you say that Microsoft Exchange can't handle any encoding other
> than ascii, does it mean it cannot handle _any_ non-ASCII addressee
> names? That'd be hard to believe, because such addressee names are
> nowadays in wide use. So I guess you mean something else, but what?

The IMAP search command can look like "UID SEARCH", or "UID SEARCH
CHARSET XXX". Specifying no charset is (I think) the same as specifying
US-ASCII, which is the only charset that Exchange accepts for the search
command.

If the search string is multibyte (in my mind this means "multiple bytes
per character", I guess that's where I went wrong), you have to encode
it as something, tell the server what charset you used to encode it,
then send both the encoded string and the number of bytes it represents.
The gnus-search code encodes it as emacs-utf-8, and then sends UID
SEARCH CHARSET UTF-8, which Exchange won't accept.

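The two command shapes described above can be sketched roughly as follows. This is only an illustration in Python, not the gnus-search code: the function name `build_uid_search` and the choice of the `TEXT` search key are assumptions made for the example. The byte count goes in an IMAP literal header (`{N}`), and the encoded bytes are sent after the server's continuation response.

```python
from typing import Optional, Tuple


def build_uid_search(term: str, key: str = "TEXT") -> Tuple[bytes, Optional[bytes]]:
    """Return (command line, literal payload) for a one-term UID SEARCH.

    Hypothetical helper for illustration; the tag prefix and the
    trailing CRLF are left to the caller.
    """
    if term.isascii():
        # Pure ASCII: no CHARSET argument, term sent as a quoted string.
        # Simplification: assumes the term contains no CR or LF.
        quoted = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'UID SEARCH {key} "{quoted}"'.encode("ascii"), None

    # Non-ASCII: encode, announce the charset, and give the byte count
    # (not the character count) in the literal header.
    payload = term.encode("utf-8")
    line = f"UID SEARCH CHARSET UTF-8 {key} {{{len(payload)}}}".encode("ascii")
    return line, payload
```

With this sketch, an ASCII term never produces a CHARSET argument, which is the form Exchange is reported to accept; a term such as "héllo" yields a six-byte literal even though it has five characters.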
>> So if our code thinks a string isn't ascii, it sends the encoding
>> message to the IMAP server, and Exchange blows up.
>
> Encoding ascii yields a string that is identical to the original (IIUC
> what you mean by "encoding"), so I don't follow you here.
>
>> If the string is ascii, we don't try to encode it, and everything's
>> fine. So I need to know whether the string is actually ascii or not.
>
> You can do that using the regexp class [:ascii:], I guess.

That's how I'll solve it, then.
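
For readers outside Emacs, the idea behind testing against an ASCII character class can be sketched in Python as below. The helper name `is_ascii_only` is made up for this example, and the range `[\x00-\x7f]` is used here as a stand-in for the Emacs `[:ascii:]` class (code points 0 through 127). The check looks only at the characters, not at how the string happens to be stored.

```python
import re

# Matches a string consisting solely of code points 0..127.
_ASCII_ONLY = re.compile(r"[\x00-\x7f]*")


def is_ascii_only(text: str) -> bool:
    """Return True if every character in TEXT is ASCII."""
    return _ASCII_ONLY.fullmatch(text) is not None
```

When the check succeeds, encoding is a no-op in the sense Eli describes: `"Eric".encode("utf-8")` and `"Eric".encode("ascii")` give the same bytes, so there is nothing to announce to the server.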