From: Stefan Monnier
Subject: Re: Why does dired go through extra efforts to avoid unibyte names
Date: Tue, 02 Jan 2018 23:14:20 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
>> I bumped into the following code in dired-get-filename:
>>
>> ;; The above `read' will return a unibyte string if FILE
>> ;; contains eight-bit-control/graphic characters.
>> (if (and enable-multibyte-characters
>>          (not (multibyte-string-p file)))
>>     (setq file (string-to-multibyte file)))
>>
>> and I'm wondering why we don't want a unibyte string here.
>> `vc-region-history` told me this comes from the commit appended below,
>> which seems to indicate that we're worried about a subsequent encoding,
>> but AFAIK unibyte file names are not (re)encoded, and passing them
>> through string-to-multibyte would actually make things worse in this
>> respect (since it might cause the kind of (re)encoding this is
>> supposedly trying to avoid).
>>
>> What am I missing?
>
> Why does it matter whether eight-bit-* characters are encoded one more
> or one less time?
That's part of the question, indeed.
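For a concrete sense of what is at stake, here is a quick probe one can
run by hand (just a sketch; the two sample bytes are arbitrary):

    ;; With utf-8, `eight-bit' characters encode back to the very bytes
    ;; they came from, so the real question is whether an extra
    ;; encode/decode round trip can ever change them.
    (encode-coding-string (string-to-multibyte "\303\251") 'utf-8)
    ;; expected: "\303\251", i.e. the original bytes, now unibyte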
> As for the reason for using string-to-multibyte: maybe it's because we
> use concat further down in the function, which will determine whether
> the result will be unibyte or multibyte according to its own ideas of
> what's TRT?
But `concat` will do a string-to-multibyte for us, if needed, so
that doesn't seem like a good reason.
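One way to check that claim by hand (a sketch, assuming a reasonably
recent Emacs; the raw byte is arbitrary):

    ;; If `concat' promotes unibyte arguments via string-to-multibyte,
    ;; the raw byte survives as an `eight-bit' character and the two
    ;; results compare equal.
    (let ((raw "\303"))                    ; unibyte string, one raw byte
      (equal (concat raw "é")
             (concat (string-to-multibyte raw) "é")))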
This said, when that code was written, maybe `concat` used
string-make-multibyte internally instead, so this call to
string-to-multibyte might have been added to avoid using
string-make-multibyte inside `concat`?
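For reference, the difference between the two conversions shows up
directly on a small example (again just a sketch, with made-up bytes):

    ;; A unibyte string holding two raw bytes, as `read' might return it.
    (let ((file "\303\251"))
      (list (multibyte-string-p file)    ; nil: the literal is unibyte
            ;; Keeps the bytes as `eight-bit' characters, so encoding the
            ;; name later yields the same bytes back.
            (string-to-multibyte file)
            ;; Promotes each byte via `unibyte-char-to-multibyte', which
            ;; is where a different kind of conversion could sneak in.
            (string-make-multibyte file)))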
It would be good to have a concrete case that needed the above code, to
see if the problem still exists.
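For what it's worth, one way to try to construct such a case (an
untested sketch; the byte sequence is chosen arbitrarily so that it does
not decode as valid utf-8):

    ;; Create a file whose name is a raw byte sequence that the usual
    ;; file-name coding system cannot decode, then look at what
    ;; dired-get-filename returns with point on its line in the Dired
    ;; buffer.
    (let ((dir (make-temp-file "dired-raw-" t)))
      (write-region "" nil (expand-file-name "\340\340" dir))
      (dired dir))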
Stefan