bug-gnu-emacs

bug#56469: 29.0.50; Unibyte dir in directory_files_internal


From: Eli Zaretskii
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 17:32:17 +0300

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 56469@debbugs.gnu.org
> Date: Sun, 10 Jul 2022 10:23:28 -0400
> 
> W.r.t. the comment, it's indeed unrelated to the patch (other than
> the fact that it touches the same code).  The question is when we do:
> 
>         finalname = (nchars == nbytes)
>                     ? make_uninit_string (nbytes)
>                     : make_uninit_multibyte_string (nchars, nbytes);
> 
> the actual bytes are "decoded" (i.e. in our internal UTF-8 encoding), so
> (nchars == nbytes) checks whether it's "pure ASCII" or not and if it's
> pure ASCII we return a unibyte string.

I don't think this is true, because early during startup we don't yet
have the coding-systems set up, and so the file names are unibyte and
undecoded.  So that place in dired.c doesn't only handle ASCII when it
sees that nchars == nbytes.

> So in the above code snippet, when the string is all-ASCII, we actually
> have a choice, and both a unibyte string and a multibyte string should
> work.  Currently in that case we return a unibyte string, but I think in
> such cases we're better off returning a multibyte string because the
> subsequent "all-ASCII" test (that DE/ENCODE_FILE will perform when we
> pass that filename to some further operation) will be more efficient
> (it's a constant-time (nchars == nbytes) test whereas when the string is
> unibyte it requires looking at each and every byte).
> 
> IOW, while it makes sense to return a "decoded unibyte" string from
> DECODE_FILE in order to avoid an allocation, I don't think it makes
> sense to return such a "decoded unibyte" string when we have to allocate
> a new string anyway.

I'm not necessarily opposed to deciding that ASCII strings should be
multibyte, but doing so for file names will need careful auditing of
the sources with the startup process in mind.
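
To make the cost difference under discussion concrete, here is a minimal
sketch of the two "all-ASCII" tests.  It is an illustration only: the
struct and function names are hypothetical and do not reflect the real
Lisp string layout or any existing Emacs function.  It assumes that a
multibyte string holds valid internal (UTF-8-like) text, so that
nchars == nbytes can only happen when every character is ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a string object; NOT the real Emacs layout.  */
struct sketch_string
{
  bool multibyte;             /* true if DATA is in the internal encoding */
  ptrdiff_t nchars;           /* character count (meaningful if multibyte) */
  ptrdiff_t nbytes;           /* byte count */
  const unsigned char *data;  /* string contents */
};

/* Return true if S contains only ASCII bytes.  */
bool
sketch_all_ascii (const struct sketch_string *s)
{
  assert (s != NULL);

  /* Multibyte: every non-ASCII character takes at least two bytes, so
     comparing the two counts is enough.  Constant time.  */
  if (s->multibyte)
    return s->nchars == s->nbytes;

  /* Unibyte: the counts are always equal, so they say nothing about
     the contents.  Each byte has to be examined.  Linear time.  */
  for (ptrdiff_t i = 0; i < s->nbytes; i++)
    if (s->data[i] >= 0x80)
      return false;
  return true;
}
```

The unibyte branch also shows why nchars == nbytes alone cannot be read
as "pure ASCII": an undecoded unibyte file name satisfies that equality
whatever bytes it contains.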




