bug-bash
[PATCH] normalization tweaks for macOS


From: Grisha Levit
Subject: [PATCH] normalization tweaks for macOS
Date: Fri, 7 Jul 2023 17:05:24 -0400

A few small tweaks to the macOS-specific normalization handling, to
address the issues described below.

Examples below are using the following setup:
$ mkdir -p -- $'\303\251-C' $'e\314\201-D'
$ /bin/ls -BF
é-C/   é-D/
$ LC_ALL=C /bin/ls -BF
e\314\201-D/ \303\251-C/

LC_ALL=C is used only to make the output unambiguous; the shell's
locale is UTF-8.  This is on APFS, which is normalization-preserving,
but similar results show up on HFS+ (which is not).  In either case,
file access is normalization-insensitive.
---

When attempting to match a direntry name to the supplied pattern, the
globbing code converts the name from UTF-8-MAC encoding to (say)
UTF-8.  This is essentially an NFC normalization.  If the converted
name matches, the original name is then (correctly, I think) returned
by the glob operation:
$ LC_ALL=C printf '%q ' $'\303\251'*
$'\303\251-C' $'e\314\201-D'

However, the pattern itself does not go through normalization and
unless it is already NFC-normalized, it won't match anything.  So even
names retrieved from globbing might not match themselves:
$ for x in *; { echo "$x:" "$x"*/; }
é-C: é-C/
é-D:

Seems like it would be appropriate to normalize the pattern too.
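
To make the mismatch concrete without needing a macOS filesystem, here
is a rough sketch using plain `case' matching.  It is not the glob code
itself: `matches' is a made-up helper, and `name' just stands in for a
direntry name after it has been converted to NFC.  An NFD pattern
compared against that converted name fails, because the two spellings
of the same character are different byte sequences:

```shell
# Two spellings of the same character: precomposed U+00E9 (NFC) and
# "e" followed by combining acute U+0301 (NFD).
nfc=$(printf '\303\251')
nfd=$(printf 'e\314\201')

# Stand-in for a direntry name after conversion to NFC (assumption:
# this mimics what the globbing code compares the pattern against).
name="${nfc}-D"

# Hypothetical helper: does NAME start with PREFIX, compared as-is?
matches() {  # usage: matches PREFIX NAME
    case $2 in
        "$1"*) echo "match" ;;
        *)     echo "no match" ;;
    esac
}

matches "$nfc" "$name"   # prints "match"
matches "$nfd" "$name"   # prints "no match"
```

Normalizing the pattern to the same form as the converted name would
make the second comparison succeed as well.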

One maybe tricky thing here is that quoted characters in the pattern
are supplied to the globbing code prefixed by a backslash, making
normalization fail to combine combining characters.  It's possible to
adjust quote_string_for_globbing to only put in backslashes when the
quoted character is special for globbing but that might complicate the
code more than necessary -- I kind of cheated by just not adding a
backslash if the quoted character is non-ASCII.  I can't think of any
way a non-ASCII character can be special in globbing code but maybe
I'm not trying hard enough.
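
A rough shell sketch of that shortcut, for illustration only.  This is
not the quote_string_for_globbing code from the patch: the function
name `quote_ascii_only' is made up, and it walks the string byte by
byte with od instead of character by character.  The point is just that
a backslash goes in front of ASCII bytes only, so a base character and
the combining mark after it stay adjacent and can still be combined by
normalization:

```shell
# Hypothetical helper: backslash-quote only the ASCII bytes of a string.
# Bytes 0200 and above (all bytes of non-ASCII UTF-8 sequences) are
# copied through without a backslash.
quote_ascii_only() {
    for oct in $(printf '%s' "$1" | od -An -v -to1); do
        # od prints each byte as three octal digits; the leading 0
        # makes the shell read the value as octal.
        if [ "$((0$oct))" -lt 128 ]; then
            printf '\\'
        fi
        printf "\\$oct"   # emit the original byte
    done
}

# "e" + U+0301: only the "e" gets a backslash, so no backslash ends up
# between the base character and the combining mark.
quote_ascii_only "$(printf 'e\314\201')" | od -An -c
```

Quoting every character instead would put a backslash between the "e"
and the combining accent, which is the case that defeats normalization.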
---

Filename completion has a similar situation. The NFC form matches any
text that normalizes to it:
$ bash -in <<<$'\303\251\e*'
bash-5.3$ é-C é-D

But NFD text matches nothing, not even itself:
$ bash -in <<<$'e\314\201\e*'
bash-5.3$ é

Admittedly, filename completion does not itself produce non-NFC text
so it's less likely that this would be encountered, but normalizing
the hint text before comparing it to normalized filenames seems easy
enough.

BTW glob-expand-word can result in NFD text on the input line but that
seems correct since it's what globbing produces.
---

If filename completion is invoked through `compgen', it behaves
differently in scripts since bashline.c:initialize_readline hasn't had
a chance to set rl_filename_rewrite_hook.

$ bash -c $'compgen -f -- \303\251'
é-C
$ bash -c $'compgen -f -- e'
é-D

This can be worked around by calling `bind' manually, resulting in the
same behavior as in an interactive shell:
$ bash -c $'bind; compgen -f \303\251'
é-C
é-D
$ bash -c $'bind; compgen -f -- e'
$

...but it seems safe enough to set the hook from compgen directly as well.

Attachment: 0001-fnxform-tweaks.patch
Description: Source code patch

