bug-bash

Re: Wildcard expansion can fail with nonprinting characters


From: Stephane Chazelas
Subject: Re: Wildcard expansion can fail with nonprinting characters
Date: Tue, 1 Oct 2019 07:44:20 +0100
User-agent: NeoMutt/20171215

2019-09-30 15:35:21 -0400, Chet Ramey:
[...]
> The $'\361' is a unicode combining
> character, which ends up making the entire sequence of characters an
> invalid wide character string in a bunch of different locales.
[...]

No, it's $'\u0361', the Unicode character U+0361 (hex), that is
"COMBINING DOUBLE INVERTED BREVE" (encoded as \315\241 in UTF-8).

But $'\361' is the byte value 0361 (octal), 0xF1. On its own, that
is an invalid byte sequence in UTF-8: it's 2#11110001, which can
only be the first byte of a 4-byte character (one of U+40000 to
U+7FFFF). In latin1, that's ñ (LATIN SMALL LETTER N WITH TILDE).

So $'foo\361bar' is not valid text in UTF-8, but that's an
encoding issue, not a problem with combining characters.
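
To make the lead-byte reasoning concrete, here is a rough sketch of
how the high bits of a byte announce the length of a UTF-8 sequence.
The function name utf8_seq_len is made up for illustration, and the
check is deliberately simplified: it looks only at the bit pattern and
does not reject lead bytes such as 0300, 0301 or 0365-0367, which a
strict decoder would also refuse.

```shell
# Hypothetical helper: print the sequence length a byte announces when
# read as a UTF-8 lead byte, or 0 if it cannot start a sequence.
# The argument is an octal literal such as 0361.
utf8_seq_len() {
  byte=$(( $1 ))                                # 0361 (octal) is 241
  if   [ "$byte" -lt 128 ]; then echo 1         # 0xxxxxxx: ASCII
  elif [ "$byte" -lt 192 ]; then echo 0         # 10xxxxxx: continuation byte
  elif [ "$byte" -lt 224 ]; then echo 2         # 110xxxxx
  elif [ "$byte" -lt 240 ]; then echo 3         # 1110xxxx
  elif [ "$byte" -lt 248 ]; then echo 4         # 11110xxx
  else echo 0                                   # 11111xxx: never valid
  fi
}

utf8_seq_len 0361   # prints 4: needs three continuation bytes after it
utf8_seq_len 0315   # prints 2: first byte of U+0361
utf8_seq_len 0241   # prints 0: continuation byte, cannot start a character
```

In 'foo\361bar' the byte 0361 is followed by "b", which is not a
continuation byte, so the sequence is cut short.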

$ locale charmap
UTF-8
$ printf '\u361' | od -An -to1
 315 241
$ printf '\U40000' | od -An -vto1
 361 200 200 200
$ printf 'foo\361bar' | iconv -f utf8
fooiconv: illegal input sequence at position 3
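
For comparison, a small sketch of the latin1 reading of the same
byte. The function name latin1_to_utf8_octal is only an illustration;
it assumes an iconv that knows the ISO-8859-1 and UTF-8 charset names,
and it passes its argument to printf as the format so that octal
escapes are expanded.

```shell
# Hypothetical helper: treat the argument's bytes as latin1, convert
# them to UTF-8, and dump the result as octal byte values.
latin1_to_utf8_octal() {
  printf "$1" | iconv -f ISO-8859-1 -t UTF-8 | od -An -to1
}

latin1_to_utf8_octal '\361'   # prints 303 261, i.e. U+00F1 (ñ) in UTF-8
```

Read as latin1 the byte is a perfectly good ñ; it is only under UTF-8
that it is invalid.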

-- 
Stephane



