Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
From: Bruno Haible
Subject: Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Date: Thu, 18 Jan 2007 03:14:37 +0100 (MET)
User-agent: KMail/1.5.4
Vincent Lefevre wrote:
> > Therefore: can you also show wrong behaviour when you set
> > LC_ALL=en_US.UTF-8 ?
>
> Yes:
>
> prunille:~/blah> export LC_ALL=en_US.UTF-8
> prunille:~/blah> locale
> LANG="POSIX"
> LC_COLLATE="en_US.UTF-8"
> LC_CTYPE="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_ALL="en_US.UTF-8"
> prunille:~/blah> ls
> É y123456789012345678901234567890
> x123456789012345678901234567890 z123456789012345678901234567890
On MacOS X 10.3.9 I can reproduce this. Let's look at the hexdump (*) of
ls' output in three environments:
1) In an Apple Terminal
2) In an xterm, launched with "LC_ALL=en_US.UTF-8 xterm"
3) In an xterm running on Linux, with an ssh to MacOS X
In all three cases the output of ls is the same:
$ LC_ALL=en_US.UTF-8 ls -C | hd
000000 45 CC 81 09 09 09 09 20 79 31 32 33 34 35 36 37 E...... y1234567
000010 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 8901234567890123
000020 34 35 36 37 38 39 30 0A 78 31 32 33 34 35 36 37 4567890.x1234567
000030 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 8901234567890123
000040 34 35 36 37 38 39 30 20 20 7A 31 32 33 34 35 36 4567890 z123456
000050 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 7890123456789012
000060 33 34 35 36 37 38 39 30 0A 34567890.
As you can see, the output starts with E and the accent (on MacOS X,
filenames are represented in decomposed Unicode form), followed by 4 tabs
and a space. So the second column of filenames should start in screen
column 33 (counting the leftmost screen column as 0). But the output in
the terminal looks like this:
1) In an Apple Terminal
É y123456789012345678901234567890
x123456789012345678901234567890 z123456789012345678901234567890
2), 3)
É y123456789012345678901234567890
x123456789012345678901234567890 z123456789012345678901234567890
So what you see is that Apple Terminal has problems knowing the width
of combining characters like accents when it expands tabs. If you
tell 'ls' to emit spaces instead of tabs, like this:
ls -C -T0
or
TABSIZE=0 ls -C
then the output looks the same in all kinds of terminals.
Conclusion: What you see is not an ls bug, but an Apple Terminal bug
with tabs.
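The column arithmetic for the tab case can be checked with a small sketch.
It is only a model, not what any terminal actually runs: it assumes tab
stops every 8 columns, and it treats every combining mark as 0 cells wide
and every other character as 1 cell wide. The helper names are made up for
this illustration.

```python
import unicodedata

TAB_STOP = 8  # assumption: terminal tab stops every 8 columns


def cell_width(ch):
    """Crude width model: combining marks take 0 cells, anything else 1."""
    return 0 if unicodedata.combining(ch) else 1


def end_column(text, tab_stop=TAB_STOP):
    """Screen column reached after printing text, starting at column 0."""
    col = 0
    for ch in text:
        if ch == "\t":
            # A tab advances to the next multiple of tab_stop.
            col = (col // tab_stop + 1) * tab_stop
        else:
            col += cell_width(ch)
    return col


# Decomposed form of U+00C9, as MacOS X stores it in filenames.
name = unicodedata.normalize("NFD", "\u00c9")
prefix = name + "\t\t\t\t "  # what ls emitted: name, 4 tabs, 1 space

print(" ".join("%02X" % b for b in name.encode("utf-8")))  # 45 CC 81
print(end_column(prefix))  # 33
```

The bytes match the start of the hexdump, and a terminal that gives the
accent zero width ends up at column 33 for the second filename.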
But there is an ls bug:
$ ls -C -T0
É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890
$ ls -C -T0 | hd
000000 45 CC 81 20 20 20 20 20 20 20 20 20 20 20 20 20 E..
000010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
000020 20 20 79 31 32 33 34 35 36 37 38 39 30 31 32 33 y1234567890123
000030 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 4567890123456789
000040 30 0A 78 31 32 33 34 35 36 37 38 39 30 31 32 33 0.x1234567890123
000050 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 4567890123456789
000060 30 20 20 7A 31 32 33 34 35 36 37 38 39 30 31 32 0 z123456789012
000070 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 3456789012345678
000080 39 30 0A 90.
What 'ls' outputs here is an E, a combining accent and 31 spaces: text
that moves to column 32, not 33. When I set a breakpoint in wcwidth,
I see that the first call to wcwidth() gives: wcwidth(0x0301) = 1.
U+0301 is COMBINING ACUTE ACCENT, which occupies no column of its own, so
the result should have been 0. So here is the problem: MacOS X's
wcwidth is buggy for combining characters like accents.
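To make the off-by-one concrete, here is a rough sketch of the padding
arithmetic. It is not the code in ls. It assumes the column is padded to
33 cells (the 31-character names plus 2 separating spaces, as in the
hexdump), and it models the buggy wcwidth by counting every code point as
1 cell. The function names are hypothetical.

```python
import unicodedata

COLUMN_WIDTH = 33  # assumption: longest name (31 cells) + 2 spaces


def correct_width(s):
    """Display width when combining marks count as 0 cells."""
    return sum(0 if unicodedata.combining(ch) else 1 for ch in s)


def buggy_width(s):
    """Display width when wcwidth(U+0301) wrongly returns 1."""
    return len(s)


def next_column_start(name, width_fn):
    """Column where the next name lands on a terminal that renders correctly."""
    spaces = COLUMN_WIDTH - width_fn(name)  # padding computed from width_fn
    return correct_width(name) + spaces     # cells actually consumed on screen


name = "E\u0301"  # decomposed form: E + COMBINING ACUTE ACCENT

print(COLUMN_WIDTH - buggy_width(name))        # 31 spaces, as in the hexdump
print(next_column_start(name, buggy_width))    # 32
print(next_column_start(name, correct_width))  # 33
```

With the buggy width the name is believed to be 2 cells wide, so only 31
spaces are emitted and the next name lands one column too far left.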
Bruno
(*) 'hd' is a shell script:
#!/bin/sh
hexdump -e '"%06.6_ax " 16/1 "%02X "' -e '" " 16/1 "%_p" "\n"' "$@"