bug-coreutils

Re: Alignment bug in ls with UTF-8 filenames under Mac OS X


From: Bruno Haible
Subject: Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Date: Thu, 18 Jan 2007 03:14:37 +0100 (MET)
User-agent: KMail/1.5.4

Vincent Lefevre wrote:
> > Therefore: can you also show wrong behaviour when you set
> > LC_ALL=en_US.UTF-8 ?
> 
> Yes:
> 
> prunille:~/blah> export LC_ALL=en_US.UTF-8
> prunille:~/blah> locale
> LANG="POSIX"
> LC_COLLATE="en_US.UTF-8"
> LC_CTYPE="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_ALL="en_US.UTF-8"
> prunille:~/blah> ls
> É                               y123456789012345678901234567890
> x123456789012345678901234567890  z123456789012345678901234567890

On MacOS X 10.3.9 I can reproduce this. Let's look at the hexdump (*) of
ls' output in three environments:

1) In an Apple Terminal

2) In an xterm, launched with "LC_ALL=en_US.UTF-8 xterm"

3) In an xterm running on Linux, with an ssh to MacOS X

In all three cases the output of ls is the same:
$ LC_ALL=en_US.UTF-8 ls -C | hd
000000  45 CC 81 09 09 09 09 20 79 31 32 33 34 35 36 37  E...... y1234567
000010  38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33  8901234567890123
000020  34 35 36 37 38 39 30 0A 78 31 32 33 34 35 36 37  4567890.x1234567
000030  38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33  8901234567890123
000040  34 35 36 37 38 39 30 20 20 7A 31 32 33 34 35 36  4567890  z123456
000050  37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32  7890123456789012
000060  33 34 35 36 37 38 39 30 0A                       34567890.

You see, it starts with E and the accent (on MacOS X, filenames are
represented in decomposed Unicode form), followed by 4 tabs and a space.
With tab stops every 8 columns, the second column of filenames should
therefore start in screen column 33 (where the leftmost is screen
column 0). But the output in the terminal looks like this:

1) In an Apple Terminal
É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

2), 3)
É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

So what you see is that Apple Terminal starts the second column in
screen column 32 instead of 33: it has problems knowing the width
of combining characters like accents when it expands tabs. If you
tell 'ls' to emit spaces instead of tabs, like this:
  ls -C -T0
or
  TABSIZE=0 ls -C
then the output looks the same in all kinds of terminals.

Conclusion: What you see is not an ls bug, but an Apple Terminal bug
with tabs.
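
The column arithmetic above can be checked with a small sketch. This is
not how any terminal is implemented; it only assumes tab stops every 8
columns and treats combining marks as zero-width, ignoring other cases
such as double-width characters. The helper names display_width and
end_column are made up for this illustration.

```python
import unicodedata


def display_width(ch):
    """Approximate width: 0 for combining marks, 1 for everything else."""
    return 0 if unicodedata.combining(ch) else 1


def end_column(text, tabsize=8):
    """Return the 0-based screen column reached after printing text."""
    col = 0
    for ch in text:
        if ch == "\t":
            # A tab advances to the next multiple of tabsize.
            col = (col // tabsize + 1) * tabsize
        else:
            col += display_width(ch)
    return col


# The bytes that precede 'y' in the first hexdump: E, U+0301, 4 tabs, 1 space.
prefix = bytes.fromhex("45 CC 81 09 09 09 09 20").decode("utf-8")
print(end_column(prefix))  # 33
```

Under these assumptions a terminal that handles the combining accent
correctly puts the 'y' in screen column 33, as xterm does.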

But there is also a bug that shows up in the output of ls itself:

$ ls -C -T0
É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890
$ ls -C -T0 | hd
000000  45 CC 81 20 20 20 20 20 20 20 20 20 20 20 20 20  E..             
000010  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20                  
000020  20 20 79 31 32 33 34 35 36 37 38 39 30 31 32 33    y1234567890123
000030  34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39  4567890123456789
000040  30 0A 78 31 32 33 34 35 36 37 38 39 30 31 32 33  0.x1234567890123
000050  34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39  4567890123456789
000060  30 20 20 7A 31 32 33 34 35 36 37 38 39 30 31 32  0  z123456789012
000070  33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38  3456789012345678
000080  39 30 0A                                         90.

What 'ls' outputs here is an E, a combining accent and 31 spaces: text
that moves to column 32, not 33. When I set a breakpoint in wcwidth,
I see that the first call to wcwidth() gives: wcwidth(0x0301) = 1.
U+0301 is COMBINING ACUTE ACCENT, which occupies no column of its own,
so the correct result would be 0. So here is the problem: MacOS'
wcwidth is buggy for combining characters like accents.
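
To see why a wrong wcwidth() result costs exactly one column, here is a
simplified model of the padding step. It is not the ls source. The
functions buggy_wcwidth and correct_wcwidth are stand-ins for the two
behaviours discussed above, and the target column 33 is read off the
dump (a 31-character name plus 2 separator spaces).

```python
def buggy_wcwidth(ch):
    """Stand-in for the observed behaviour: wcwidth(0x0301) = 1."""
    return 1


def correct_wcwidth(ch):
    """Stand-in for the expected behaviour: U+0301 is zero-width."""
    return 0 if ch == "\u0301" else 1


def pad_to(name, target_col, wcwidth):
    """Spaces a formatter emits if it trusts wcwidth for the name width."""
    return target_col - sum(wcwidth(ch) for ch in name)


def landing_column(name, target_col, wcwidth):
    """Column actually reached on a terminal that renders widths correctly."""
    true_width = sum(correct_wcwidth(ch) for ch in name)
    return true_width + pad_to(name, target_col, wcwidth)


name = "E\u0301"  # decomposed form, as stored on MacOS X
for wcwidth in (buggy_wcwidth, correct_wcwidth):
    print(wcwidth.__name__,
          pad_to(name, 33, wcwidth),
          landing_column(name, 33, wcwidth))
# buggy_wcwidth 31 32
# correct_wcwidth 32 33
```

With the overestimated width the formatter emits 31 spaces, which matches
the hexdump, and the next name lands in column 32.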

Bruno


(*) 'hd' is a shell script:
#!/bin/sh
hexdump -e '"%06.6_ax  " 16/1 "%02X "' -e '"  " 16/1 "%_p" "\n"' "$@"
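
For readers without hexdump at hand, the same layout can be approximated
in a few lines of Python. This is a sketch modelled on the dumps above,
not a faithful reimplementation of hexdump; the function name hd_lines
is made up, and the padding of a short final line is an approximation.

```python
def hd_lines(data):
    """Format bytes as: 6-digit hex offset, 16 hex bytes, printable text."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        # 16 two-digit bytes joined by single spaces occupy 47 characters.
        hexpart = " ".join(f"{b:02X}" for b in chunk).ljust(47)
        # Non-printable and non-ASCII bytes are shown as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:06x}  {hexpart}  {text}")
    return lines


sample = "E\u0301\t\t\t\t y1234567".encode("utf-8")
print("\n".join(hd_lines(sample)))
# 000000  45 CC 81 09 09 09 09 20 79 31 32 33 34 35 36 37  E...... y1234567
```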




