Re: [Nmh-workers] nmh-1.7-RC1: scan with complex subjects dumps core
From: Ralph Corderoy
Subject: Re: [Nmh-workers] nmh-1.7-RC1: scan with complex subjects dumps core
Date: Tue, 08 Aug 2017 17:07:51 +0100
Hi David,
> > Sunglasses have a width of 1 here, that's why David and I don't see
> > the problem.
>
> I'm surprised that I didn't see the same behavior as Norm, because we
> use the same locale, en_US.utf8. Any idea why?
I'm en_GB.utf8, but I don't see it either. What decides it is the
wcwidth(3) answer for a codepoint, and as Unicode continue to add POO
WITH SUNGLASSES, so the answers change with the version of one's
system's database.
$ test/getcwidth --ctype | grep 1f576
1f576 1 address@hidden
So here it's `print', `graph', and `punct', with a width of 1. Norm's
gang have a width of -1 as they haven't the foggiest what it is.
http://unicode.org/cldr/utility/character.jsp?a=1f576&B1=Show says its
East Asian width is `Neutral', which is treated as `Narrow', so
getcwidth reporting 1 matches.
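To make the Neutral-treated-as-Narrow rule concrete, here is a rough
Python sketch. It reads Python's bundled Unicode tables through
unicodedata rather than calling wcwidth(3), so it shows the mapping
only and won't necessarily agree with any particular libc; eaw_columns
is a name made up for illustration.

```python
import unicodedata


def eaw_columns(ch):
    """Columns suggested by the East Asian Width property alone.

    Assumption: Wide (W) and Fullwidth (F) take two columns and every
    other class, including Neutral (N), is treated as narrow.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


if __name__ == "__main__":
    # Print codepoint, East Asian Width class, and the derived columns.
    for ch in ("A", "\u4e2d", "\U0001f576", "\U0001f57a"):
        print("%x %s %d" % (ord(ch), unicodedata.east_asian_width(ch),
                            eaw_columns(ch)))
```

What it prints for 1f57a depends on how recent the interpreter's
Unicode tables are, which is the same version sensitivity as above.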
Nearby is http://unicode.org/cldr/utility/character.jsp?a=1f57a&B1=Show
that says it's `Wide', but my system here doesn't know anything about
that codepoint yet, thankfully.
$ test/getcwidth --ctype | grep 1f57a
1f57a -1 ------------
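Whether a codepoint is known at all depends on the version of the
tables in use. As a loose analogy only, Python exposes the version of
its own bundled Unicode database and reports unassigned codepoints as
category Cn; this says nothing about what glibc or any other libc
knows. is_assigned is a hypothetical helper.

```python
import unicodedata


def is_assigned(ch):
    """True if Python's bundled Unicode database has an entry for ch."""
    return unicodedata.category(ch) != "Cn"


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for cp in (0x1F576, 0x1F57A):
        print("%x assigned: %s" % (cp, is_assigned(chr(cp))))
```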
One can poke about the local definitions.
$ test/getcwidth --ctype | awk '{print $2}' |
> sort -n | uniq -c
57249 -1
1723 0
29884 1
95464 2
$
$ test/getcwidth --ctype | awk '{print $3}' |
> LC_ALL=C sort | uniq -c
57183 ------------
14 -p--------sb
15563 address@hidden
10 -pg---dxN---
107528 -pga----N---
2167 -pga-l--N---
6 -pga-l-xN---
1772 -pgau---N---
6 -pgau--xN---
4 -pgaul--N---
60 c-----------
6 c---------s-
1 c---------sb
$
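The same sort of tally can be sketched without getcwidth. This Python
version builds a shorter, made-up flag string from str methods (p
print, a alpha, u upper, l lower, d digit, s space), so its classes and
counts are Python's, not the locale's, and won't match the figures
above.

```python
from collections import Counter


def flags(ch):
    """Six-character flag string for one character, '-' where a test fails."""
    tests = (("p", ch.isprintable()), ("a", ch.isalpha()),
             ("u", ch.isupper()), ("l", ch.islower()),
             ("d", ch.isdigit()), ("s", ch.isspace()))
    return "".join(letter if ok else "-" for letter, ok in tests)


def tally(codepoints):
    """Count how many codepoints share each flag string."""
    return Counter(flags(chr(cp)) for cp in codepoints)


if __name__ == "__main__":
    # Equivalent in spirit to: ... | sort | uniq -c
    for pattern, count in sorted(tally(range(0x110000)).items()):
        print("%7d %s" % (count, pattern))
```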
That says there are four runes that are both upper and lower!
$ printf '%b\n' $(test/getcwidth --ctype |
> awk '$3 ~ /ul/ {print "\\u" $1}')
Dž
Lj
Nj
Dz
$
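Unicode itself files these four digraphs under the general category
Lt, titlecase letter, which may be why a locale's tables end up marking
them as both upper and lower; that link is an assumption, not something
getcwidth reports. A Python sketch that lists the Latin Lt letters from
its own database, with latin_titlecase as an illustrative name:

```python
import unicodedata


def latin_titlecase():
    """Latin letters whose Unicode general category is Lt (titlecase)."""
    found = []
    for cp in range(0x110000):
        ch = chr(cp)
        if (unicodedata.category(ch) == "Lt"
                and unicodedata.name(ch, "").startswith("LATIN")):
            found.append(ch)
    return found


if __name__ == "__main__":
    for ch in latin_titlecase():
        # Show the separate uppercase and lowercase forms of each letter.
        print("%04x upper=%04x lower=%04x %s" % (
            ord(ch), ord(ch.upper()), ord(ch.lower()), unicodedata.name(ch)))
```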
And here's the first printable zero-width.
$ test/getcwidth --ctype | grep -m1 ' 0 .*p'
ad 0 address@hidden
U+00AD is soft hyphen. Unicode is said to be an ISO 8859-1 superset,
and 0xAD was soft hyphen in that too, but visible, with a width of 1.
ISO used it at the end of the line to show a word had been broken, but
not by the author, allowing it to be stripped on re-formatting. Unicode
changed that. For them, it's a hint from the author to the renderer
that here's a potential point to break the word, thus, when rendered,
it's not visible and has zero width. Toc toc toc!
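The superset claim and Unicode's view of the character can be checked
from Python, as a small sketch using its bundled Unicode data: byte
0xAD decoded as ISO 8859-1 is U+00AD, and Unicode classes it as Cf, a
format character.

```python
import unicodedata

# ISO 8859-1 maps each byte straight to the codepoint of the same value.
shy = bytes([0xAD]).decode("iso-8859-1")

if __name__ == "__main__":
    print("U+%04X" % ord(shy))
    print(unicodedata.name(shy))
    print(unicodedata.category(shy))
```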
Terminals get this wrong. libvte-based terminals here think it has
width.
$ s="$(printf '\uad')"
$ scan -format "_%4(lit foo)_\n_%4(lit £)_\n_%4(lit $s)_" .
_foo _
_£   _
_ _ [Rune after first _ isn't a space.]
$
Dickey's venerable xterm(1) does better.
$ s="$(printf '\uad')"
$ scan -format "_%4(lit foo)_\n_%4(lit £)_\n_%4(lit $s)_" .
_foo _
_£   _
_    _ [All four are spaces.]
$
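The difference comes down to who does the padding and who does the
drawing. Here is a rough Python sketch of padding a field to four
columns when the soft hyphen is counted as zero width; it is not nmh's
code, and columns and pad are invented names with a simplified width
rule.

```python
import unicodedata


def columns(s):
    """Approximate display width of s under a simplified rule."""
    n = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # assumed zero width: combining marks and format characters
        n += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return n


def pad(s, width):
    """Right-pad s with spaces until it fills width columns."""
    return s + " " * max(0, width - columns(s))


if __name__ == "__main__":
    for s in ("foo", "\u00a3", "\u00ad"):
        # ascii() makes the invisible soft hyphen and the spaces countable.
        print(ascii(pad(s, 4)))
```

The soft hyphen gets four spaces after it, so a terminal that draws it
one column wide shows five cells between the underscores, and one that
draws it zero wide shows four.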
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy