From: Bruno Haible
Subject: Re: [bug-libunistring] Hangul Jamo vowels and trailing consonants should probably be 0 width
Date: Tue, 28 Dec 2021 11:36:10 +0100
Hello Luis,
> I've been looking at widths reported for Hangul Jamo in wcwidth
> implementations.
Thanks for bringing up this issue. I wasn't aware of it.
> glibc gave width 0 to conjoining jungseong and jongseong at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=21750
> https://sourceware.org/bugzilla/show_bug.cgi?id=22074
Ouch. As Egmont Koblinger wrote in the first of these glibc tickets, every
change to the commonly accepted wcwidth has the potential to cause trouble.
> In glibc and MirBSD xterm, U+1160..U+11FF and U+D7B0..U+D7FF have 0 width.
I agree that U+D7B0..U+D7FF (Hangul Jamo Extended-B) should be treated like
U+1160..U+11FF (Hangul Jamo medial and final), per the Unicode Standard, chapter 18
https://www.unicode.org/versions/Unicode14.0.0/ch18.pdf .
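To make the ranges concrete, here is a minimal sketch of the convention described above for glibc and MirBSD xterm. The function name jamo_width and the constants are hypothetical, chosen only for illustration; this is not the code of glibc, xterm, or libunistring. It assumes width 2 for the leading consonants U+1100..U+115F and width 0 for the two ranges quoted above.

```python
# Hypothetical helper illustrating the convention discussed in this thread.
# The zero-width ranges are the ones quoted above; nothing here is taken
# from the actual glibc or libunistring sources.
ZERO_WIDTH_JAMO = (
    (0x1160, 0x11FF),  # Hangul Jamo medial vowels and final consonants
    (0xD7B0, 0xD7FF),  # Hangul Jamo Extended-B
)
LEADING_JAMO = (0x1100, 0x115F)  # Hangul Jamo leading consonants


def jamo_width(ch):
    """Return 0 or 2 for a conjoining jamo, or None for any other character."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ZERO_WIDTH_JAMO):
        return 0
    if LEADING_JAMO[0] <= cp <= LEADING_JAMO[1]:
        return 2
    return None  # not a jamo covered by this sketch
```

Under this convention jamo_width("\u1161") and jamo_width("\ud7b0") both give 0, while jamo_width("\u1100") gives 2.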
However, I don't think people have been looking at the right spot.
1) People (esp. Thorsten Glaser) have been arguing from the behaviour of xterm.
But xterm is rarely used nowadays. I evaluated the popularity of terminal
emulators in August 2019, and here are the results:
* measured through Debian popularity contest:
https://qa.debian.org/popcon.php?package=konsole 11%
https://qa.debian.org/popcon.php?package=emacs 7%
https://qa.debian.org/popcon.php?package=lxterminal 6%
https://qa.debian.org/popcon.php?package=guake 1.3%
https://qa.debian.org/popcon.php?package=yakuake 1.1%
https://qa.debian.org/popcon.php?package=rxvt 0.9%
https://qa.debian.org/popcon.php?package=termit 0.7%
https://qa.debian.org/popcon.php?package=lilyterm 0.1%
* https://opensource.com/life/17/10/top-terminal-emulators
1. gnome-terminal
2. terminator
3. konsole
4. xterm
5. guake
6. yakuake
7. tilda
The conclusion is that
- GNOME vte based terminal emulators probably account for about 50% today,
- konsole comes second,
- xterm is not important (because who still wants to use a program
with Athena widgets in an environment based on Gtk and/or Qt widgets?)
2) People argue about the use of these Hangul Jamo characters when
they form a complete Hangul syllable, and that in this case the
total width should be 2, and therefore, since 2 = 2 + medial + final,
the medial and final parts should have width 0.
But in this case people would be using a precomposed Hangul syllable.
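As an illustration of the two points above, here is a small sketch using Python's standard unicodedata module. The helper sequence_width is hypothetical and simply applies the zero-width convention for medial and final jamo; the example syllable U+D55C is my own choice. It shows that the conjoining sequence and the precomposed syllable are canonically equivalent, and that both get a total width of 2 under that convention.

```python
import unicodedata

# Leading consonant + medial vowel + final consonant, as conjoining jamo.
decomposed = "\u1112\u1161\u11ab"
# NFC normalization composes the sequence into one precomposed syllable.
precomposed = unicodedata.normalize("NFC", decomposed)


def sequence_width(text):
    """Hypothetical width sum: leading jamo and precomposed syllables count 2,
    medial/final jamo count 0 (the convention argued for in this thread).
    Any other character is ignored by this sketch."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if 0x1100 <= cp <= 0x115F or 0xAC00 <= cp <= 0xD7A3:
            total += 2
        elif 0x1160 <= cp <= 0x11FF or 0xD7B0 <= cp <= 0xD7FF:
            total += 0
    return total


print(precomposed == "\ud55c")
print(sequence_width(decomposed), sequence_width(precomposed))
```

The width sum only comes out the same for both forms because the medial and final jamo are counted as 0; with width 2 for each of them, the decomposed form would sum to 6.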
What I am more concerned about: When you look at the code charts
https://www.unicode.org/charts/PDF/U1100.pdf
https://www.unicode.org/charts/PDF/UD7B0.pdf
you see that there are glyphs.
- In which circumstances are these characters used individually?
Maybe in a textbook for Korean children?
- How are they supposed to be rendered in these situations? Surely
as glyphs of width 2, no?
In the end, it comes down to: What is the more frequent context for
these characters?
Bruno