bug#64420: string-width of … is 2 in CJK environments
From: Yuan Fu
Subject: bug#64420: string-width of … is 2 in CJK environments
Date: Wed, 12 Jul 2023 14:11:14 -0700
> On Jul 11, 2023, at 6:17 PM, Dmitry Gutov <dmitry@gutov.dev> wrote:
>
> On 11/07/2023 21:45, Eli Zaretskii wrote:
>
>>>> Once again, the assumption behind this "feature" of the CJK
>>>> language-environments is that whoever uses those environments has the
>>>> terminal emulators configured to use fonts where "…" and its ilk have
>>>> double size. Of course, if you just switch language-environment on a
>>>> system that is otherwise configured for non-CJK locale, the terminal
>>>> emulator fonts will not magically change, and you get what you see.
>>>
>>> Does "…" actually have double width in some of their fonts?
>> That's the assumption, yes. (And not only this one character, you can
>> see which characters we assume have the same width in the function I
>> pointed out earlier in this thread, which we run when the
>> language-environment is switched to something CJK.) It was definitely
>> correct at some point in the past, but the big question is whether it
>> is still correct. I don't know who can tell us that nowadays.
>
> Whole ranges of characters, I see.
Here’s what I know: in a CJK “context”, “…” is supposed to be one ideograph
wide (like all CJK punctuation), i.e., width=2.
However, it’s not as simple as “they used the wrong font”, because both Latin
and CJK use the same Unicode code point for “…”, but expect different glyphs.
In publication, this is solved by manually marking the text with style or font,
so the software uses the desired glyph. Terminals and editors don’t have this
luxury.
BTW, it’s not just the ellipsis: CJK and Latin share the same code points for
quotes, the em dash, and the middle dot, while expecting different glyphs for
them.
Since most terminals and editors (especially terminals) query ASCII/Latin fonts
before falling back to CJK fonts, I expect most of them to show the Latin glyph
for “…” (width=1) most of the time.
So practically, it would be correct most of the time if we assume the following
code points have a width of 1, regardless of locale:
– HORIZONTAL ELLIPSIS …
– LEFT/RIGHT DOUBLE QUOTATION MARK “”
– LEFT/RIGHT SINGLE QUOTATION MARK ‘’
– EM DASH —
– MIDDLE DOT ·
But obviously, if someone configures their terminal or editor to use a CJK font
first, these characters MIGHT have width=2. I say MIGHT because there are
plenty of CJK fonts that use the 1-width Latin glyph for these characters by
default.
It might be helpful to have a wrapper around string-width that applies
heuristics like this, while string-width itself goes strictly by Unicode and
locale.
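To make the idea concrete, here is a rough sketch in Python rather than Emacs
Lisp. It is not how Emacs computes string-width. The names strict_width,
heuristic_width, NARROW_OVERRIDES and the cjk / cjk_font_first parameters are
made up for illustration, and the "strict" rule is simplified to the Unicode
East Asian Width property (Ambiguous counts as 2 only in a CJK context).

```python
import unicodedata

# Code points listed above: ellipsis, double and single quotation marks,
# em dash, middle dot. All are East Asian Width "A" (Ambiguous).
NARROW_OVERRIDES = frozenset("\u2026\u201c\u201d\u2018\u2019\u2014\u00b7")


def _char_width(ch: str, cjk: bool) -> int:
    """Width of one character by Unicode properties (simplified)."""
    if unicodedata.combining(ch):
        return 0  # combining marks take no column of their own
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # Wide and Fullwidth are always two columns
    if eaw == "A" and cjk:
        return 2  # Ambiguous is treated as wide in a CJK context
    return 1


def strict_width(s: str, cjk: bool = False) -> int:
    """Width that goes only by Unicode properties and the CJK flag."""
    return sum(_char_width(ch, cjk) for ch in s)


def heuristic_width(s: str, cjk: bool = False,
                    cjk_font_first: bool = False) -> int:
    """Wrapper: assume the listed punctuation is narrow unless the user
    says their terminal or editor tries a CJK font first (hypothetical flag)."""
    total = 0
    for ch in s:
        if ch in NARROW_OVERRIDES and not cjk_font_first:
            total += 1
        else:
            total += _char_width(ch, cjk)
    return total


print(strict_width("\u2026", cjk=True))     # 2
print(heuristic_width("\u2026", cjk=True))  # 1
```

Even with cjk_font_first set, the result is only a guess, since a CJK font may
still ship the 1-width Latin glyph for these characters.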
Source:
https://www.w3.org/TR/clreq/#table_of_non-bracket_indication_punctuation_marks
Yuan