Re: [Groff] UTF-8 \(la and \(ra glyphs
From: Werner LEMBERG
Subject: Re: [Groff] UTF-8 \(la and \(ra glyphs
Date: Mon, 24 Feb 2003 15:36:11 +0100 (CET)
> font/devutf8/R.proto gives the width of the \(la and \(ra glyphs as
> 24, which is the standard unit width on this device.
Correct.
> However, localedata/charmaps/UTF-8 in glibc lists U+2329 and U+232A
> as being double-width characters, following
> http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt, so
> wcwidth() returns 2 for each of them;
> http://www.unicode.org/Public/UNIDATA/NamesList.txt notes that they
> are used as CJK punctuation.
But not exclusively. The annotation of those two characters (file
NamesList.txt) also contains the following sentence:
  discouraged for mathematical use because of canonical equivalence to
  CJK punctuation
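To illustrate the width question, here is a minimal Python sketch. It
assumes that a character classed as East Asian Wide or Fullwidth takes
two cells and anything else takes one, which only approximates what
wcwidth() does (combining characters, for instance, are ignored). The
names UNIT_WIDTH, cell_count and device_width are made up for this
example; only the value 24 comes from the message above.

```python
import unicodedata

# Unit width of the devutf8 device as quoted in the message (assumed constant).
UNIT_WIDTH = 24


def cell_count(ch: str) -> int:
    """Approximate terminal cells for one character.

    Hypothetical helper: East Asian Wide ("W") and Fullwidth ("F")
    characters count as two cells, everything else as one.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def device_width(ch: str) -> int:
    """Width in device units under the two-cell assumption above."""
    return cell_count(ch) * UNIT_WIDTH


if __name__ == "__main__":
    for cp in (0x2329, 0x232A, 0x003C, 0x003E):
        ch = chr(cp)
        print(f"U+{cp:04X} class={unicodedata.east_asian_width(ch)} "
              f"width={device_width(ch)}")
```

Under these assumptions U+2329 and U+232A come out at two cells, which
is where the proposed width of 48 comes from.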
> I'd submit a patch to correct their width to 48 except that I'm not
> sure exactly what I should be patching - is there a script somewhere
> which generates these fonts?
No. They have been supplied by Bruno Haible with constant editing by
me.
> I do wonder why \(la and \(ra are used in www.tmac to delimit e-mail
> addresses, since they won't copy-and-paste correctly to < and >. Or
> perhaps the UTF-8 mapping for these two characters ought to be
> changed to U+003C and U+003E respectively.
`<' and `>' look bad in printed output; actually, those two characters
are not delimiters -- I really think that \[la] and \[ra] are the
right symbols.
And no, I won't map \[la] and \[ra] to `<' and `>' for UTF-8. If you
want to do that, please overwrite it locally in the configuration file
of the particular macro package.
A different question is whether U+2329 and U+232A are the right code
points, and your email convinced me that they are not. Due to the
canonical equivalence to U+3008 and U+3009 (which also affects the
width of the character) I will change the code points to the new
values U+27E8 MATHEMATICAL LEFT ANGLE BRACKET and U+27E9 MATHEMATICAL
RIGHT ANGLE BRACKET. I foresee difficulties with that mapping since
U+27E8 and U+27E9 are very recent characters added in Unicode 3.2
which probably don't exist in many fonts, but I believe it is better
to avoid such compromises in the `official' groff version.
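The canonical equivalence mentioned above can be checked with a short
Python sketch using the standard unicodedata module. The helper name
survives_normalization is made up for this example, and the result
depends on the Unicode data shipped with the Python interpreter in use.

```python
import unicodedata


def survives_normalization(ch: str) -> bool:
    """Return True if NFC normalization leaves the character unchanged.

    Hypothetical helper: a character with a singleton canonical
    decomposition is replaced by its equivalent during normalization.
    """
    return unicodedata.normalize("NFC", ch) == ch


if __name__ == "__main__":
    for cp in (0x2329, 0x232A, 0x27E8, 0x27E9):
        ch = chr(cp)
        normalized = unicodedata.normalize("NFC", ch)
        print(f"U+{cp:04X} {unicodedata.name(ch)} -> "
              f"U+{ord(normalized):04X} "
              f"(unchanged: {survives_normalization(ch)})")
```

Any text containing U+2329 or U+232A that passes through a normalizing
process ends up with the CJK brackets, whereas U+27E8 and U+27E9 are
left alone.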
Thanks for pointing this out.
Werner
PS: BTW, in HTML 4, the entities &lang; and &rang; officially map to
U+2329 and U+232A, so I won't change it for grohtml. Probably the
mapping will be revised for the next HTML version...