Re: extended ASCII characters do not show up
From: amores perros
Subject: Re: extended ASCII characters do not show up
Date: Sat, 01 Oct 2005 18:16:31 +0000
From: Thomas Dickey <address@hidden>
Subject: Re: extended ASCII characters do not show up
Date: Sat, 1 Oct 2005 12:34:19 -0400 (EDT)
<snip>
#2)
I don't understand how line drawing characters (such as ACS_VLINE, I
think) work with UTF-8? That is, I don't know what they could
expand to that would work, unless they (macros I assume) expand
to characters between 0 and 0x20 which are not otherwise used.
I've looked in the ncurses faq for UTF-8, but if the answer is there
I overlooked it :(
yes (it did occur to me that I should add this to my faq - on my to-do
list).
The ACS_xxx symbols are a character (which corresponds to the vt100
line-drawing), with A_ALTCHARSET added. ncurses keeps track of the
A_ALTCHARSET, and when it is time to write the data to the screen, checks
to see if the encoding is UTF-8. If so, it checks some special cases (such
as Linux console) to see if it should not try to use the terminfo string to
transform its internal character to the terminal's equivalent.
It's a little tough to follow in the ncurses sources, as I think
these .in files get expanded by the autotools configure step, but
A_ALTCHARSET probably expands to NCURSES_BITS(@cf_cv_1UL@,14),
and NCURSES_BITS probably expands via
#define NCURSES_BITS(mask,shift) ((mask) << ((shift) + NCURSES_ATTR_SHIFT))
so I think A_ALTCHARSET sets a single bit, 14 positions above
NCURSES_ATTR_SHIFT (and WA_ALTCHARSET is the same thing as
A_ALTCHARSET).
So, my assumption that, e.g., ACS_VLINE is a char was mistaken --
ACS_VLINE is an integer of more than 14 bits, and A_ALTCHARSET is
probably a high bit set to flag this integer as not being a character
in a UTF-8 encoding.
So, now I understand how you can overload a "char" with information
outside of the UTF-8 range -- it's not a char, but an integer -- much like
EOF, which uses a value outside of char range to mean the end of file
flag, I think.
At least, I think I understand it now.
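A minimal sketch of that bit arithmetic, so the numbers can be checked by hand. It does not include curses.h. NCURSES_BITS is copied from the definition quoted above, while the value 8 for NCURSES_ATTR_SHIFT is an assumption (it matches the ncurses headers I have seen, but check your own curses.h). Everything prefixed DEMO_ or demo_ is a hypothetical stand-in, not an ncurses name.

```c
/* Assumption: NCURSES_ATTR_SHIFT is 8; verify against your curses.h. */
#define NCURSES_ATTR_SHIFT 8
#define NCURSES_BITS(mask, shift) ((mask) << ((shift) + NCURSES_ATTR_SHIFT))

/* Hypothetical stand-ins for A_ALTCHARSET and A_CHARTEXT. */
#define DEMO_A_ALTCHARSET NCURSES_BITS(1UL, 14)          /* one bit, position 22 */
#define DEMO_A_CHARTEXT   (NCURSES_BITS(1UL, 0) - 1UL)   /* low 8 bits: 0xFF */

/* Hypothetical stand-in for chtype: an integer wider than a char. */
typedef unsigned long demo_chtype;

/* Build an ACS-style value: the vt100 line-drawing code plus the flag bit. */
demo_chtype demo_acs(char vt100_code)
{
    return (demo_chtype)(unsigned char)vt100_code | DEMO_A_ALTCHARSET;
}

/* Nonzero when the alternate-character-set flag is set on the value. */
int demo_is_altcharset(demo_chtype c)
{
    return (c & DEMO_A_ALTCHARSET) != 0;
}

/* Recover the plain character stored in the low bits. */
char demo_chartext(demo_chtype c)
{
    return (char)(c & DEMO_A_CHARTEXT);
}
```

With these assumptions, demo_acs('x') (the vt100 code for a vertical line) is 0x400078: the character 'x' (0x78) in the low byte, plus a flag bit that no char can hold.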
For Linux console in UTF-8 mode, the line-drawing characters are all
represented as 3 bytes in UTF-8 encoding. That isn't compatible with the
terminfo acsc string (which always does 1 byte mapped to another 1 byte).
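To make the "1 byte mapped to another 1 byte" point concrete: the acsc capability is a string of character pairs, where the first character of each pair is the vt100 line-drawing code and the second is what this particular terminal wants sent instead. The lookup below is only a sketch of that idea. demo_acsc_lookup is a hypothetical helper, not an ncurses function, and falling back to the original code when no pair matches is a choice made for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: search an acsc-style string of
 * (vt100 code, terminal character) pairs for vt100_code.
 * Returns the terminal's character, or vt100_code itself
 * when no pair matches (a fallback chosen for this sketch).
 */
char demo_acsc_lookup(const char *acsc, char vt100_code)
{
    size_t i;

    /* Step two bytes at a time; ignore a trailing unpaired byte. */
    for (i = 0; acsc[i] != '\0' && acsc[i + 1] != '\0'; i += 2) {
        if (acsc[i] == vt100_code)
            return acsc[i + 1];
    }
    return vt100_code;
}
```

For example, with the made-up string "q-x|", looking up 'q' yields '-' and looking up 'x' yields '|'. Each result is a single byte, which is why this scheme cannot produce a 3-byte UTF-8 sequence.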
The table that ncurses uses for the UTF-8 line-drawing is in
ncurses/widechar/lib_wacs.c
That lists the Unicode values such as
{ 'q', { '-', 0x2500 }}, /* horizontal line */
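The "3 bytes" remark can be checked against the 0x2500 entry with a small UTF-8 encoder. This is a generic sketch of the standard encoding rules, not code from ncurses; demo_utf8_encode is a hypothetical name, and it does not reject surrogate code points.

```c
#include <stddef.h>

/*
 * Hypothetical helper: encode code point cp as UTF-8 into out
 * (which must hold at least 4 bytes). Returns the number of bytes
 * written, or 0 if cp is above U+10FFFF.
 */
size_t demo_utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80UL) {                 /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {              /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000UL) {             /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding 0x2500 this way gives the three bytes E2 94 80, while the ASCII fallback '-' from the same table entry stays a single byte.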
I don't know what a "terminfo acsc string" is, but I think I'm
content with the level of my limited understanding now, and
thanks for pointing out where the table is, to see what unicode
characters are being used for drawing.
Thank you.
Cordially,
Perry