Re: Wide and UTF-8 international characters

bug-ncurses

From:	D. Stimits
Subject:	Re: Wide and UTF-8 international characters
Date:	Sat, 17 May 2003 16:25:21 -0600
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2b) Gecko/20021018

I'm still trying to think ahead on my project, so I'm going to ask based on what I've read, but not tested (at least not with ncurses).

...


>If I am using just a console or xterm, without ncurses, I can output
>the full 8 bit characters as described in html 8-bit entities, echoed
>directly to a console (not with ncurses or any lib), such as "©",
>and get the copyright symbol that is like a 'c' inside of a circle (it
>happens that to echo this I echo an uninterpreted 169 decimal, typecast
>to char). So current terminals, whether console or X11, use the full 8


generally true.  But the 8th bit used for standout in BSD curses was
stripped off and used as a flag to tell that implementation whether
to use standout mode to highlight characters.
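For illustration, here is a minimal sketch of the one-byte scheme described above, where the low 7 bits carry the character and the 8th bit is the standout flag. The names (pack_cell, cell_char, cell_standout) are hypothetical and are not taken from the BSD curses source.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the BSD curses source. */
#define STANDOUT_FLAG 0x80u   /* 8th bit, used as the highlight flag */
#define CHAR_MASK     0x7Fu   /* low 7 bits hold the character */

/* Pack a 7-bit character and a standout flag into a single byte. */
unsigned char pack_cell(unsigned char ch, int standout)
{
    assert(ch <= CHAR_MASK);  /* only 7-bit characters fit */
    return (unsigned char)(ch | (standout ? STANDOUT_FLAG : 0u));
}

/* Recover the character by stripping the 8th bit. */
unsigned char cell_char(unsigned char cell)
{
    return (unsigned char)(cell & CHAR_MASK);
}

/* Report whether the 8th bit (standout) is set. */
int cell_standout(unsigned char cell)
{
    return (cell & STANDOUT_FLAG) != 0;
}
```

Under this scheme a byte of 169 (0xA9) cannot be told apart from a highlighted ')' (0x29), which is why the upper 128 characters are unavailable when the 8th bit doubles as a flag.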


>bits to create their display. If the eighth bit is being used by curses,
>then the top 128 characters are lost to standout mode ability. On the
>other hand, if ncurses uses a separate byte (a 16 bits) to store


more than 8 bits, actually.

So it sounds like the 8th bit is no longer used as a flag...is that correct? But also that 1 or more bytes are then added with each character cell to provide attribute data...is that correct?

>characteristics, while leaving the full 8 bits to display output, then
>ncurses can display the full 255 character entity set (html entity set)
>simply by sending the character straight to the terminal. I'm not
>positive, but this should include the full UTF-8 set, which is only
>single-byte. Is ncurses storing attribute in a separate byte already? Or
the problem with that, is that it doesn't mix well with treating the screen as an array of characters. You _could_ store each row as a multibyte string (with some pain achieved at the right margin), but it would require counting or some index added to point to a character which starts at a given column.
Instead, the common approach stores multiple characters for each array
position - some storage is wasted, but it's accessed more rapidly.
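A rough sketch of that fixed-size-cell layout follows. The struct, its field names, and the limit of 5 characters per cell are assumptions made for illustration, not the ncurses definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Assumed limit: one base character plus a few combining characters. */
#define CELL_MAX_CHARS 5

/* Hypothetical screen cell: attributes live apart from the text. */
struct cell {
    unsigned long attr;             /* standout, color, and so on */
    wchar_t chars[CELL_MAX_CHARS];  /* unused slots stay L'\0' (wasted space) */
};

/* Return the cell at (row, col) of a row-major screen buffer.
 * Every cell has the same size, so this is plain index arithmetic;
 * no scanning of a variable-width multibyte string is needed. */
struct cell *cell_at(struct cell *screen, size_t cols, size_t row, size_t col)
{
    assert(col < cols);
    return &screen[row * cols + col];
}
```

The trade-off is the one stated above: most cells leave their extra slots empty, but the character that starts at a given column is found in constant time.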

I assume that the actual character then is always converted to a wide character, even if it is just common text not requiring a wide character (because it is easier to deal with uniform wide characters than varying-width multibyte representations with escape sequences to mark character set changes). How many bytes does the current ncurses use to store non-attribute character data? I would guess two 8-bit bytes internally per cell.


>is it the way of the old book description, with 7 bits for character,
>and the last bit for standout mode flagging? If a separate byte is used
>already, then it would seem that multibyte characters already have the
>"infrastructure" to be plugged into ncurses. [FYI, it would be rather
>useful to see an entity substitution ability, like "©" in html]
>
>Pardon my curiosity, lately I've been looking at some non-7-bit ascii
>clients, but the clients support only 8 bit, not multibyte characters. I
>created a lightweight XML style data tree storage mechanism that uses
>XML/html entities to represent characters that cannot be easily entered
>via a keyboard, and it turned out to be far more flexible/useful than I
>thought at first. I remember seeing some of the development ncurses
>branch as partial or initial support for the wide characters, and I

that was up til mid-2001 - I didn't quite know where to begin at rewriting,
but one of the contributors got it moving.  ncurses 5.3 was good enough to
use - the current code probably has isolated bugs, but I don't see any
that are related to wide-characters.  Not all functions are tested - so
I've been reviewing, adding test-programs for places that are noticeably
not covered.

Currently on Linux, I could display a copyright symbol ('c' inside of a circle) by outputting 169 decimal cast as character (8 bits) to the terminal. I'm looking at the man page for echochar, and it appears that ncurses came up with its own version of something similar to html/xml character entities, but the ncurses version is not as complete as html/xml entities. If I were to use a printw function with a %c format, feeding it 169 decimal (or anything from 128 through 255), will ncurses ever represent the output appearance differently than had I fed that decimal number (cast as 8 bit character) directly to a standard linux console or xterm?
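One thing worth keeping in mind when testing this: a single byte of 169 is the copyright sign only in an 8-bit encoding such as ISO-8859-1. In UTF-8 the same character (U+00A9) takes two bytes, so what a raw 169 looks like depends on the terminal's encoding. The sketch below shows the encoding arithmetic only; utf8_encode_small is a hypothetical helper, and it says nothing about what ncurses itself does with the value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode a code point below U+0800 as UTF-8.
 * Writes 1 or 2 bytes to out and returns the byte count. */
size_t utf8_encode_small(unsigned cp, unsigned char out[2])
{
    assert(cp < 0x800u);                 /* this sketch handles 1- and 2-byte forms only */
    if (cp < 0x80u) {                    /* 7-bit ASCII is a single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0u | (cp >> 6));     /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));  /* continuation: 10xxxxxx */
    return 2;
}
```

For code point 169 this yields the two bytes 0xC2 0xA9, whereas plain ASCII such as 'c' stays a single byte.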


D. Stimits, stimits AT attbi DOT com

Current Thread

Wide and UTF-8 international characters, John Smith, 2003/05/09
- Re: Wide and UTF-8 international characters, Thomas Dickey, 2003/05/09
  - Re: Wide and UTF-8 international characters, D. Stimits, 2003/05/16
    - Re: Wide and UTF-8 international characters, Thomas Dickey, 2003/05/16
    - Re: Wide and UTF-8 international characters, D. Stimits <=
    - Re: Wide and UTF-8 international characters, Thomas Dickey, 2003/05/17