UTF-8 multi-byte characters are not displayed properly on Windows consoles
From: LIU Hao
Subject: UTF-8 multi-byte characters are not displayed properly on Windows consoles
Date: Thu, 12 Jan 2023 21:02:14 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2
Hello Thomas E. Dickey,
Excuse me for the disruption. Thank you for your great work on ncurses. I'm writing to you directly
because my message neither arrived at the GNU mailing list nor bounced. Maybe the list is subscribers-only?
There seems to be an issue with UTF-8 strings in UTF-8 consoles on Windows 10. My original message
follows. Hope it helps.
Have a nice day!
----- original message -----
Hello folks,
I'm a mingw-w64 developer and an MSYS2 contributor, and I maintain a GNU nano port to Windows [1]. First
of all, thank you for the great work!
Since Windows 10, the Windows console has supported UTF-8, although this has to be enabled
explicitly in the system control panel. After UTF-8 support has been enabled and the UTF-8 code page has
been set up with the `chcp 65001` command, all standard C ctype functions can work on UTF-8 strings.
However, when GNU nano attempts to display a UTF-8 string, it is taken bytewise and becomes
gibberish. I have created this testcase, for example:
```
#include <ncursesw/ncurses.h>

int
main(void)
  {
    initscr();
    addstr("»·");  // hex: C2 BB C2 B7
    refresh();
    getch();
  }
```
The commented string literal contains two characters as four bytes. On Linux it is displayed
properly, but on a Windows UTF-8 console I get `»·`. How should I fix it?
[1] https://github.com/lhmouse/nano-win
--
Best regards,
LIU Hao