tinycc-devel

[Tinycc-devel] BUG: wide char in wide string literal handled incorrectly


From: 张博洋
Subject: [Tinycc-devel] BUG: wide char in wide string literal handled incorrectly
Date: Wed, 30 Aug 2017 15:30:55 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1

Hello,

I found that when TCC processes a wide string literal, it behaves as if it casts each char of the source file directly to wchar_t and stores the result in the wide string. This works for ASCII characters, but not for characters outside the ASCII range. For example, the euro sign (€, U+20AC) is encoded in UTF-8 as the three bytes "E2 82 AC". GCC stores this character in a wide string as the single wide char "000020AC". TCC stores it as three wide chars: "000000E2 00000082 000000AC".

I have attached a patch, a test program, and two screenshots that illustrate the problem. The patch solves it by assuming that the input charset is UTF-8. This is not a perfect solution, but it is still better than directly casting char to wchar_t. I am not sure whether this approach is appropriate, so please review the code carefully.
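To make the expected behaviour concrete, here is a minimal sketch of UTF-8 decoding in C. It is not the code from the attached utf8.patch and not TCC's internals. The names utf8_decode_one and utf8_to_codepoints are hypothetical, and falling back to the raw byte value on malformed input is my own assumption, chosen only so that non-UTF-8 sources keep the old byte-per-wide-char behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with at
 * most len bytes available. On success, store the code point in *out and
 * return the number of bytes consumed (1 to 4). Return 0 if the input is
 * malformed or truncated. */
static size_t utf8_decode_one(const unsigned char *s, size_t len, uint32_t *out)
{
    /* Smallest code point that legitimately needs n bytes (index = n). */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t cp;

    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;

    if (s[0] < 0x80)                { n = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (n > len)
        return 0;                   /* sequence is cut off */

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF. */
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return n;
}

/* Hypothetical helper: convert len bytes of src into code points, writing
 * at most cap entries to dst. Malformed bytes are copied through one by one
 * (an assumption of this sketch). Returns the number of entries written. */
static size_t utf8_to_codepoints(const unsigned char *src, size_t len,
                                 uint32_t *dst, size_t cap)
{
    size_t in = 0, out = 0;

    while (in < len && out < cap) {
        uint32_t cp;
        size_t n = utf8_decode_one(src + in, len - in, &cp);
        if (n == 0) {               /* fall back to the raw byte */
            cp = src[in];
            n = 1;
        }
        dst[out++] = cp;
        in += n;
    }
    return out;
}
```

With this sketch, the bytes E2 82 AC decode to the single value 0x20AC, which matches what GCC stores, instead of the three values E2, 82, AC.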

Thanks
Zhang Boyang

Attachment: after-patch.png
Description: PNG image

Attachment: before-patch.png
Description: PNG image

Attachment: test.c
Description: Text Data

Attachment: utf8.patch
Description: Text Data

