Re: [Tinycc-devel] Unicode letter escape

From: Herman ten Brugge
Subject: Re: [Tinycc-devel] Unicode letter escape
Date: Sat, 13 Aug 2022 11:34:43 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.12.0

I made a change, so L".." strings should now work on mob.

Herman

On 8/5/22 13:32, Samir Ribić wrote:

Tcc supports the \u escape sequence inside L"" strings, but I have no idea how to work around this problem:

The code inside the parse_escape_string function, in this part

    case 'x':
    case 'u':
    case 'U':
        p++;
        n = 0;
        for(;;) {
            c = *p;
            if (c >= 'a' && c <= 'f')
                c = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F')
                c = c - 'A' + 10;
            else if (isnum(c))
                c = c - '0';
            else
                break;
            n = n * 16 + c;
            p++;
        }

does not limit the number of hexadecimal digits read after the \u escape code. Why is this a problem? If the escape is followed by one of the letters a, b, c, d, e or f (or by a decimal digit), that character is consumed as part of the code point. For example, L"Mogu\u0107i" displays the word "Mogući" as it should, because U+0107 is c with acute. However, L"Mogu\u0107e" does not display "Moguće" but "Moguၾ", because U+107E is the Myanmar letter Shan Fa.
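Until the parser is fixed, one possible workaround in user code is to split the literal right after the escape. This is only a sketch: it relies on the standard rule that adjacent string literals are concatenated after each literal has been tokenized on its own, and it assumes the compiler applies that rule to wide literals. The names moguce and moguce_length are made up for illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* The escape ends at the closing quote of the first literal, so the
   following 'e' cannot be absorbed into \u0107. The two literals are
   concatenated into a single wide string afterwards. */
static const wchar_t moguce[] = L"Mogu\u0107" L"e";

/* Hypothetical helper: number of wide characters before the terminator. */
size_t moguce_length(void)
{
    return wcslen(moguce);
}
```

With a conforming compiler the array holds six characters, with U+0107 at index 4 and a plain 'e' at index 5.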

Section 6.4.3 of the C99 standard, ISO/IEC 9899:1999(E) -- Programming Languages -- C (uchile.cl), states that the \unnnn escape sequence requires exactly four hexadecimal digits, so the code above needs to be changed.
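A minimal sketch of a bounded digit loop is shown here. It is not the change that went into TinyCC; hex_value and parse_hex_escape are hypothetical names, and the sketch assumes \u takes exactly four digits, \U exactly eight, and \x any number of digits (at least one).

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch, not TinyCC code. p points at the first digit
   after the escape introducer; kind is 'x', 'u' or 'U'.
   Returns a pointer just past the last digit consumed and stores the
   value in *out, or returns NULL if the digit count is wrong. */
const char *parse_hex_escape(const char *p, int kind, unsigned long *out)
{
    /* -1 means "no upper limit", which is the \x case. */
    int limit = (kind == 'u') ? 4 : (kind == 'U') ? 8 : -1;
    int count = 0;
    unsigned long n = 0;

    while (limit < 0 || count < limit) {
        int d = hex_value((unsigned char)*p);
        if (d < 0)
            break;
        n = n * 16 + (unsigned long)d;
        p++;
        count++;
    }

    /* \x needs at least one digit; \u and \U need exactly 4 or 8. */
    if (count == 0 || (limit > 0 && count != limit))
        return NULL;

    *out = n;
    return p;
}
```

Called on the text "0107e" with kind 'u', this stops after four digits, stores 0x0107, and leaves the 'e' to be read as an ordinary character. With kind 'x' the same input is consumed in full, which matches the unbounded behaviour of \x.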


[Tinycc-devel] Unicode letter escape, Samir Ribić, 2022/08/05

Re: [Tinycc-devel] Unicode letter escape, Vincent Lefevre, 2022/08/05
Re: [Tinycc-devel] Unicode letter escape, Vincent Lefevre, 2022/08/05
Re: [Tinycc-devel] Unicode letter escape, Herman ten Brugge <=