Re: [Tinycc-devel] Unicode letter escape
From: Vincent Lefevre
Subject: Re: [Tinycc-devel] Unicode letter escape
Date: Fri, 5 Aug 2022 14:03:53 +0200
User-agent: Mutt/2.2.6+34 (76e93dd3) vl-149028 (2022-07-31)
On 2022-08-05 13:32:04 +0200, Samir Ribić via Tinycc-devel wrote:
> Tcc supports the \u escape sequence inside L"", but I have no idea how to
> overcome this problem:
> The code inside the parse_escape_string function, in this part
>
>     case 'x':
>     case 'u':
>     case 'U':
>         p++;
>         n = 0;
>         for(;;) {
>             c = *p;
>             if (c >= 'a' && c <= 'f')
>                 c = c - 'a' + 10;
>             else if (c >= 'A' && c <= 'F')
>                 c = c - 'A' + 10;
>             else if (isnum(c))
>                 c = c - '0';
>             else
>                 break;
>             n = n * 16 + c;
>             p++;
>         }
>
> does not limit the number of hexadecimal digits read after the \u
> escape code. Why is this a problem? If a Unicode letter written this way
> is followed by one of the letters a, b, c, d, e or f (or by a decimal
> digit), that character is parsed as part of the code itself.
> For example, L"Mogu\u0107i" will display the word "Mogući" as it should,
> because the code 0107 is c with acute. However, L"Mogu\u0107e" will
> not display "Moguće" but "Moguၾ", because 107e is the Myanmar letter Shan Fa.
>
> Section 6.4.3 of the C99 standard, ISO/IEC 9899:1999(E) -- Programming
> Languages -- C
> <https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf>,
> states that the \unnnn escape sequence requires exactly four hexadecimal
> digits, so the code above needs to be changed.
And exactly 8 hexadecimal digits for \U.
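
To make the digit counts concrete, here is a minimal sketch of a bounded
parser: exactly 4 hex digits after \u, exactly 8 after \U, and an
unbounded run (at least one digit) after \x, as C allows for hexadecimal
escapes. The names hex_digit_value and parse_hex_escape are hypothetical
helpers made up for illustration; this is not the tcc code and not a
tested patch against parse_escape_string, and it does no range checking
on the resulting value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of tcc): value of a hex digit, or -1. */
static int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper (not part of tcc). p points at the escape letter
 * ('x', 'u' or 'U'), i.e. just after the backslash. On success, stores
 * the parsed value in *out and returns a pointer to the first character
 * after the escape. Returns NULL if the escape is malformed.
 */
const char *parse_hex_escape(const char *p, unsigned long *out)
{
    int max_digits, count = 0, d;
    unsigned long n = 0;

    assert(p != NULL && out != NULL);

    switch (*p) {
    case 'u': max_digits = 4;  break;  /* \unnnn: exactly 4 digits */
    case 'U': max_digits = 8;  break;  /* \Unnnnnnnn: exactly 8 digits */
    case 'x': max_digits = -1; break;  /* \x: no upper limit on digits */
    default:  return NULL;
    }
    p++;

    /* Stop at the digit limit even if more hex digits follow. */
    while ((max_digits < 0 || count < max_digits) &&
           (d = hex_digit_value((unsigned char)*p)) >= 0) {
        n = n * 16 + (unsigned long)d;
        p++;
        count++;
    }

    /* \x needs at least one digit; \u and \U need the exact count. */
    if (max_digits < 0 ? count == 0 : count != max_digits)
        return NULL;

    *out = n;
    return p;
}
```

With this, the text "u0107e" yields the value 0x0107 and leaves the
pointer on the trailing 'e', so L"Mogu\u0107e" would keep its final
letter. A short form such as "u12" is rejected instead of being
accepted silently.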
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)