Re: [Tinycc-devel] Unicode letter escape


From:	Vincent Lefevre
Subject:	Re: [Tinycc-devel] Unicode letter escape
Date:	Fri, 5 Aug 2022 14:03:53 +0200
User-agent:	Mutt/2.2.6+34 (76e93dd3) vl-149028 (2022-07-31)

On 2022-08-05 13:32:04 +0200, Samir Ribić via Tinycc-devel wrote:
> Tcc supports the \u escape sequence inside L"" strings, but I have no idea
> how to overcome this problem:
> The code inside the parse_escape_string function, in this part
> 
>             case 'x':
>             case 'u':
>             case 'U':
>                 p++;
>                 n = 0;
>                 for(;;) {
>                     c = *p;
>                     if (c >= 'a' && c <= 'f')
>                         c = c - 'a' + 10;
>                     else if (c >= 'A' && c <= 'F')
>                         c = c - 'A' + 10;
>                     else if (isnum(c))
>                         c = c - '0';
>                     else
>                         break;
>                     n = n * 16 + c;
>                     p++;
>                 }
> 
> does not limit the number of hexadecimal digits read after the \u
> escape code. Why is this a problem? If the Unicode letter is followed by
> one of the letters a, b, c, d, e or f, that letter is parsed as part of
> the code. For example, L"Mogu\u0107i" will display the word "Mogući", as
> it should, because code 0107 is c with acute. However, L"Mogu\u0107e"
> will not display "Moguće" but "Moguၾ", because 107e is Myanmar letter
> Shan Fa.
> 
> Section 6.4.3 of the C99 standard (ISO/IEC 9899:1999(E), Programming
> Languages -- C)
> <https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf> states
> that the \unnnn escape sequence requires exactly four hexadecimal digits, so
> the code above needs to be changed.

And exactly 8 hexadecimal digits for \U.
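
To make the digit limits concrete, here is a minimal standalone sketch of
a bounded parser. It is not a patch against tcc: the names hex_digit_value
and parse_hex_escape are hypothetical, and it assumes that \x should keep
its unbounded loop while \u takes exactly 4 digits and \U exactly 8.

```c
#include <stddef.h>

/* Hypothetical helper (not tcc code): value of a hex digit, or -1. */
static int hex_digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical sketch: parse the digits that follow \x, \u or \U.
 * 'kind' is the escape letter and 'p' points just past it.
 * Returns a pointer past the last digit consumed and stores the value
 * in *out, or returns NULL if no digit was found or if \u / \U did not
 * get exactly 4 / 8 digits.
 */
static const char *parse_hex_escape(int kind, const char *p,
                                    unsigned long *out)
{
    /* Assumption: 0 means "no limit", which is used for \x. */
    int want = (kind == 'u') ? 4 : (kind == 'U') ? 8 : 0;
    int count = 0;
    unsigned long n = 0;
    int d;

    while ((want == 0 || count < want)
           && (d = hex_digit_value((unsigned char)*p)) >= 0) {
        n = n * 16 + (unsigned long)d;
        p++;
        count++;
    }
    if (count == 0 || (want != 0 && count != want))
        return NULL;
    *out = n;
    return p;
}
```

With this sketch, parse_hex_escape('u', "0107e", &n) stops after four
digits, so n is 0x0107 and the returned pointer is left at the "e",
which then remains an ordinary character of the string.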

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

