gm2

Re: Inquiring about the status of the proposed UNICODE library


From: Alice Osako
Subject: Re: Inquiring about the status of the proposed UNICODE library
Date: Thu, 7 Mar 2024 04:04:14 -0500
User-agent: Mozilla Thunderbird

Benjamin Kowarsch wrote:
> I cannot tell you anything about libraries, but I can put the question of Unicode support in ISO Modula-2 into perspective:
> <interesting historical overview snipped>
> In the hope this puts things into perspective

Yes, it does. It still leaves me in a bit of a quandary, however, since there is currently no UNICHAR type or API.

I've gone over the Lilley code from 2010, and aside from being decidedly incomplete, it has what, for my purposes, is a serious design problem: it assumes UTF-16 as the base encoding (defining its 'UChar' type as a type derived from SHORTCARD). This would have been an issue even at the time of its design, as outside of the MS Windows system internals, UTF-8 had already been more or less the default encoding for most systems and languages (among those which supported UNICODE at all) for several years.
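To illustrate one general consequence of a 16-bit base type (this is not taken from the Lilley code, and it assumes SHORTCARD is 16 bits wide, as the UTF-16 choice implies): code points above U+FFFF do not fit in a single unit and have to be split into a surrogate pair. A minimal sketch in C rather than Modula-2, with the hypothetical name `utf16_encode`:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Encodes one code point as UTF-16 into out[].
   Returns the number of 16-bit units written (1 or 2),
   or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                      /* surrogates are not valid scalar values */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;         /* fits in a single 16-bit unit */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;                 /* 20 bits remain */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
        return 2;
    }
    return 0;
}
```

So even with a UTF-16 base type, one 'UChar' is not always one character, and any string routine built on it still has to deal with a variable-length encoding.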

Even if this were not the case, supporting only a single UNICODE encoding is potentially problematic, however common that encoding may be.

Indeed, the fact that JSON is defined with UTF-8 as the sole supported encoding is what brought me to this point in the first place. But that's another issue altogether.

So in the absence of any other options, it looks to me as if I will have to bite the bullet and either write a non-conforming ASCII-only implementation of JSON (which, to be fair, seems to be a fairly common flaw among JSON parsers), or else dive head-first into designing a user-level UNICODE library written in Modula-2 (most likely one which is specific to GM2, though I would rather it be as portable as feasible).

Just contemplating how I would implement UTF-8's variable-size encoding is intimidating enough, never mind grasping the intricacies of UNICODE code points across multiple encodings.
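For reference, the variable-size part of UTF-8 is fairly mechanical once the bit layout is written down: the lead byte gives the sequence length (1 to 4 bytes), and each continuation byte carries six payload bits. The following is only a sketch, in C rather than Modula-2, and the names `utf8_encode` and `utf8_decode` are hypothetical, not part of any existing GM2 library:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out[].
   Returns the number of bytes written (1..4), or 0 if cp is a
   surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                              /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                             /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                           /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                         /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: decode one code point from s (len bytes available).
   Returns the number of bytes consumed, or 0 if the input is truncated,
   malformed, overlong, a surrogate, or above U+10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c, min;
    size_t n, i;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                     /* stray continuation or invalid lead byte */

    if (len < n) return 0;             /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return n;
}
```

A JSON reader would call something like `utf8_decode` in a loop, advancing by the returned byte count and treating a return of 0 as a syntax error. The same shift-and-mask logic should translate to Modula-2 using CARDINAL arithmetic or BITSET operations, though I have not tried that under GM2.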

Can anyone else add anything to this? I would welcome an alternative if one is available.
