Re: Implementing NSString getBytes

From: Richard Frith-Macdonald
Subject: Re: Implementing NSString getBytes
Date: Mon, 10 Jun 2013 09:37:51 +0100

On 10 Jun 2013, at 08:57, Luboš Doležel <address@hidden> wrote:

> On Mon, 10 Jun 2013 07:58:29 +0100, Richard Frith-Macdonald wrote:
>>> 1) it fails if the output buffer is too small - we don't want that in this 
>>> case
>> Well, that depends on what arguments you pass to it ... if you
>> provide it with a zone in which to allocate memory, it will allocate
>> memory to make a bigger output buffer if necessary.
> Not quite. The caller of getBytes supplies their own buffer.
> Should the buffer not be sufficient and GSFromUnicode() allocate its own
> memory, I'd then have trouble finding out how many bytes to copy to the
> caller's buffer *without* splitting a UTF-8 character in the middle (for
> example).

So there is no problem with memory allocation ... if you want GSFromUnicode() 
to allocate more memory, you can tell it to do that, and if you don't want it 
to, you can tell it to do that too.
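
To make the splitting concern quoted above concrete, here is a minimal sketch in plain C of how one could find the longest prefix of a UTF-8 byte run that ends on a character boundary. It is not GNUstep code: `utf8_complete_prefix` is a hypothetical helper, and it assumes the bytes are well-formed UTF-8 that may simply have been cut off at an arbitrary point.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GNUstep).
 * Returns the length of the longest prefix of buf[0..len) that does not
 * end in the middle of a multi-byte UTF-8 sequence.
 * Assumption: buf holds well-formed UTF-8, possibly truncated at the end. */
size_t
utf8_complete_prefix(const unsigned char *buf, size_t len)
{
  size_t start = len;
  size_t need;

  if (len == 0)
    return 0;

  /* Step back over continuation bytes (10xxxxxx), looking at no more
   * than the last four bytes, to find the lead byte of the last sequence. */
  do
    {
      start--;
    }
  while (start > 0 && (len - start) < 4 && (buf[start] & 0xC0) == 0x80);

  /* The lead byte tells us how long the sequence is meant to be. */
  if (buf[start] < 0x80)
    need = 1;
  else if ((buf[start] & 0xE0) == 0xC0)
    need = 2;
  else if ((buf[start] & 0xF0) == 0xE0)
    need = 3;
  else if ((buf[start] & 0xF8) == 0xF0)
    need = 4;
  else
    return len;   /* Not a lead byte: malformed input, leave it untouched. */

  /* Keep the last sequence only if all of its bytes are present. */
  return (len - start >= need) ? len : start;
}
```

For the three bytes `'a', 0xC5, 0xA1` ("a" followed by "š"), a limit of 3 keeps all three bytes, while a limit of 2 keeps only the first byte, because the second byte starts a two-byte sequence that would otherwise be split.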

> The way I understand it, -getBytes should convert as many *complete* 
> characters as possible.

I expect so ... it's the way GSFromUnicode() already operates.

>>> 2) it doesn't tell you how many input characters were converted (which is 
>>> understandable because of 1)
>>> So I decided to use iconv() directly, which has a pitfall. I'd need to 
>>> access static members (EntrySupported) of Unicode.m to know the iconv 
>>> encoding's name etc. So I guess the iconv-related code (as attached) should 
>>> be moved into Unicode.m.
>> The problem with using iconv is that on most (all?) platforms, it
>> doesn't support all the character sets.  That's mostly why the
>> GSFromUnicode() function exists ... to handle the cases that iconv
>> can't handle directly.
> I see. I thought iconv supported every conceivable character set - at least
> on Linux.

No, it doesn't (and even if it did do that on one platform, it would of course 
still be unusable directly because we need to work on a variety of platforms).

>> It seems to me it would be quite simple to modify GSFromUnicode() to
>> do what you want ...  you'd need to change the source length argument
>> to be a pointer, so you could pass back the number of bytes actually
>> converted (which would mean a trivial change everywhere the function
>> is called of course).
> I will (as has been suggested) add a second GS*() function that has this 
> pointer and doesn't fail if the output buffer becomes full.

The GSFromUnicode() function already has an option to allow short conversions 
(i.e. not the full length of the input), stopping after the last full character 
it is able to convert.
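
As an illustration of a short conversion that also reports how much input was consumed, the following is a minimal sketch in plain C. It is not the GSFromUnicode() implementation and does not reproduce its signature: `utf16_to_utf8_partial` and its parameters are hypothetical, and it handles only UTF-16 to UTF-8. The source length is passed by pointer so that the number of code units actually converted can be handed back, along the lines suggested earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the GNUstep API).
 * Converts UTF-16 code units from src into UTF-8 bytes in dst.
 * On entry *srcLen is the number of code units available; on return it
 * holds the number of code units consumed.  The return value is the
 * number of bytes written.  Conversion stops before the first character
 * that would not fit entirely in dst, and before malformed input. */
size_t
utf16_to_utf8_partial(const uint16_t *src, size_t *srcLen,
                      unsigned char *dst, size_t dstSize)
{
  size_t in = 0;
  size_t out = 0;

  while (in < *srcLen)
    {
      uint32_t c = src[in];
      size_t units = 1;
      size_t need;

      if (c >= 0xD800 && c <= 0xDBFF)
        {
          uint32_t lo;

          /* High surrogate: the matching low surrogate must follow. */
          if (in + 1 >= *srcLen)
            break;
          lo = src[in + 1];
          if (lo < 0xDC00 || lo > 0xDFFF)
            break;
          c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
          units = 2;
        }
      else if (c >= 0xDC00 && c <= 0xDFFF)
        {
          break;    /* Stray low surrogate. */
        }

      need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : (c < 0x10000) ? 3 : 4;
      if (out + need > dstSize)
        break;      /* Never emit a partial character. */

      if (need == 1)
        {
          dst[out++] = (unsigned char)c;
        }
      else if (need == 2)
        {
          dst[out++] = (unsigned char)(0xC0 | (c >> 6));
          dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
      else if (need == 3)
        {
          dst[out++] = (unsigned char)(0xE0 | (c >> 12));
          dst[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
          dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
      else
        {
          dst[out++] = (unsigned char)(0xF0 | (c >> 18));
          dst[out++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
          dst[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
          dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
      in += units;
    }

  *srcLen = in;
  return out;
}
```

Converting the five code units of "Luboš" into a 5-byte buffer with this sketch writes 4 bytes and sets `*srcLen` to 4, since the two-byte encoding of "š" does not fit. With a 6-byte buffer all five code units are consumed and 6 bytes are written.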

> Then I'll rewrite GSFromUnicode() to call this function and fail if output 
> buffer is full and cannot be grown.

Given that GSFromUnicode() already does everything you want/need, apart from 
returning the number of characters converted, it would seem obvious that the 
thing to do is simply tweak it to return that value (which of course it knows 
