bug-gnu-libiconv

Re: [bug-gnu-libiconv] GSM-7 support


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] GSM-7 support
Date: Mon, 10 Aug 2009 09:05:13 +0200
User-agent: KMail/1.9.9

Hi,

Jeroen de Borst wrote:
> GSM-7 is a character set used by SMS messages. It is defined in 
> http://www.3gpp.org/ftp/Specs/html-info/23038.htm.
> 
> I have added support for it in libiconv, though I haven't spent much time 
> looking at the coding standards.
> 
> I described what I did in my blog: 
> http://mobiletidings.com/2009/07/06/gsm-7-encoding-gnu-libiconv/.
> 
> Could / should this be added to libiconv?

You are of course welcome to use libiconv's infrastructure for this.
But I think that iconv() is not well suited for this purpose, based on the
standards that you have been referring to (thanks for the URL, by the way!).

- According to [1] and [2], there is not one encoding to handle, but up to 513
  of them! Namely, if no "national language single shift" identifier element
  and no "national language locking shift" identifier element is present,
  then only the default GSM 7-bit encoding is used. But with one of these
  information elements present in the TP User Data Header - I understand this
  is not part of the message - different variants are used.

- There is also a mode in which the encoding is directly UCS-2, and this mode
  is also indicated through some bits in the header. For convenience, this
  mode should be handled through the same API. But iconv() cannot be used for
  this, because you cannot pass the header information to iconv_open().

- iconv() is designed for conversion of streams of data, that is, of more
  data than fits into a buffer. Here we are talking about messages that
  are held entirely in memory before being processed.
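
To make the iconv_open() limitation concrete, here is a minimal sketch of an ordinary iconv() conversion. The function name latin1_to_utf8_sketch and the choice of ISO-8859-1 as the source encoding are illustrative assumptions only. The point is that the sole configuration iconv_open() accepts is the pair of encoding names, so there is no argument through which per-message header data could be passed.

```c
#include <iconv.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): converts in[0..in_len-1] from
   ISO-8859-1 to UTF-8, writing at most out_cap bytes to out.
   Returns the number of bytes written, or (size_t) -1 on failure. */
size_t
latin1_to_utf8_sketch (char *in, size_t in_len, char *out, size_t out_cap)
{
  /* The only inputs to iconv_open() are the two encoding names. */
  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");
  char *inp = in;
  char *outp = out;
  size_t in_left = in_len;
  size_t out_left = out_cap;
  size_t r;

  if (cd == (iconv_t) -1)
    return (size_t) -1;

  /* iconv() advances the pointers and decrements the remaining counts. */
  r = iconv (cd, &inp, &in_left, &outp, &out_left);
  iconv_close (cd);

  if (r == (size_t) -1)
    return (size_t) -1;
  return out_cap - out_left;
}
```

On systems where iconv is not part of the C library, linking with -liconv may be needed.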

In summary, I think a better API for this task is more like this:

  /* Converts input[0..input_size-1] to unicode_output[0..n-1] and
     stores n in *output_size_p. */
  gsm_decode (unsigned int *unicode_output, size_t *output_size_p,
              const unsigned char *input, size_t input_size,
              const struct ... *header);
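
One possible way to flesh out this prototype is sketched below. It is not taken from the standards or from libiconv. The header type is left unspecified above, so struct gsm_header_sketch and its fields is_ucs2 and table are hypothetical stand-ins: the caller is assumed to have already chosen a 128-entry table for the 7-bit variant named in the header. The sketch further assumes one unpacked septet per input byte in 7-bit mode and big-endian UCS-2 otherwise, and it ignores escape and single shift sequences. An int return value is added for error reporting.

```c
#include <stddef.h>

/* Hypothetical stand-in for the unspecified header type. */
struct gsm_header_sketch
{
  int is_ucs2;                /* nonzero: payload is big-endian UCS-2 */
  const unsigned int *table;  /* 128-entry septet-to-Unicode table for the
                                 selected 7-bit variant; unused for UCS-2 */
};

/* Converts input[0..input_size-1] to unicode_output[0..n-1] and
   stores n in *output_size_p.  The caller must provide room for
   input_size output elements.
   Returns 0 on success, -1 on malformed input. */
int
gsm_decode_sketch (unsigned int *unicode_output, size_t *output_size_p,
                   const unsigned char *input, size_t input_size,
                   const struct gsm_header_sketch *header)
{
  size_t n = 0;
  size_t i;

  if (header->is_ucs2)
    {
      /* Each character occupies exactly two bytes. */
      if (input_size % 2 != 0)
        return -1;
      for (i = 0; i < input_size; i += 2)
        unicode_output[n++] = ((unsigned int) input[i] << 8) | input[i + 1];
    }
  else
    {
      /* Assumption: one unpacked septet per byte, no escape handling. */
      for (i = 0; i < input_size; i++)
        {
          if (input[i] > 0x7F)
            return -1;
          unicode_output[n++] = header->table[input[i]];
        }
    }

  *output_size_p = n;
  return 0;
}
```

Because the whole message and its header are passed in one call, the variant selection happens per message, which is what iconv_open() cannot express.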

Bruno

[1] http://www.3gpp.org/ftp/Specs/archive/23_series/23.038/23038-820.zip
[2] http://www.3gpp.org/ftp/Specs/archive/23_series/23.040/23040-900.zip



