bug-glibc

iconv bug?


From: Travis Shirk
Subject: iconv bug?
Date: Fri, 3 Jan 2003 16:08:47 -0700 (MST)

Hello,

I'm seeing some odd behavior with iconv when converting from UTF-8 to
UCS-4.  I wrote a simple test program that demonstrates the problem (it
is included further down); to give the idea first, here is its output:

Output from UTF-8 to UCS-4BE conversion:
wchar #0: a9000000
wchar #1: 60220000
Output from UTF-8 to UCS-4LE conversion:
wchar #0: a9
wchar #1: 2260

All the program does is create a UTF-8 byte array and convert it, but as
the output above shows, the results are the exact opposite of what I
expect.  In the first case I get little-endian when I ask for big-endian,
and in the second case vice versa.

I ran the same program on Solaris 8, built with Sun Workshop, and got the
expected results.  That is, I get:

Output from UTF-8 to UCS-4BE conversion:
wchar #0: a9
wchar #1: 2260
Output from UTF-8 to UCS-4LE conversion:
wchar #0: a9000000
wchar #1: 60220000

Here is my program:

#include <stdio.h>
#include <string.h>   /* memset, strlen */
#include <wchar.h>
#include <iconv.h>

int iconv_test(const char* fromCode, const char* toCode)
{
    char     utf8Str[64];
    char*    utf8Ptr = utf8Str;
    wchar_t  wideStr[64];
    char*    wideStrPtr = (char*)wideStr;
    size_t   i, inBytes, outBytes, numWideChars;
    iconv_t  conv;

    /* utf8Str[0:2] == Unicode copyright character.
       utf8Str[2:5] == Unicode not-equal character. */
    utf8Str[0] = 0xC2;
    utf8Str[1] = 0xA9;
    utf8Str[2] = 0xE2;
    utf8Str[3] = 0x89;
    utf8Str[4] = 0xA0;
    utf8Str[5] = 0x00;

    conv = iconv_open(toCode, fromCode);
    if (conv == (iconv_t)-1)
    {
        perror("iconv_open");
        return 1;
    }

    memset(wideStr, 0, sizeof(wideStr));

    inBytes = strlen(utf8Str);
    outBytes = sizeof(wchar_t) * inBytes;

    if (iconv(conv, &utf8Ptr, &inBytes, &wideStrPtr, &outBytes) == (size_t)-1)
    {
        perror("iconv");
        return 2;
    }

    numWideChars = (wideStrPtr - (char*)wideStr) / sizeof(wchar_t);

    printf("Output from %s to %s conversion:\n", fromCode, toCode);
    for (i = 0; i < numWideChars; i++)
    {
        /* Cast so the arguments match the format specifiers. */
        printf("wchar #%lu: %x\n", (unsigned long)i, (unsigned int)wideStr[i]);
    }

    iconv_close(conv);
    return 0;
}

int main(int argc, char* argv[])
{
    iconv_test("UTF-8", "UCS-4BE");
    iconv_test("UTF-8", "UCS-4LE");
    return 0;
}

-- 
| Travis Shirk                                  travis at pobox dot com |
|                                            http://www.travisshirk.net |



