iconv bug?
From: Travis Shirk
Subject: iconv bug?
Date: Fri, 3 Jan 2003 16:08:47 -0700 (MST)
Hello,
I'm seeing some odd behavior from iconv when converting from UTF-8 to
UCS-4. I wrote a simple test program that demonstrates the bug (included
below), but to give the idea first, here is its output:
Output from UTF-8 to UCS-4BE conversion:
wchar #0: a9000000
wchar #1: 60220000
Output from UTF-8 to UCS-4LE conversion:
wchar #0: a9
wchar #1: 2260
All the program does is create a UTF-8 byte array and convert it, but as
the output above shows, the results are the exact opposite of what I
expect. In the first output I get little endian when I ask for big, and
in the second I get big endian when I ask for little.
I ran the same program on Solaris 8, built with Sun Workshop, and got the
expected results. That is, I got:
Output from UTF-8 to UCS-4BE conversion:
wchar #0: a9
wchar #1: 2260
Output from UTF-8 to UCS-4LE conversion:
wchar #0: a9000000
wchar #1: 60220000
Here is my program:
#include <stdio.h>
#include <string.h>   /* memset, strlen */
#include <wchar.h>
#include <iconv.h>

int iconv_test(const char* fromCode, const char* toCode)
{
    char utf8Str[64];
    char* utf8Ptr = utf8Str;
    wchar_t wideStr[64];
    char* wideStrPtr = (char*)wideStr;
    size_t i, inBytes, outBytes, numWideChars;
    iconv_t conv;

    /* utf8Str[0:2] == Unicode copyright character.
       utf8Str[2:5] == Unicode not-equal character. */
    utf8Str[0] = 0xC2;
    utf8Str[1] = 0xA9;
    utf8Str[2] = 0xE2;
    utf8Str[3] = 0x89;
    utf8Str[4] = 0xA0;
    utf8Str[5] = 0x00;

    conv = iconv_open(toCode, fromCode);
    if (conv == (iconv_t)-1)
    {
        perror("iconv_open");
        return 1;
    }

    memset(wideStr, 0, sizeof(wideStr));
    inBytes = strlen(utf8Str);
    outBytes = sizeof(wchar_t) * inBytes;
    if (iconv(conv, &utf8Ptr, &inBytes, &wideStrPtr, &outBytes) ==
        (size_t)-1)
    {
        perror("iconv");
        iconv_close(conv);
        return 2;
    }

    /* Number of wchar_t-sized units written to the output buffer. */
    numWideChars = (wideStrPtr - (char*)wideStr) / sizeof(wchar_t);
    printf("Output from %s to %s conversion:\n", fromCode, toCode);
    for (i = 0; i < numWideChars; i++)
    {
        /* Casts keep the format specifiers matched to the argument types. */
        printf("wchar #%lu: %lx\n", (unsigned long)i,
               (unsigned long)wideStr[i] & 0xFFFFFFFFUL);
    }

    iconv_close(conv);
    return 0;
}

int main(int argc, char* argv[])
{
    iconv_test("UTF-8", "UCS-4BE");
    iconv_test("UTF-8", "UCS-4LE");
    return 0;
}
--
| Travis Shirk travis at pobox dot com |
| http://www.travisshirk.net |