Nonstandard implementation problems in iconvdata
From: Ben Hochstedler
Subject: Nonstandard implementation problems in iconvdata
Date: Mon, 20 Nov 2000 08:35:08 -0600
The implementation of iconv has two major differences from what seems
to be the standard:
1. When converting from encoding A to encoding B, where there are
characters in A not in B, it stops at the first such nonconvertible
character and returns the same error it uses for input that is
not valid in A.
Here is the correct behavior defined by the iconv.3 manpage on
Solaris:
If iconv() encounters a character in the input buffer that
is legal, but for which an identical character does not
exist in the target code set, iconv() performs an
implementation-defined conversion on this character.
2. The return value from a successful iconv() is always 0. Once
problem #1 is fixed, iconv() should return the number of
characters that were in A but not in B.
Here's what the Solaris manpage says:
RETURN VALUES
The iconv() function ...
returns the number of non-identical conversions performed.
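To make the return-value contract concrete, here is a minimal sketch of how a
caller could interpret iconv()'s result under the Solaris wording quoted above.
The enum and the helper name are hypothetical and not part of any iconv API;
the only facts assumed are that iconv() returns (size_t) -1 on failure and
otherwise a count of non-identical conversions.

```c
#include <stddef.h>

/* Hypothetical classification of an iconv() return value. */
enum conv_status { CONV_FAILED, CONV_EXACT, CONV_LOSSY };

/* rc is the value returned by iconv(). */
static enum conv_status classify_iconv_result(size_t rc)
{
    if (rc == (size_t) -1)
        return CONV_FAILED;   /* errno says why, e.g. EILSEQ, EINVAL, E2BIG */
    /* Otherwise rc counts the non-identical (stubbed) conversions. */
    return rc == 0 ? CONV_EXACT : CONV_LOSSY;
}
```

With the current behavior a caller only ever sees CONV_FAILED or CONV_EXACT;
the lossy case is the one the Solaris manpage describes.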
The fix is to update every "to" loop body in iconvdata so that,
instead of setting the result to GCONV_ILLEGAL_INPUT and breaking
out, it sets the character in encoding B to some stub character (we
get to pick because it's implementation-defined) and then
increments the converted variable. The following are the changes to
make to iso8859-1.c. I could make the changes to all of the modules
and give you a patch.
--- iso8859-1.c@@/main/v2b/0 Tue Nov 14 14:19:59 2000
+++ iso8859-1.c Thu Nov 16 14:15:11 2000
@@ -47,9 +49,9 @@
uint32_t ch = *((uint32_t *) inptr); \
if (ch > 0xff) \
{ \
- /* We have an illegal character. */ \
- result = GCONV_ILLEGAL_INPUT; \
- break; \
+ /* We have an unsupported character. */ \
+ ch = GCONV_STUB_CHAR; \
+ *converted += 1; \
} \
*outptr++ = (unsigned char) ch; \
inptr += 4; \
GCONV_ILLEGAL_INPUT should only be used in "from" loops, not "to"
loops: in a "from" loop, the character is illegal in the input
encoding, while in a "to" loop, it was legal in the input
encoding and is simply not convertible to the output encoding.
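The distinction can be sketched outside the iconvdata macro framework as two
standalone functions. This is only an illustration, not the actual module
code: the function names, the enum, and the choice of '?' as the stub are
assumptions (the patch above does not say what GCONV_STUB_CHAR is defined to),
and 7-bit ASCII stands in for an input encoding that can contain illegal bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define STUB_CHAR '?'   /* assumed stub; implementation-defined */

enum conv_result { CONV_OK, CONV_ILLEGAL_INPUT };

/* "to" loop: UCS4 -> ISO-8859-1.  Every input value is legal UCS4 here,
   so an unrepresentable character is stubbed and counted, not rejected. */
static enum conv_result ucs4_to_latin1(const uint32_t *in, size_t n,
                                       unsigned char *out, size_t *converted)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t ch = in[i];
        if (ch > 0xff) {
            ch = STUB_CHAR;      /* non-identical conversion */
            *converted += 1;
        }
        out[i] = (unsigned char) ch;
    }
    return CONV_OK;
}

/* "from" loop: 7-bit ASCII -> UCS4.  A byte above 0x7f is not valid in
   the input encoding, so this is where an illegal-input error belongs. */
static enum conv_result ascii_to_ucs4(const unsigned char *in, size_t n,
                                      uint32_t *out, size_t *consumed)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i] > 0x7f) {
            *consumed = i;       /* stop at the offending byte */
            return CONV_ILLEGAL_INPUT;
        }
        out[i] = in[i];
    }
    *consumed = n;
    return CONV_OK;
}
```

The caller is expected to zero *converted before the first call, so that it
accumulates the count iconv() would eventually return.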
The effect this change would have on the iconv program, for
instance, is that when converting a document that has some
characters in it that aren't in the resulting character set, it
will actually convert it with those nonconvertible characters
stubbed. Right now, iconv aborts with an error, which is not
standard and really annoying.
-Ben
--
Ben Hochstedler GE Medical Systems Information Technologies
address@hidden http://www.ge.com/medical/marquette/
Phone: 414-362-3317 Fax: 414-362-3389 Dial-comm: 401-3317