bug-glibc

Nonstandard implementation problems in iconvdata


From: Ben Hochstedler
Subject: Nonstandard implementation problems in iconvdata
Date: Mon, 20 Nov 2000 08:35:08 -0600

The implementation of iconv has two major differences from what seems
to be the standard:

1. When converting from encoding A to encoding B, where there are
   characters in A that are not in B, it stops as soon as it encounters
   one of those nonconvertible characters and returns the same error
   it uses for input characters that aren't valid in A.

   Here is the correct behavior defined by the iconv.3 manpage on
   Solaris:

     If iconv() encounters a character in the input  buffer  that
     is  legal,  but  for  which  an identical character does not
     exist  in  the  target  code  set,   iconv()   performs   an
     implementation-defined conversion on this character.

2. The return value from a successful iconv() is always 0.  Once
   problem #1 is fixed, iconv() should return the number of
   characters that were in A but not in B.

   Here's what the Solaris manpage says:

RETURN VALUES
     The iconv() function ...
     returns the number of non-identical  conversions  performed.


The fix is to update the body of every "to" loop in iconvdata: instead
of setting the result to GCONV_ILLEGAL_INPUT and breaking out, set the
output character in encoding B to some stub character (we get to pick,
because the conversion is implementation-defined) and increment the
converted variable.  The following are the changes to make to
iso8859-1.c.  I could make the same changes to all of the modules and
give you a patch.

--- iso8859-1.c@@/main/v2b/0    Tue Nov 14 14:19:59 2000
+++ iso8859-1.c Thu Nov 16 14:15:11 2000
@@ -47,9 +49,9 @@
     uint32_t ch = *((uint32_t *) inptr);                                     \
     if (ch > 0xff)                                                           \
       {                                                                      \
-       /* We have an illegal character.  */                                  \
-       result = GCONV_ILLEGAL_INPUT;                                         \
-       break;                                                                \
+       /* We have an unsupported character.  */                              \
+       ch = GCONV_STUB_CHAR;                                                 \
+       *converted += 1;                                                      \
       }                                                                      \
     *outptr++ = (unsigned char) ch;                                          \
     inptr += 4;                                                              \


GCONV_ILLEGAL_INPUT should only be used in "from" loops, not in "to"
loops: in a "from" loop the character really is illegal in the
input encoding, while in a "to" loop it was legal in the input
encoding and is simply not convertible to the output encoding.
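
To make the distinction concrete, here is a standalone sketch of the
two kinds of loop.  It is not the actual iconvdata macro code: the
function names, the conv_status enum and the choice of '?' for
STUB_CHAR are all made up for this example, and the intermediate form
is simply assumed to be an array of uint32_t code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not glibc's definitions.  */
#define STUB_CHAR '?'

enum conv_status { CONV_OK, CONV_ILLEGAL_INPUT };

/* "From" loop: ASCII bytes -> UCS4.  A byte above 0x7f is not valid
   ASCII, so the input itself is illegal and the conversion stops.
   *DONE receives the number of bytes converted before stopping.  */
enum conv_status
ascii_to_ucs4 (const unsigned char *in, size_t n, uint32_t *out,
               size_t *done)
{
  assert (in != NULL && out != NULL && done != NULL);
  for (*done = 0; *done < n; ++*done)
    {
      if (in[*done] > 0x7f)
        return CONV_ILLEGAL_INPUT;
      out[*done] = in[*done];
    }
  return CONV_OK;
}

/* "To" loop: UCS4 -> ISO-8859-1.  A code point above 0xff was legal
   input but has no counterpart in the target set, so it is replaced
   by the stub character and counted instead of aborting.  */
enum conv_status
ucs4_to_latin1 (const uint32_t *in, size_t n, unsigned char *out,
                size_t *converted)
{
  assert (in != NULL && out != NULL && converted != NULL);
  for (size_t i = 0; i < n; ++i)
    {
      uint32_t ch = in[i];
      if (ch > 0xff)
        {
          /* We have an unsupported character.  */
          ch = STUB_CHAR;
          *converted += 1;
        }
      out[i] = (unsigned char) ch;
    }
  return CONV_OK;
}
```

Only the "from" direction ever reports CONV_ILLEGAL_INPUT here; the
"to" direction always completes and leaves the number of stubbed
characters in *converted.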

The effect this change would have on the iconv program, for
instance, is that when converting a document containing some
characters that aren't in the target character set, it would
actually convert the document, with those nonconvertible characters
stubbed.  Right now, iconv aborts with an error, which is not
standard and really annoying.

-Ben

-- 
Ben Hochstedler         GE Medical Systems Information Technologies
address@hidden     http://www.ge.com/medical/marquette/
Phone: 414-362-3317      Fax: 414-362-3389      Dial-comm: 401-3317


