FAIL's in encoding unit test, bug in Unicode.m?

gnustep-dev


From:	Willem Rein Oudshoorn
Subject:	FAIL's in encoding unit test, bug in Unicode.m?
Date:	07 Dec 2002 01:56:26 +0100
User-agent:	Gnus/5.09 (Gnus v5.9.0) Emacs/21.1

Ok, I finally got gcc 3.2.1 compiled.  Now the unit tests fail
for the -[NSString dataUsingEncoding:] tests.

./gtest-base --tool NSString --file test04.scm

generates 5 failures.  They are all similar to the following:

FAIL: encode #<gstep-id 0x80d7c80 GSUnicodeInlineString string="ABC"> 
 #<gstep-id 0x80effa0 GSCString string="Unicode UTF-8"> to
 #<gstep-id 0x80d9758 NSDataMalloc string="<414243>">
FAIL: encode #<gstep-id 0x8118f60 GSUnicodeInlineString string="åäöשלום"> 
 #<gstep-id 0x80effa0 GSCString string="Unicode UTF-8"> to 
 #<gstep-id 0x8118e80 NSDataMalloc string="<c3a5c3a4 c3b6d7a9 d79cd795 d79d>">

I discovered that it is caused by the fact that the function

GSFromUnicode

returns NO on those encodings.  
To be more precise, look at the condensed version of the source:


----------------------------------------------
GSFromUnicode (....)
{
     ...

     default:
#ifdef HAVE_ICONV
     {
        ...
        while (inbytesleft > 0)
          {
             ...
             rval = iconv (cd, ...);
             if (rval != 0)
               {
                  if (rval == (size_t)-1)
                     {
                        ...
                     }
                  else if (strict == YES)
                    {
                      /*
                       * A positive return from iconv indicates some
                       * irreversible (ie lossy) conversions took place,
                       * so if we are doing strict conversions we must fail.
                       */
                      result = NO;
                      break;
                    }
   
-----------------------

So the comment suggests that if `iconv' returns a positive integer,
the conversion was lossy.

And this is exactly the place where the conversion fails.  
However, the conversion from the string "ABC" is not lossy.

Also, the documentation of `iconv' says:

----------------------
     If all input from the input buffer is successfully converted and
     stored in the output buffer the function returns the number of
     conversions performed.  In all other cases the return value is
     `(size_t) -1' and `errno' is set appropriately.  In this case the
     value pointed to by INBYTESLEFT is nonzero.
----------------------

So this suggests that it returns the number of successfully converted
characters.  This is consistent with the values I see.
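
One way to see which return-value convention the installed iconv follows
is to convert a short ASCII string and look at what comes back.  The
sketch below is only an illustration, not code from Unicode.m: the helper
name utf16le_to_utf8 is made up, and it assumes that iconv_open accepts
the encoding names "UTF-8" and "UTF-16LE".

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of GNUstep): convert `inlen' bytes of
 * UTF-16LE text to UTF-8 in a single iconv() call.  Returns the raw
 * iconv() return value and stores the number of bytes written in
 * *outlen.  Returns (size_t) -1 if iconv_open() or iconv() fails.
 */
static size_t
utf16le_to_utf8 (const char *in, size_t inlen,
                 char *out, size_t outcap, size_t *outlen)
{
  iconv_t cd;
  char *inp = (char *) in;      /* iconv() takes a non-const pointer */
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outcap;
  size_t rval;

  assert (in != NULL && out != NULL && outlen != NULL);

  *outlen = 0;
  cd = iconv_open ("UTF-8", "UTF-16LE");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  rval = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);

  *outlen = outcap - outleft;
  return rval;
}
```

Calling it on the six bytes 'A', 0, 'B', 0, 'C', 0 and printing the
result shows whether this iconv reports zero or a positive count for a
conversion that loses nothing.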

This suggests to me that the code in Unicode.m is wrong, or 
that there are two incompatible versions of iconv and I managed
to use the wrong one.

Also, if the intention is to check for lossy conversion, the relevant
part of the iconv documentation is:

---------------------
     Since the character sets selected in the `iconv_open' call can be
     almost arbitrary there can be situations where the input buffer
     contains valid characters which have no identical representation
     in the output character set.  The behavior in this situation is
     undefined.  The _current_ behavior of the GNU C library in this
     situation is to return with an error immediately.  This certainly
     is not the most desirable solution.  Therefore future versions
     will provide better ones but they are not yet finished.
--------------------
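
Since the meaning of a positive return value seems to differ between
iconv versions, a check that does not depend on it is to convert the
text, convert it back, and compare with the original.  This is a
different technique from the one in Unicode.m, sketched here under some
assumptions: the helper names convert and converts_losslessly are made
up, the input is UTF-8, the target encoding is stateless, and everything
fits in 256-byte buffers.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: one-shot conversion from encoding `from' to
 * encoding `to'.  Returns 0 on success, -1 if iconv_open() or iconv()
 * reports an error.  The positive return values of iconv() are
 * deliberately ignored.
 */
static int
convert (const char *to, const char *from,
         const char *in, size_t inlen,
         char *out, size_t outcap, size_t *outlen)
{
  iconv_t cd = iconv_open (to, from);
  char *inp = (char *) in;      /* iconv() takes a non-const pointer */
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outcap;
  size_t rval;

  if (cd == (iconv_t) -1)
    return -1;
  rval = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);
  *outlen = outcap - outleft;
  return (rval == (size_t) -1) ? -1 : 0;
}

/*
 * Returns 1 if the UTF-8 text survives a round trip through the
 * encoding `enc' unchanged, 0 otherwise.  Input that does not fit in
 * the 256-byte scratch buffers is also reported as 0.
 */
static int
converts_losslessly (const char *enc, const char *utf8, size_t len)
{
  char mid[256];
  char back[256];
  size_t midlen = 0;
  size_t backlen = 0;

  assert (enc != NULL && utf8 != NULL);

  if (convert (enc, "UTF-8", utf8, len, mid, sizeof mid, &midlen) != 0)
    return 0;
  if (convert ("UTF-8", enc, mid, midlen, back, sizeof back, &backlen) != 0)
    return 0;
  return backlen == len && memcmp (back, utf8, len) == 0;
}
```

With "ISO-8859-1" as the target, the bytes c3 a5 (the first character of
the failing test string) should pass the check, while d7 a9 (the first
Hebrew character) should not, whether iconv fails with an error or
substitutes a replacement character.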

It might also be useful to know that I am using

(g)libc-2.1.1

I might be tempted to try to fix this.  But I have never looked
at iconv and Unicode before and do not have a large amount of
free time.  

Oh, a final note: if I just remove the whole `else if (strict == YES)'
clause, the tests succeed.

Wim Oudshoorn.
