gnustep-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: New ABI NSConstantString


From: Fred Kiefer
Subject: Re: New ABI NSConstantString
Date: Sun, 8 Apr 2018 17:19:59 +0200


> Am 08.04.2018 um 16:11 schrieb David Chisnall <address@hidden>:
> 
> On 8 Apr 2018, at 12:41, David Chisnall <address@hidden> wrote:
>> 
>> On 8 Apr 2018, at 10:55, Richard Frith-Macdonald <address@hidden> wrote:
>>> 
>>> 
>>> 
>>>> On 6 Apr 2018, at 11:00, David Chisnall <address@hidden> wrote:
>>>> 
>>>> It would probably help catch more bugs if we made use of NSString’s 
>>>> class-cluster nature more in -base.  I have just fixed a bug in GSString 
>>>> where we were checking one object matched a particular class before 
>>>> dereferencing the _flags ivar of the other.  I caught this because the 
>>>> other was a GSTinyString, which is almost never a valid pointer.
>>> 
>>> Possibly, but performance *is* an issue here.  The NSString code was 
>>> rewritten some years ago (moving away from them use of class cluster 
>>> features) as a result of extensive profiling of real-world applications 
>>> which were running too slow, precisely because NSString methods are very 
>>> heavily used in real apps. At the time somethjing like 20% of the CPU was 
>>> wasted in method dispatch overheads (the -characterAtIndex: method is one 
>>> of the cluster primitives and a major culprit) but there were also 
>>> performance issues due to buffer allocation and copying of internal 
>>> representations.  The changes made a substantial improvement in general 
>>> performance as well as causing multipler orders of magnitude improvement in 
>>> a few pathological cases.
>> 
>> I agree that we should be improving performance for critical code, but 
>> unfortunately it appears that we have done so at the expense of correctness 
>> in a number of places.  As per my other email, 
>> -rangeOfComposedCharacterSequenceAtIndex: appears to give the wrong results 
>> in almost every nontrivial case, and is unfortunately one of the primitive 
>> methods for a lot of things.
>> 
>> I also note that a lot of the NSString method implementations are not well 
>> optimised.  In a number of places, -characterAtIndex: is called repeatedly, 
>> when -getCharacters:range: is normally significantly more efficient.  The 
>> ICU UText interface provides something very similar to -getCharacters:range: 
>> as its primitive method (a callback that fills a buffer with UTF-16 
>> characters) and has some carefully optimised routines.
> 
> I’ve pushed my WIP changes to the newabi branch - review is welcome!  This 
> branch disables the GSString implementation of 
> -rangeOfComposedCharacterSequenceAtIndex: and falls back to the NSString one 
> (which is also wrong, but now consistently wrong).

I think your implementation of characterAtIndex: is wrong for the UTF8 case. 
Sadly things are more complicated. Please have a look at the (slow) way the old 
code is handling this.

- (unichar) characterAtIndex: (NSUInteger)index
{
  NSUInteger    l = 0;
  unichar       u;
  unichar       n = 0;
  unsigned      i = 0;

  while (i < nxcslen || n > 0)
    {
      u = nextUTF8((const uint8_t *)nxcsptr, nxcslen, &i, &n);
      if (l++ == index)
        {
          return u;
        }
    }

  [NSException raise: NSInvalidArgumentException
              format: @"-characterAtIndex: index out of range"];
  return 0;
}


reply via email to

[Prev in Thread] Current Thread [Next in Thread]