gnustep-dev

Re: Bug in GNUstep implementation of NSRegularExpression?


From: Richard Frith-Macdonald
Subject: Re: Bug in GNUstep implementation of NSRegularExpression?
Date: Sat, 12 Apr 2014 05:56:52 +0100

On 11 Apr 2014, at 22:54, Fred Kiefer <address@hidden> wrote:

> On 08.04.2014 16:14, Mathias Bauer wrote:
>> Hi,
>> 
>> the following simple test program throws an exception:
>> 
>> 
>>> #import <Foundation/Foundation.h>
>>> 
>>> int main(int argc, const char * argv[])
>>> {
>>>    @autoreleasepool
>>>    {
>>>        NSString* text = @"h1. Real Acme\n\n"
>>>            @"||{noborder}{left}Item||{right}Price||\n"
>>>            @"|Testproduct|{right}2 x $59.50|\n"
>>>            @"| |{right}net amount: $100.00|\n"
>>>            @"| |{right}total amount: $119.00|\n\n\n"
>>>            @"h2. Thanks for your purchase!\n\n\n";
>>> 
>>>        // NSRegularExpression* expr = [NSRegularExpression
>>>        //     regularExpressionWithPattern:@".*?$"
>>>        //     options:NSRegularExpressionAnchorsMatchLines error:NULL];
>>>        // int currentIndex = 27;
>>> 
>>>        NSRegularExpression* expr = [NSRegularExpression
>>>            regularExpressionWithPattern:@"h[123]\\. "
>>>            options:NSRegularExpressionCaseInsensitive error:NULL];
>>>        int currentIndex = 33;
>>> 
>>>        // Anchored search: a match must begin at currentIndex.
>>>        [expr firstMatchInString:text options:NSMatchingAnchored
>>>            range:NSMakeRange(currentIndex, [text length]-currentIndex-1)];
>>>    }
>>>    return 0;
>>> }
>> 
>> The call to firstMatchInString will end up calling uregex_lookingAt
>> (thus carrying out a regex match) and afterwards calling uregex_start
>> and uregex_end (thus retrieving the matched text range). The results of
>> the two latter calls are used to create an NSRange in the
>> prepareResult function of NSRegularExpression.m. And because the length
>> of this range is negative, an exception is thrown.
>> 
>> Let's have a look at the data:
>> 
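>> The matching region starts at position 33 and ends at the end of the
>> string (strictly, one character before it, because of the -1 in the
>> length passed to NSMakeRange). This region has been set on the regex by
>> calling uregex_setRegion (in the setupRegex function in
>> NSRegularExpression.m).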
>> 
>> According to the documentation, uregex_start should return the index in
>> the input string of the start of the text matched. In my book this
>> should be the position of the "h2" near the end of the string.
>> 
>> According to the documentation, uregex_end should return the index in
>> the input string of the position following the end of the text matched.
>> In my book that should be start + 4.
>> 
>> But I get back: 33 for start and 4 for end. That obviously can't work.
>> 
>> I can't believe that the ICU regex implementation (I'm using ICU 4.8 on
>> Ubuntu 13.10 64-bit) is broken to this extent, so probably the
>> NSRegularExpression implementation uses it incorrectly. But on the other
>> hand I can't spot an obvious error.
>> 
>> Any hints would be greatly appreciated.
> 
> No hint, just some feedback. I was able to reproduce your problem on my
> GNUstep installation but completely failed to understand why uregex
> comes up with 4 as the result of uregex_end.
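
The exception in the quoted report comes down to simple arithmetic: a range's length is `end - start`, and with the reported start of 33 and end of 4 that would be -29, which the unsigned length field of an NSRange cannot represent. The following C sketch (plain C, so it also compiles as Objective-C without needing Foundation or ICU) shows a checked conversion that reports "no match" instead of building a bogus range, roughly in the spirit of returning nil as OS X does. `RangeSketch` and `range_from_start_end` are hypothetical names; this is not the actual `prepareResult` code in NSRegularExpression.m.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Foundation's NSRange, so the sketch builds
   without Foundation. */
typedef struct {
    unsigned long location;
    unsigned long length;
} RangeSketch;

/* Convert a (start, end) pair, where end is the index one past the last
   matched character, into a location/length range. Returns false instead
   of producing a negative length when the pair is inconsistent, e.g.
   start = 33 and end = 4; the caller can then treat it as "no match". */
static bool range_from_start_end(int32_t start, int32_t end, RangeSketch *out)
{
    assert(out != NULL);
    if (start < 0 || end < start) {
        return false;
    }
    out->location = (unsigned long)start;
    out->length = (unsigned long)(end - start);
    return true;
}
```

For example, `range_from_start_end(33, 4, &r)` returns false, while `range_from_start_end(33, 37, &r)` yields location 33 and length 4, a consistent four-character match.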

I spent all night looking at this ... without a lot of success.

I think this really is an ICU bug; I can consistently reproduce the problem, and 
what I *think* is happening is that the call to uregex_lookingAt() is simply 
not working properly in this situation: it's not honoring the range that 
was set for the regular expression, but neither is it ignoring it.

It seems to me that:
- it's starting the matching at the start of the string rather than at the
  start of the range, so it actually matches the 'h1. ' right at the start
  of the string;
- it's then reporting the start index as if it had matched in the range, and
  since the range starts at 33 it reports a match at offset 0 in the string
  as being at index 33;
- but it's reporting the end index as if it had matched in the whole string
  (which it did), as 4.
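
The hypothesis above can be written down as index arithmetic. This is a hypothetical model, not ICU code: it assumes the match is found at `match_offset` in the whole string, that the start is then reported relative to the region start, and that the end is not. `ReportedIndices` and `model_suspected_report` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* The pair of values a caller would see from uregex_start()/uregex_end()
   under the suspected behaviour (modelled, not obtained from ICU). */
typedef struct {
    int32_t start;
    int32_t end;
} ReportedIndices;

static ReportedIndices model_suspected_report(int32_t region_start,
                                              int32_t match_offset,
                                              int32_t match_length)
{
    ReportedIndices r;
    assert(region_start >= 0 && match_offset >= 0 && match_length >= 0);
    /* Start: offset treated as relative to the region, so the region
       start is added. */
    r.start = region_start + match_offset;
    /* End: offset treated as relative to the whole string, so nothing
       is added. */
    r.end = match_offset + match_length;
    return r;
}
```

Calling `model_suspected_report(33, 0, 4)` gives a start of 33 and an end of 4, exactly the pair observed in the thread, so the hypothesis is at least consistent with the data.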

That looks very broken, and despite spending ages looking at the ICU 
documentation etc., I have been unable to find any indication that we are 
misinterpreting the way it *should* work.

When I tried the same test on OS X, it didn't raise an exception, but neither 
did it work: on OS X the behavior was to return nil for the match. 
I modified base to do the same as OS X, but I think a bug report to the ICU 
project is probably the way to go, as I can't see any way of working around it. 
I did consider extracting the substring from the range and then using ICU to 
match that substring (avoiding ICU's range code), but that would not work if 
the NSMatchingWithTransparentBounds option is used, so it would only be a 
partial workaround.
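
The substring workaround considered above can be sketched as follows: search only inside the slice `[loc, loc + len)` as if it were the whole input, then add `loc` back to translate the match into whole-string indices. To keep the example self-contained, a hypothetical hand-written matcher for the thread's pattern `h[123]\. ` (case-insensitive) stands in for ICU; `matches_heading_at` and `first_match_in_range` are illustrative names, not GNUstep or ICU functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the regex engine: matches "h[123]\. "
   case-insensitively at position i of s, which has length n. */
static bool matches_heading_at(const char *s, size_t n, size_t i)
{
    return i + 4 <= n
        && tolower((unsigned char)s[i]) == 'h'
        && s[i + 1] >= '1' && s[i + 1] <= '3'
        && s[i + 2] == '.'
        && s[i + 3] == ' ';
}

/* Search only inside [loc, loc + len) by treating that slice as its own
   string, then shift the match back into whole-string indices. Returns
   false if there is no match. Text outside the slice is never examined. */
static bool first_match_in_range(const char *text, size_t loc, size_t len,
                                 bool anchored, size_t *match_loc)
{
    const char *sub;
    size_t i;

    assert(text != NULL && match_loc != NULL);
    assert(loc + len <= strlen(text));
    sub = text + loc;                 /* the slice, without copying */
    for (i = 0; i < len; i++) {
        if (matches_heading_at(sub, len, i)) {
            *match_loc = loc + i;     /* back to a whole-string index */
            return true;
        }
        if (anchored) {
            break;                    /* anchored: only slice position 0 */
        }
    }
    return false;
}
```

On the test text, an anchored search starting at 33 finds nothing (the text at index 33 is `Item...`, not a heading), while an unanchored one finds the `h2. ` heading. Because the matcher only ever sees the slice, anything that needs context beyond it, such as lookahead, lookbehind or word-boundary checks at the slice edges under NSMatchingWithTransparentBounds, cannot be honoured, which is why this is only a partial workaround.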



