gnustep-dev
Re: Bug in GNUstep implementation of NSRegularExpression?


From: Fred Kiefer
Subject: Re: Bug in GNUstep implementation of NSRegularExpression?
Date: Sat, 12 Apr 2014 09:33:15 +0200

On 12.04.2014 at 06:56, Richard Frith-Macdonald <address@hidden> wrote:

> 
>> On 11 Apr 2014, at 22:54, Fred Kiefer <address@hidden> wrote:
>> 
>>> On 08.04.2014 16:14, Mathias Bauer wrote:
>>> Hi,
>>> 
>>> the following simple test program throws an exception:
>>> 
>>> 
>>>> #import <Foundation/Foundation.h>
>>>> 
>>>> int main(int argc, const char * argv[])
>>>> {
>>>>   @autoreleasepool
>>>>   {
>>>>       NSString* text = @"h1. Real Acme\n\n"
>>>>           @"||{noborder}{left}Item||{right}Price||\n"
>>>>           @"|Testproduct|{right}2 x $59.50|\n"
>>>>           @"| |{right}net amount: $100.00|\n"
>>>>           @"| |{right}total amount: $119.00|\n\n\n"
>>>>           @"h2. Thanks for your purchase!\n\n\n";
>>>> 
>>>>       // NSRegularExpression* expr = [NSRegularExpression
>>>>       //     regularExpressionWithPattern:@".*?$"
>>>>       //     options:NSRegularExpressionAnchorsMatchLines error:NULL];
>>>>       // int currentIndex = 27;
>>>> 
>>>>       NSRegularExpression* expr = [NSRegularExpression
>>>> regularExpressionWithPattern:@"h[123]\\. "
>>>> options:NSRegularExpressionCaseInsensitive error:NULL];
>>>>       int currentIndex = 33;
>>>> 
>>>>       [expr firstMatchInString:text options:NSMatchingAnchored
>>>> range:NSMakeRange(currentIndex, [text length]-currentIndex-1)];
>>>>   }
>>>>   return 0;
>>>> }
>>> 
>>> The call to firstMatchInString ends up calling uregex_lookingAt
>>> (thus carrying out a regex match) and afterwards calling uregex_start
>>> and uregex_end (thus retrieving the matched text range). The results of
>>> the two latter calls will be used to create an NSRange object in the
>>> prepareResult function of NSRegularExpression.m. And because the length
>>> of this range is negative, an exception is thrown.
>>> 
>>> Let's have a look at the data:
>>> 
>>> The matching region starts at position 33, it ends at the string end.
>>> This region has been set at the regex by calling uregex_setRegion (in
>>> the setupRegex function in NSRegularExpression.m).
>>> 
>>> According to the documentation, uregex_start should return the index in
>>> the input string of the start of the text matched. In my book this
>>> should be the position of the "h2" near the end of the string.
>>> 
>>> According to the documentation, uregex_end should return the index in
>>> the input string of the position following the end of the text matched.
>>> In my book that should be start + 4.
>>> 
>>> But I get back: 33 for start and 4 for end. That obviously can't work.
>>> 
>>> I can't believe that the ICU regex implementation (I'm using ICU4.8 on
>>> Ubuntu 13.10 64Bit) is broken to this extent, so probably the
>>> NSRegularExpression implementation uses it incorrectly. But OTOH I can't
>>> spot an obvious error.
>>> 
>>> Any hints would be greatly appreciated.
>> 
>> No hint, just some feedback. I was able to reproduce your problem on my
>> GNUstep installation but completely failed to understand why uregex
>> comes up with 4 as the result of uregex_end.
> 
> I spent all night looking at this ... without a lot of success.
> 
> I think this really is an ICU bug; I can consistently reproduce the problem 
> and what I *think* is happening is that the call to uregex_lookingAt() is 
> simply not working properly in this situation ... it's not honoring the range 
> which was set for the regular expression, but neither is it ignoring it.
> 
> It seems to me that:
> It's starting the matching at the start of the string rather than at the 
> start of the range.
> So it actually matches the 'h1. ' right at the start of the string.
> It's then reporting the start index as if it had matched in the range, and 
> since the range starts at 33 it reports a match at offset 0 in the string as 
> being at index 33.
> But it's reporting the end index as if it had matched in the whole string 
> (which it did) as 4.

After more thinking, I reached the same conclusion.
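
To make the arithmetic behind that conclusion concrete, here is a minimal sketch in plain C. It makes no ICU calls; range_t and make_range are hypothetical names used only for illustration and are not the GNUstep definitions. It shows why a start index of 33 combined with an end index of 4 cannot be turned into a valid location/length pair:

```c
#include <stdint.h>

/* Hypothetical stand-in for NSRange; not the GNUstep definition. */
typedef struct {
    unsigned long location;
    unsigned long length;
} range_t;

/*
 * Builds a range from the [start, end) indices a matcher reports.
 * Returns 1 and fills *out when the indices are sane, 0 otherwise.
 * With the values from the thread (start = 33, end = 4) the length
 * would be 4 - 33 = -29, so the pair is rejected.
 */
static int make_range(int32_t start, int32_t end, range_t *out)
{
    if (start < 0 || end < start) {
        return 0;   /* negative length: no valid range exists */
    }
    out->location = (unsigned long)start;
    out->length = (unsigned long)(end - start);
    return 1;
}
```

Calling make_range(33, 4, &r) returns 0, whereas a consistent pair such as make_range(0, 4, &r) yields location 0 and length 4, which is what a match on "h1. " at the very start of the string would look like.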


> That looks very broken, and despite spending ages looking at the ICU 
> documentation etc, I have been unable to find any indication that there's any 
> misinterpretation of the way it *should* be working.
> 
> When I tried the same test on OSX, it didn't raise an exception, but neither 
> did it work ... on OSX the behavior was to return a nil object for the match.

OSX might even be correct here. An anchored search should only give a result if 
the expression gets matched at the start of the range, and if I counted 
correctly there is no h at position 33.
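
The count can be checked with a short C sketch. The assumptions are that the wrapped test string from the report is rejoined with single spaces at the wrap points, and that anchored_heading_match (a hypothetical, hand-written helper, not ICU and not a general regex engine) is an acceptable stand-in for the case-insensitive pattern "h[123]\\. " anchored at a given index:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Test string from the original report, rejoined at the wrap points. */
static const char *const kText =
    "h1. Real Acme\n\n"
    "||{noborder}{left}Item||{right}Price||\n"
    "|Testproduct|{right}2 x $59.50|\n"
    "| |{right}net amount: $100.00|\n"
    "| |{right}total amount: $119.00|\n\n\n"
    "h2. Thanks for your purchase!\n\n\n";

/*
 * Hand-written stand-in for the pattern "h[123]\\. " (case-insensitive),
 * tested only at index pos, which is what an anchored match means.
 * Returns 1 on a match, 0 otherwise.
 */
static int anchored_heading_match(const char *text, size_t pos)
{
    size_t len = strlen(text);
    if (pos > len || len - pos < 4) {
        return 0;   /* not enough characters left for "hN. " */
    }
    return tolower((unsigned char)text[pos]) == 'h'
        && text[pos + 1] >= '1' && text[pos + 1] <= '3'
        && text[pos + 2] == '.'
        && text[pos + 3] == ' ';
}
```

Under those assumptions, kText[33] is the 'I' of "Item", so anchored_heading_match(kText, 33) returns 0, while anchored_heading_match(kText, 0) returns 1 for the "h1. " at the start of the string. That is consistent with a nil result being the expected answer for the anchored call.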

>  I modified base to do the same as OSX, but I think probably a bug report to 
> the ICU project is the way to go as I can't see any way of working around it. 
>  I did consider extracting the substring from the range and then using ICU to 
> match that substring (avoiding ICUs range code), but that would not work if 
> NSMatchingWithTransparentBounds is used, so it would only be a partial 
> workaround.
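
The substring idea mentioned above can be sketched in plain C. This is only an illustration: match_t and match_in_substring are hypothetical names, and a literal needle stands in for the regular expression so that no ICU call is needed. The matcher sees only the characters inside the requested range, and the reported indices are shifted back by the range location:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: absolute [start, end) in the full string. */
typedef struct {
    size_t start;
    size_t end;
    int found;
} match_t;

/*
 * Searches for a literal needle (standing in for the regex) only inside
 * text[loc .. loc + len), then converts the substring-relative offset
 * into an index in the full string by adding loc.
 */
static match_t match_in_substring(const char *text, size_t loc, size_t len,
                                  const char *needle)
{
    match_t m = {0, 0, 0};
    size_t n = strlen(needle);
    size_t total = strlen(text);
    size_t i;

    if (loc > total || len > total - loc || n > len) {
        return m;   /* range outside the string, or needle too long */
    }
    for (i = 0; i + n <= len; i++) {
        if (memcmp(text + loc + i, needle, n) == 0) {
            m.start = loc + i;      /* shift back to full-string index */
            m.end = loc + i + n;
            m.found = 1;
            return m;
        }
    }
    return m;
}
```

For example, match_in_substring("h1. x h2. y", 3, 8, "h2. ") reports start 6 and end 10. The limitation is visible in the structure: nothing before loc or after loc + len is ever examined, so any option that lets look-ahead or look-behind see past the range boundaries, as NSMatchingWithTransparentBounds does, cannot be honored this way.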


