Re: Bug in GNUstep implementation of NSRegularExpression?
From: Richard Frith-Macdonald
Subject: Re: Bug in GNUstep implementation of NSRegularExpression?
Date: Sat, 12 Apr 2014 05:56:52 +0100

On 11 Apr 2014, at 22:54, Fred Kiefer <address@hidden> wrote:
> On 08.04.2014 16:14, Mathias Bauer wrote:
>> Hi,
>>
>> the following simple test program throws an exception:
>>
>>
>>> #import <Foundation/Foundation.h>
>>>
>>> int main(int argc, const char * argv[])
>>> {
>>> @autoreleasepool
>>> {
>>> NSString* text = @"h1. Real Acme\n\n"
>>> @"||{noborder}{left}Item||{right}Price||\n"
>>> @"|Testproduct|{right}2 x $59.50|\n"
>>> @"| |{right}net amount: $100.00|\n"
>>> @"| |{right}total amount: $119.00|\n\n\n"
>>> @"h2. Thanks for your purchase!\n\n\n";
>>>
>>> // NSRegularExpression* expr = [NSRegularExpression
>>> // regularExpressionWithPattern:@".*?$"
>>> // options:NSRegularExpressionAnchorsMatchLines error:NULL];
>>> // int currentIndex = 27;
>>>
>>> NSRegularExpression* expr = [NSRegularExpression
>>> regularExpressionWithPattern:@"h[123]\\. "
>>> options:NSRegularExpressionCaseInsensitive error:NULL];
>>> int currentIndex = 33;
>>>
>>> [expr firstMatchInString:text options:NSMatchingAnchored
>>> range:NSMakeRange(currentIndex, [text length]-currentIndex-1)];
>>> }
>>> return 0;
>>> }
>>
>> The call to firstMatchInString will end up in calling uregex_lookingAt
>> (thus carrying out a regex match) and afterwards calling uregex_start
>> and uregex_end (thus retrieving the matched text range). The results of
>> the two latter calls will be used to create an NSRange object in the
>> prepareResult function of NSRegularExpression.m. And because the length
>> of this range is negative, an exception is thrown.
>>
>> Let's have a look at the data:
>>
>> The matching region starts at position 33, it ends at the string end.
>> This region has been set at the regex by calling uregex_setRegion (in
>> the setupRegex function in NSRegularExpression.m).
>>
>> According to the documentation, uregex_start should return the index in
>> the input string of the start of the text matched. In my book this
>> should be the position of the "h2" near the end of the string.
>>
>> According to the documentation, uregex_end should return the index in
>> the input string of the position following the end of the text matched.
>> In my book that should be start + 4.
>>
>> But I get back: 33 for start and 4 for end. That obviously can't work.
>>
>> I can't believe that the ICU regex implementation (I'm using ICU4.8 on
>> Ubuntu 13.10 64Bit) is broken to this extent, so probably the
>> NSRegularExpression implementation uses it incorrectly. But OTOH I can't
>> spot an obvious error.
>>
>> Any hints would be greatly appreciated.
>
> No hint, just some feedback. I was able to reproduce your problem on my
> GNUstep installation but completely failed to understand why uregex
> comes up with 4 as the result of uregex_end.

I spent all night looking at this ... without a lot of success.

I think this really is an ICU bug; I can consistently reproduce the problem, and
what I *think* is happening is that the call to uregex_lookingAt() is simply
not working properly in this situation ... it's not honoring the range which
was set for the regular expression, but neither is it ignoring it.

It seems to me that:

1. It's starting the matching at the start of the string rather than at the
   start of the range, so it actually matches the 'h1. ' right at the start
   of the string.
2. It's then reporting the start index as if it had matched in the range, and
   since the range starts at 33 it reports a match at offset 0 in the string
   as being at index 33.
3. But it's reporting the end index as if it had matched in the whole string
   (which it did), as 4.

That looks very broken, and despite spending ages looking at the ICU
documentation etc., I have been unable to find any indication that I'm
misinterpreting the way it *should* be working.

When I tried the same test on OSX, it didn't raise an exception, but neither
did it work ... on OSX the behavior was to return a nil object for the match.
I modified base to do the same as OSX, but I think a bug report to the ICU
project is probably the way to go, as I can't see any way of working around
it. I did consider extracting the substring from the range and then using ICU
to match that substring (avoiding ICU's range code), but that would not work
if the NSMatchingWithTransparentBounds option is used, so it would only be a
partial workaround.