Re: Bug in GNUstep implementation of NSRegularExpression?
From: Fred Kiefer
Subject: Re: Bug in GNUstep implementation of NSRegularExpression?
Date: Sat, 12 Apr 2014 09:33:15 +0200
On 12.04.2014 at 06:56, Richard Frith-Macdonald <address@hidden> wrote:
>
>> On 11 Apr 2014, at 22:54, Fred Kiefer <address@hidden> wrote:
>>
>>> On 08.04.2014 16:14, Mathias Bauer wrote:
>>> Hi,
>>>
>>> the following simple test program throws an exception:
>>>
>>>
>>>> #import <Foundation/Foundation.h>
>>>>
>>>> int main(int argc, const char * argv[])
>>>> {
>>>>     @autoreleasepool
>>>>     {
>>>>         NSString* text = @"h1. Real Acme\n\n||{noborder}{left}Item||{right}Price||\n|Testproduct|{right}2 x $59.50|\n| |{right}net amount: $100.00|\n| |{right}total amount: $119.00|\n\n\nh2. Thanks for your purchase!\n\n\n";
>>>>
>>>>         // NSRegularExpression* expr = [NSRegularExpression
>>>>         //     regularExpressionWithPattern:@".*?$"
>>>>         //     options:NSRegularExpressionAnchorsMatchLines error:NULL];
>>>>         // int currentIndex = 27;
>>>>
>>>>         NSRegularExpression* expr = [NSRegularExpression
>>>>             regularExpressionWithPattern:@"h[123]\\. "
>>>>             options:NSRegularExpressionCaseInsensitive error:NULL];
>>>>         int currentIndex = 33;
>>>>
>>>>         [expr firstMatchInString:text options:NSMatchingAnchored
>>>>             range:NSMakeRange(currentIndex, [text length]-currentIndex-1)];
>>>>     }
>>>>     return 0;
>>>> }
>>>
>>> The call to firstMatchInString will end up in calling uregex_lookingAt
>>> (thus carrying out a regex match) and afterwards calling uregex_start
>>> and uregex_end (thus retrieving the matched text range). The results of
>>> the two latter calls will be used to create an NSRange object in the
>>> prepareResult function of NSRegularExpression.m. And because the length
>>> of this range is negative, an exception is thrown.
>>>
>>> Let's have a look at the data:
>>>
>>> The matching region starts at position 33, it ends at the string end.
>>> This region has been set at the regex by calling uregex_setRegion (in
>>> the setupRegex function in NSRegularExpression.m).
>>>
>>> According to the documentation, uregex_start should return the index in
>>> the input string of the start of the text matched. In my book this
>>> should be the position of the "h2" near the end of the string.
>>>
>>> According to the documentation, uregex_end should return the index in
>>> the input string of the position following the end of the text matched.
>>> In my book that should be start + 4.
>>>
>>> But I get back: 33 for start and 4 for end. That obviously can't work.
>>>
>>> I can't believe that the ICU regex implementation (I'm using ICU 4.8 on
>>> Ubuntu 13.10 64-bit) is broken to this extent, so probably the
>>> NSRegularExpression implementation uses it incorrectly. But OTOH I can't
>>> spot an obvious error.
>>>
>>> Any hints would be greatly appreciated.
>>
>> No hint, just some feedback. I was able to reproduce your problem on my
>> GNUstep installation but completely failed to understand why uregex
>> comes up with 4 as the result of uregex_end.
>
> I spent all night looking at this ... without a lot of success.
>
> I think this really is an ICU bug; I can consistently reproduce the problem
> and what I *think* is happening is that the call to uregex_lookingAt() is
> simply not working properly in this situation ... it's not honoring the range
> which was set for the regular expression, but neither is it ignoring it.
>
> It seems to me that:
> It's starting the matching at the start of the string rather than at the
> start of the range.
> So it actually matches the 'h1. ' right at the start of the string.
> It's then reporting the start index as if it had matched in the range, and
> since the range starts at 33 it reports a match at offset 0 in the string as
> being at index 33.
> But it's reporting the end index as if it had matched in the whole string
> (which it did) as 4.
After more thinking, I reached the same conclusion.
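
To make the arithmetic concrete, here is a minimal C sketch of the suspected behavior. It does not call ICU. The helpers suspected_offsets and range_from_offsets and the match_range type are hypothetical names that only model the numbers from this thread (region start 33, match at offset 0, match length 4). The sketch shows how a caller could reject the inconsistent start/end pair instead of building a range with a negative length; it is not a description of what GNUstep base actually does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an NSRange-like value. */
typedef struct {
    int32_t location;
    int32_t length;
} match_range;

/* Model of the suspected behavior: the start is reported relative to the
 * region (region_start + match_pos), while the end is reported relative
 * to the whole string (match_pos + match_len). This is an assumption
 * drawn from the observed values, not from the ICU sources. */
void suspected_offsets(int32_t region_start, int32_t match_pos,
                       int32_t match_len, int32_t *start, int32_t *end)
{
    *start = region_start + match_pos;
    *end = match_pos + match_len;
}

/* Build a range from start/end offsets. Returns false when the pair is
 * inconsistent (end before start), so the caller can treat it as
 * "no match" rather than creating a range with a negative length. */
bool range_from_offsets(int32_t start, int32_t end, match_range *out)
{
    if (start < 0 || end < start) {
        return false;
    }
    out->location = start;
    out->length = end - start;
    return true;
}
```

With region_start = 33, match_pos = 0 and match_len = 4, the model yields start 33 and end 4, which range_from_offsets rejects.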
> That looks very broken, and despite spending ages looking at the ICU
> documentation etc, I have been unable to find any indication that there's any
> misinterpretation of the way it *should* be working.
>
> When I tried the same test on OSX, it didn't raise an exception, but neither
> did it work ... on OSX the behavior was to return a nil object for the match.
OSX might even be correct here. An anchored search should only give a result if
the expression matches at the start of the range, and if I counted
correctly there is no h at position 33.
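
As a rough illustration of what "anchored" means here, the following C sketch tests the pattern only at the start of the range and never scans forward. It is a hand-written stand-in for the pattern "h[123]\. " with case-insensitive matching, not ICU or NSRegularExpression code, and the function name heading_prefix_at is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true only if "h[123]\. " (case-insensitive) matches exactly at
 * position pos. No other position is tried, which is the behavior an
 * anchored match is expected to have. */
bool heading_prefix_at(const char *text, size_t text_len, size_t pos)
{
    /* The pattern needs four characters: 'h', a digit 1-3, '.', ' '. */
    if (pos > text_len || text_len - pos < 4) {
        return false;
    }
    const char *p = text + pos;
    return tolower((unsigned char)p[0]) == 'h'
        && p[1] >= '1' && p[1] <= '3'
        && p[2] == '.'
        && p[3] == ' ';
}
```

Under this reading, a range that starts on any character other than an h simply produces no match, which would correspond to the nil result seen on OSX.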
> I modified base to do the same as OSX, but I think probably a bug report to
> the ICU project is the way to go as I can't see any way of working around it.
> I did consider extracting the substring from the range and then using ICU to
> match that substring (avoiding ICU's range code), but that would not work if
> NSMatchingWithTransparentBounds is used, so it would only be a partial
> workaround.
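
The substring idea can be sketched in plain C as follows. This is only an illustration of the offset translation, with a literal search standing in for the regex engine; find_in_range is a hypothetical helper, not part of ICU or GNUstep. The matcher sees only the slice, so results come back relative to the slice and have to be shifted by the range start.

```c
#include <stddef.h>
#include <string.h>

/* Search for the literal needle inside text[range_start, range_start +
 * range_len) and return its offset in the WHOLE string, or -1 if it is
 * not found or the range is invalid. */
long find_in_range(const char *text, size_t text_len,
                   size_t range_start, size_t range_len,
                   const char *needle)
{
    size_t needle_len = strlen(needle);

    /* Reject ranges that do not lie inside the string. */
    if (range_start > text_len || range_len > text_len - range_start) {
        return -1;
    }
    if (needle_len == 0 || needle_len > range_len) {
        return -1;
    }

    /* The search only ever looks at the slice. */
    const char *slice = text + range_start;
    for (size_t i = 0; i + needle_len <= range_len; i++) {
        if (memcmp(slice + i, needle, needle_len) == 0) {
            /* Translate the slice-relative offset back. */
            return (long)(range_start + i);
        }
    }
    return -1;
}
```

Because the search never looks outside the slice, anything that depends on text before or after the range (which is what transparent bounds allow look-behind and look-ahead to see) is lost. That is why this approach would only be a partial workaround.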