From: Sergio
Subject: Re: [Help-smalltalk] [bug] (a RegexResults) at: n throws error if captured at n is empty string
Date: Mon, 09 Dec 2013 03:22:01 +0400

Hi

07.12.2013, 22:10, "Holger Hans Peter Freyther" <address@hidden>:
> On Fri, Dec 06, 2013 at 03:47:24PM -0700, sergio wrote:
>  data := 'word1 "" word3'.
>  (data =~ '(\w+) "([^"]*)" (\w+)') printString
>
> is already enough to re-produce the issue. For some reason registers
> contains three intervals and the second one is an empty one.

Yes, there should be three. My bug report was worded inaccurately, sorry. This
regex is meant to capture three pieces of the string, just as an example.
There is no problem with the 1st and 3rd ones, and the 2nd is also, I
believe, captured as expected. But we cannot access a captured fragment
if it is an empty string: the attempt to access it raises an error.
If another input string is passed, for example 'word1 "word2" word3', the
regex matches and returns the substrings 'word1', 'word2' (without double
quotes) and 'word3'. That is OK, and no error is raised on access in that
case. The problem occurs only with an empty captured string.
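
The two cases can be put side by side in a short script. This is only a
sketch built from the strings quoted in this thread; the variable names ok
and bad are made up for illustration, and what the second access prints
depends on whether the image already contains a fix:

```smalltalk
| ok bad |
"Non-empty middle capture: described above as working."
ok := 'word1 "word2" word3' =~ '(\w+) "([^"]*)" (\w+)'.
(ok at: 1) printNl.
(ok at: 2) printNl.
(ok at: 3) printNl.

"Empty middle capture: the access reported as failing in 3.2.5."
bad := 'word1 "" word3' =~ '(\w+) "([^"]*)" (\w+)'.
[(bad at: 2) printNl]
    on: Error
    do: [:e | ('error: ', e messageText printString) displayNl].
```

The on:do: wrapper is there only so the script keeps running and reports
the error instead of stopping at it.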

As far as I understand, the aString =~ operation returns an instance of a
subclass of RegexResults; if =~ matches, then an instance of the
Kernel.MatchingRegexResults class is returned.
The Kernel.MatchingRegexResults >> printOn: method defined there internally
does:

(some source lines skipped)
1 to: self size
            do: 
                [:each | 
                aStream
                    nextPut: ch;
                    print: (self at: each).
                ch := $,].

So "self at:" is called here too, which is why printString fails as well.

I tried to guess a fix for the issue (half-blindly, as I am new to Smalltalk
and did not look deeply into the sources or the written-in-C parts called by
the regex-related methods) by editing Kernel.MatchingRegexResults >> at: a bit.
Originally (in 3.2.5) the method looks like this:

    at: anIndex [
        <category: 'accessing'>
        | reg text |
        anIndex = 0 ifTrue: [^self match].
        cache isNil ifTrue: [cache := Array new: registers size].
        (cache at: anIndex) isNil 
            ifTrue: 
                [reg := registers at: anIndex.
                text := reg isNil 
                            ifTrue: [nil]
                            ifFalse: [self subject copyFrom: reg first to: reg last].
                cache at: anIndex put: text].
        ^cache at: anIndex
    ]

I changed it to:

    at: anIndex [
        <category: 'accessing'>
        | reg text |
        anIndex = 0 ifTrue: [^self match].
        cache isNil ifTrue: [cache := Array new: registers size].
        (cache at: anIndex) isNil 
            ifTrue: 
                [reg := registers at: anIndex.
                text := reg isNil 
                            ifTrue: [nil]
                            ifFalse: [
                                reg isEmpty      " <<< changed here "
                                        ifFalse: [self subject copyFrom: reg first to: reg last]
                                        ifTrue: [ '' ] ].
                cache at: anIndex put: text].
        ^cache at: anIndex
    ]

After this change the issue seems to be gone: I can access empty sub-matches.
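
The branch added above can be tried in isolation. This is a minimal sketch
that assumes the register of an empty capture is an ordinary empty Interval,
as the quoted message suggests; the bounds 8 and 7 are hypothetical values
chosen for illustration, not taken from a real match:

```smalltalk
| subject reg text |
subject := 'word1 "" word3'.
reg := 8 to: 7.    "hypothetical empty register: last is first - 1"
reg isEmpty printNl.
reg size printNl.

"Same decision as in the modified at: method."
text := reg isEmpty
    ifTrue: ['']
    ifFalse: [subject copyFrom: reg first to: reg last].
text printNl.
```

With the isEmpty test placed first, neither reg first nor reg last is sent to
the empty interval, and the result is an empty string rather than nil.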



By the way, I found another issue: the method String >> escapeRegex does not
escape curly braces {}, while I suppose it should.
(Unfortunately the http://smalltalk.gnu.org/node/add/project-issue server is
terribly slow, and most of the time I just got "service unavailable" when
trying to submit the issue during the last three days or so.)

Sorry for the slow response.

P.S.
I have just received your last message. I see you are offering a similar fix,
but as I understand it, your code will return nil in the case of an
empty-string match (?). I suppose it should return an empty string, shouldn't
it?
I'll try to apply your patch; if I succeed, I'll let you know (tomorrow).

Sorry for my English as well. :)

-- 
/sergio


