Re: [Help-smalltalk] [bug] (a RegexResults) at: n throws error if captur
From: Sergio
Subject: Re: [Help-smalltalk] [bug] (a RegexResults) at: n throws error if captured at n is empty string
Date: Mon, 09 Dec 2013 03:22:01 +0400
Hi
07.12.2013, 22:10, "Holger Hans Peter Freyther" <address@hidden>:
> On Fri, Dec 06, 2013 at 03:47:24PM -0700, sergio wrote:
> data := 'word1 "" word3'.
> (data =~ '(\w+) "([^"]*)" (\w+)') printString
>
> is already enough to re-produce the issue. For some reason registers
> contains three intervals and the second one is an empty one.
Yes, they should contain three. I was inaccurate in the bug report, sorry. This
regex is meant to capture three pieces of the string, just as an example.
There are no problems with the 1st and 3rd ones, and the 2nd is also, I
believe, captured as expected. But we cannot access a captured fragment
if it is an empty string: the attempt to access it throws an error...
If another input string is passed, for example 'word1 "word2" word3', the regex
will capture and return the substrings 'word1', 'word2' (without double quotes) and
'word3'. That is okay, and no error is thrown on access in that
case. The problem occurs only with an empty captured string.
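A minimal sketch to reproduce this in gst, assuming 3.2.5 (the variable names
are only for illustration, and I do not show output here):

```smalltalk
"Sketch only: same data and pattern as in the quoted message."
| data results |
data := 'word1 "" word3'.
results := data =~ '(\w+) "([^"]*)" (\w+)'.
(results at: 1) printNl.    "first capture, non-empty"
(results at: 3) printNl.    "third capture, non-empty"
(results at: 2) printNl.    "second capture is empty; this access is the one that fails for me"
```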
As far as I understand, the aString =~ operation returns an instance of a
subclass of RegexResults; if =~ matches, then an instance of the
Kernel.MatchingRegexResults class is returned.
The Kernel.MatchingRegexResults >> printOn: method defined there internally does
(some source lines skipped):
    1 to: self size
        do:
            [:each |
            aStream
                nextPut: ch;
                print: (self at: each).
            ch := $,].
So "self at: " is called here too.
I tried to "guess a fix" for the issue (half-blindly, as I am a novice in Smalltalk
and did not look deeply into the sources or the written-in-C parts called by the
regex-related methods): I edited Kernel.MatchingRegexResults >> at: a bit.
Originally (in 3.2.5) the method looks like:
    at: anIndex [
        <category: 'accessing'>
        | reg text |
        anIndex = 0 ifTrue: [^self match].
        cache isNil ifTrue: [cache := Array new: registers size].
        (cache at: anIndex) isNil
            ifTrue:
                [reg := registers at: anIndex.
                text := reg isNil
                    ifTrue: [nil]
                    ifFalse: [self subject copyFrom: reg first to: reg last].
                cache at: anIndex put: text].
        ^cache at: anIndex
    ]
I changed it to:
    at: anIndex [
        <category: 'accessing'>
        | reg text |
        anIndex = 0 ifTrue: [^self match].
        cache isNil ifTrue: [cache := Array new: registers size].
        (cache at: anIndex) isNil
            ifTrue:
                [reg := registers at: anIndex.
                text := reg isNil
                    ifTrue: [nil]
                    ifFalse: [
                        reg isEmpty " <<< changed here "
                            ifFalse: [self subject copyFrom: reg first to: reg last]
                            ifTrue: [ '' ] ].
                cache at: anIndex put: text].
        ^cache at: anIndex
    ]
It seems that after this change the issue is gone: I can access empty sub-matches.
By the way, I found another issue: the method String >> escapeRegex does not
escape curly braces {}, while I suppose it should.
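A quick way to look at this, as a sketch (the sample string is made up; I only
print the escaped result so the braces can be inspected by eye):

```smalltalk
"Sketch only: print what escapeRegex produces for a string containing braces."
'a{2}' escapeRegex printNl.
```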
(Unfortunately the http://smalltalk.gnu.org/node/add/project-issue server is
terribly slow, and most of the time I got just "service unavailable" when trying to
submit the issue during the last three days or so.)
Excuse me for the slow response.
P.S.
I have just received your last message. I see you are offering a similar fix, but
as I understand it, your code will return nil in the case of an empty-string match (?). I
suppose it should return an empty string, shouldn't it?
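To show the distinction I have in mind, here is a small sketch. It assumes the
at: method quoted above, where a nil register gives nil; the patterns and
strings are made up for illustration, and I have not verified how the engine
fills the registers in each case:

```smalltalk
"Sketch only: compare a group that takes no part in the match
 with a group that matches zero characters."
| skipped empty |
skipped := 'ab' =~ '(a)(x)?b'.          "group 2 is optional and not used here"
empty := 'a""b' =~ '(a)"([^"]*)"b'.     "group 2 is used but matches nothing"
(skipped at: 2) printNl.
(empty at: 2) printNl.
```

My point is that the first case could reasonably answer nil, while the second
one, I think, should answer an empty string.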
I'll try to apply your patch; if I succeed, I'll let you know (tomorrow).
Sorry for my English also. :)
--
/sergio