Re: Regexp cleanup
From: PhilipNienhuis
Subject: Re: Regexp cleanup
Date: Wed, 3 Jul 2013 12:57:23 -0700 (PDT)
Rik-4 wrote
> 7/3/13
>
> All,
>
> Does anyone know if the following expression is legal in Matlab?
>
> [S, E, TE, M, T, NM, SP] = regexp ("John Davis\nRogers, James",
> '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)')
>
> The issue is with the repeated use of a named capture buffer across an
> alternation operator. PCRE, which we use underneath for regular
> expressions, does not support non-unique capture names in a pattern.
> Octave currently works around this by renaming the capture buffers.
> However, the logic at the far end to parse the output of PCRE and return
> results to Octave is very complex and creaky. I re-wrote the back end
> routine in util/regexp.cc and I can now, at least, follow what the code is
> doing. The re-write also solves the following existing bugs (I said it was
> creaky).
>
> 38778: wrong return value for regexp
> 38616: memory leak
> 38149: wrong tokens returned
>
> So, depending on what Matlab does, would it be okay to drop support for
> this esoterica? I'm pretty tired of trying to work it out at this point.
>
> --Rik
Matlab r2013b prerelease gives the following (after changing the double quotes
to single quotes and removing empty lines):
>> [S, E, TE, M, T, NM, SP] = regexp ('John Davis\nRogers, James',
>> '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)')
S =
1 12
E =
10 25
TE =
[2x2 double] [2x2 double]
M =
'John Davis' 'nRogers, James'
T =
{1x2 cell} {1x2 cell}
NM =
1x2 struct array with fields:
first
last
SP =
'' '\' ''
>>
...so it seems Matlab thinks this is valid.
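
For readers unfamiliar with the renaming workaround Rik describes, here is a minimal sketch of the idea. It is written in Python only because Python's `re` module likewise rejects a repeated group name, so the same trick applies. It is not the code in util/regexp.cc; the helper names `compile_with_duplicate_names` and `named_tokens` and the `__N` suffix scheme are made up for illustration.

```python
import re


def compile_with_duplicate_names(pattern):
    """Give every (?<name>...) group a unique internal alias.

    Returns the compiled pattern plus a map from each alias back to the
    name the user wrote. Hypothetical helper, for illustration only.
    """
    counts = {}
    alias_to_name = {}

    def rename(match):
        name = match.group(1)
        counts[name] = counts.get(name, 0) + 1
        alias = f"{name}__{counts[name]}"
        alias_to_name[alias] = name
        # Python spells named groups (?P<name>...), not (?<name>...).
        return f"(?P<{alias}>"

    # Requiring a letter or underscore after '<' skips lookbehinds such as
    # (?<=...) and (?<!...). Escaped parentheses are NOT handled here.
    rewritten = re.sub(r"\(\?<([A-Za-z_]\w*)>", rename, pattern)
    return re.compile(rewritten), alias_to_name


def named_tokens(pattern, text):
    """Return one {name: text} dict per match, merging aliases by name."""
    regex, alias_to_name = compile_with_duplicate_names(pattern)
    results = []
    for m in regex.finditer(text):
        tokens = {}
        for alias, value in m.groupdict().items():
            # Groups in the alternative that did not match are None.
            if value is not None:
                tokens[alias_to_name[alias]] = value
        results.append(tokens)
    return results


pattern = r"(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)"
names = named_tokens(pattern, "John Davis\nRogers, James")
print(names)
```

The sketch uses a real newline in the subject string, as in Rik's double-quoted original. In the Matlab transcript above, the single-quoted string keeps `\n` as two literal characters, which is why the second match there is 'nRogers, James'.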
Philip