Re: regexp strangeness
From: Daniel J Sebald
Subject: Re: regexp strangeness
Date: Sat, 8 Feb 2020 04:12:03 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0
On 2/8/20 3:32 AM, Kay Nick wrote:
Hey all,
The documentation for regexp says:
'\w'
Match any word character
What exactly is a word character (and, maybe more importantly, what isn't)?
Am I right in assuming it's
[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]? What about
non-English characters like öäßłńŚ?
https://en.wikipedia.org/wiki/Regular_expression#Character_classes
lists \w as the equivalent to [A-Za-z0-9_]
\w probably won't match non-English characters, but maybe you could try [ä-Ś] or
whatever range makes sense for the alphabet of interest.
And here is some other strange (to me) behavior:
regexp("#w#","#\w#")
ans = 1 <- seems to work in general...
As you point out two examples later, the backslash needs to be escaped. That
has nothing to do with how regexp() is programmed; in Octave, double-quoted
strings process backslash escape sequences, as in C printf format strings.
Matlab doesn't process escape sequences in double-quoted strings. On the other
hand, Octave treats single quotes just the way that Matlab does.
So, in the above, \w is treated as an escape sequence, but probably one that
isn't defined, so \w ends up the same as w. What you've actually done is
regexp("#w#","#w#"), which matches.
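To make the quoting difference concrete, here is a minimal sketch, assuming a reasonably recent Octave. The variable names are only for illustration. It shows that a single-quoted pattern keeps the backslash as typed, so no doubling is needed:

```octave
% Single-quoted strings keep the backslash as typed;
% double-quoted strings process it as an escape character.
pat_single = '#\w#';     % 4 characters: # \ w #
pat_double = "#\\w#";    % also 4 characters: \\ collapses to one backslash
same_pattern = strcmp (pat_single, pat_double);

% With the backslash intact, \w matches one character from [A-Za-z0-9_].
% The first output of regexp is the start index of each match.
start_w = regexp ('#w#', pat_single);
start_d = regexp ('#d#', pat_single);
```

So writing the pattern in single quotes should give the behavior you expected from the help text, in both Octave and Matlab.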
regexp("#d#","#\w#")
ans = [](1x0) <- why?
Because by the same logic as above, you've done regexp("#d#","#w#"),
which doesn't match.
regexp("#d#","#\\w#") <- so we need to double escape these
special characters... no mention of that in the help... :-(
ans = 1
regexp("#j#","#\\w#")
ans = 1 <- ok
regexp("#E#","#\\w#")
ans = 1 <- ok
regexp("#E#","#\\w*#")
ans = 1 <- ok
regexp("##","#\\w*#")
ans = 1 <- ok
regexp("#.#","#\\w*#")
ans = [](1x0) <- why?
Because . is not in [A-Za-z0-9_]
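A short sketch of why that is not a bug, again assuming single-quoted patterns so the backslash survives (the variable names are made up for the example). \w* may match zero characters, but it can never consume the ".", so the closing # is never reached:

```octave
pat = '#\w*#';

% \w* can match the empty string, so two adjacent # characters match.
start_empty = regexp ('##', pat);

% Letters, digits and underscore are all word characters.
start_word = regexp ('#E_9#', pat);

% "." is not a word character, so nothing bridges the two # characters.
start_dot = regexp ('#.#', pat);

% A pattern that accepts any character between the markers does match.
start_any = regexp ('#.#', '#.*#');
```

If the goal is "anything between two # characters", something like '#.*#' or '#[^#]*#' is probably closer to what you want than \w*.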
Dan
Especially the last one >> regexp("#.#","#\\w*#") ans = [](1x0) looks
like a bug to me. Or am I getting something wrong here?
Thanks
Kay