[Octave-bug-tracker] [bug #35910] Incorrect regex matching of multi-byte
From: Mike Miller
Subject: [Octave-bug-tracker] [bug #35910] Incorrect regex matching of multi-byte UTF-8 characters
Date: Sun, 28 Jul 2019 19:30:28 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36
Follow-up Comment #14, bug #35910 (project octave):
Ok, I can now also reproduce this on my full development system, not just in a
container, but only with octave-cli.
So to repeat in a more reproducible way, here is regexp_error.m:
c = regexp ('lorem ipsum', '^\s*(⇒|=>|⊣|-\|)')
And here are three examples, showing that it works without error in
interactive Octave when LANG is set to a UTF-8 locale, but raises an error
when the locale variables are not set or when running in batch mode from the
command line:
$ octave-cli-6.0.0 -q
>> regexp_error
c = [](1x0)
>>
$ env -u LANG octave-cli-6.0.0 -q
>> regexp_error
error: regexp: unrecognized character after (? or (?- at position 13 of expression
error: called from
regexp_error at line 1 column 3
>>
$ octave-cli-6.0.0 -q regexp_error.m
error: regexp: unrecognized character after (? or (?- at position 13 of expression
error: called from
regexp_error at line 1 column 3
When octave-gui is used, these errors are not raised. So there is possibly
some locale initialization that happens in the Qt framework as part of the
octave-gui executable that is missing from octave-cli.
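
For anyone trying to interpret the reported position, here is a small Python
sketch of how the pattern from regexp_error.m looks as characters versus UTF-8
bytes. Python is used only for illustration. This does not reproduce what
Octave or its regex library does internally, and the variable names are made
up for this example. The error message does not say whether its position
counts bytes or characters, so this only shows that the two ways of counting
diverge after the first multi-byte character.

```python
# The pattern from regexp_error.m, held as a Python string of characters.
pattern = r'^\s*(⇒|=>|⊣|-\|)'

# The same pattern as the UTF-8 bytes a source file would contain.
encoded = pattern.encode('utf-8')

print('characters:', len(pattern))
print('UTF-8 bytes:', len(encoded))

# Characters in the pattern that need more than one byte in UTF-8,
# mapped to their byte sequence in hex.
multibyte = {
    ch: ch.encode('utf-8').hex()
    for ch in pattern
    if len(ch.encode('utf-8')) > 1
}
print('multi-byte characters:', multibyte)

# For each character, print its character index and the byte offset at
# which it starts, to show where the two numberings drift apart.
byte_offset = 0
offsets = []
for char_index, ch in enumerate(pattern):
    offsets.append((char_index, byte_offset, ch))
    print(char_index, byte_offset, repr(ch))
    byte_offset += len(ch.encode('utf-8'))

# Illustration only: the same bytes read one byte per character
# (Latin-1 here), which is one way a non-UTF-8 reader could see them.
bytewise = encoded.decode('latin-1')
print('characters when read byte-wise:', len(bytewise))
```

The pattern is 16 characters long but occupies 20 bytes, because ⇒ and ⊣ each
take three bytes in UTF-8. Any code that handles it byte by byte therefore
sees a different string from code that handles it character by character.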
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?35910>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/