Re: What is wrong with a regex?
From: Koichi Murase
Subject: Re: What is wrong with a regex?
Date: Sat, 4 Feb 2023 14:20:20 +0900
On Sat, Feb 4, 2023 at 14:06, Leonid Isaev <leonid.isaev@ifax.com> wrote:
> On Fri, Feb 03, 2023 at 10:08:29PM -0600, Dennis Williamson wrote:
> > Bash (and grep) don't allow an empty subexpression.
> >
> > f=foo; [[ $f =~ (|o) ]]; echo $?; echo "${BASH_REMATCH}"
> > 2
> >
> > $ echo foo | grep -E '(|o)'
> > grep: empty (sub)expression
>
> [...]
>
> I-orca--05:02-~-> f=foo; [[ $f =~ (|o) ]]; echo $?
> 0
>
> I-orca--05:03-~-> grep -E "(|o)" <<< foo
> foo
>
> [...]
> WTF?
Bash relies on the system library's <regex.h> for regular expressions,
so I suspect this difference in behavior comes from the C library
rather than from the Bash version.
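One way to compare systems is to record the exit status of the test itself, since [[ ... =~ ... ]] returns 0 on a match, 1 on no match, and 2 when the regular expression is rejected as invalid. The following is only a sketch: ere_status is a hypothetical helper name, and the optional third argument (a path to a different bash binary) is an assumption added for comparing builds.

```shell
# ere_status REGEX STRING [BASH]
# Hypothetical helper: print the exit status of [[ STRING =~ REGEX ]]
# as evaluated by a child Bash (default: the "bash" found in PATH).
ere_status() {
  # The operands are passed as positional parameters, so the regex
  # is never re-parsed as shell syntax. Inside the child shell,
  # $1 is the string and $2 is the regex.
  "${3:-bash}" -c '[[ $1 =~ $2 ]]; echo "$?"' _ "$2" "$1"
}

ere_status 'o+' foo     # ordinary ERE that matches
ere_status 'x' foo      # ordinary ERE that does not match
ere_status '(|o)' foo   # empty alternative: result depends on the regex library
```

Running the same three lines on both machines should show whether only the last result differs, which would point at the regex library rather than at Bash.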
I've checked the standard:
From POSIX XBD 9.5.3:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_05_03
> /* --------------------------------------------
> Extended Regular Expression
> --------------------------------------------
> */
> extended_reg_exp : ERE_branch
> | extended_reg_exp '|' ERE_branch
> ;
> ERE_branch : ERE_expression
> | ERE_branch ERE_expression
> ;
> ERE_expression : one_char_or_coll_elem_ERE
> | '^'
> | '$'
> | '(' extended_reg_exp ')'
> | ERE_expression ERE_dupl_symbol
> ;
> one_char_or_coll_elem_ERE : ORD_CHAR
> | QUOTED_CHAR
> | '.'
> | bracket_expression
> ;
> ERE_dupl_symbol : '*'
> | '+'
> | '?'
> | '{' DUP_COUNT '}'
> | '{' DUP_COUNT ',' '}'
> | '{' DUP_COUNT ',' DUP_COUNT '}'
> ;
According to the standard, `|' joins one or more <ERE_branch>es, an
<ERE_branch> is a sequence of one or more <ERE_expression>s, and an
<ERE_expression> appears to require at least one element. The grammar
therefore cannot derive an empty alternative. This means that (|_x) is
not supported by the POSIX ERE, and that its acceptance by GNU grep
and by Bash regular expressions with Glibc <regex.h> is an extension.
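If portability matters, the empty alternative can usually be avoided by making the group optional instead: (o)? is derivable from the grammar above as an <ERE_expression> followed by an <ERE_dupl_symbol>, and it matches either nothing or "o", like (|o). A minimal sketch with grep -E follows; the helper name matches and the anchored test pattern are made up for illustration.

```shell
# matches ERE STRING
# Hypothetical helper: succeed if STRING matches ERE under grep -E.
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# "(o)?" plays the role of "(|o)": an optional "o" between "f" and "o".
matches '^f(o)?o$' foo  && echo "foo: match"
matches '^f(o)?o$' fo   && echo "fo: match"
matches '^f(o)?o$' fooo || echo "fooo: no match"
```

The same pattern can be stored in a variable and used on the right-hand side of [[ $f =~ $re ]] in Bash.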
I also checked the behavior of Cygwin, where <regex.h> seems to be
implemented as a part of Newlib. Newlib <regex.h> also seems to
support an empty <ERE_branch> and thus (|_x).