Re: help with pattern matching needed
From: Lawrence Velázquez
Subject: Re: help with pattern matching needed
Date: Fri, 07 Jan 2022 02:20:12 -0500
User-agent: Cyrus-JMAP/3.5.0-alpha0-4526-gbc24f4957e-fm-20220105.001-gbc24f495
On Fri, Jan 7, 2022, at 12:28 AM, Christoph Anton Mitterer wrote:
> When I have:
> case "${character}" in
>     (pattern)
>         foo
>         ;;
>     (*)
>         bar
> esac
>
> and I want to match different patterns, then AFAIU the pattern
> undergoes quote removal first, right?
Chet described it this way in August:
    It's tricky in the sense that quote removal, according to the
    strict shell definition, is performed -- the literal quote
    characters are removed and don't appear in the expanded pattern.
    The complication is that the shell and pattern matcher have to
    arrange for the quoted characters to be marked appropriately
    for the pattern matcher itself, so quoted special characters
    lose their special meaning and match themselves.

    https://lists.gnu.org/archive/html/bug-bash/2021-08/msg00187.html
Roughly speaking, shell-quoted characters are already treated as
literals in the pattern. There isn't the rigid separation you have
to deal with when you're constructing a pattern for, say, grep(1).
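A minimal sketch of that point, assuming any POSIX-style shell: the same
character is a literal when shell-quoted and a wildcard when not. The
function name star_kind is made up for illustration.

```shell
# Hypothetical helper: reports which case pattern the argument matched.
star_kind() {
    case $1 in
        '*') printf '%s\n' 'literal star' ;;     # quoted: matches only the character *
        a*)  printf '%s\n' 'starts with a' ;;    # unquoted: * matches any string
        *)   printf '%s\n' 'something else' ;;
    esac
}

star_kind '*'      # literal star
star_kind 'abc'    # starts with a
star_kind 'xyz'    # something else
```

No second layer of escaping is needed for the pattern matcher; the shell
quoting alone decides whether * is special.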
> Case 1:
> *******
> A string like:
> \\
> would end up as the pattern
> \
> which by itself is apparently already taken as the literal \ ?!
>
> Why don't I have to quote that again (for the pattern)?
>
> [...]
>
> Instead, using:
> \\\\
> or
> '\\'
> doesn't produce what I'd expect by the above, but matches the
> literal \\ .
The quoted backslashes are already literals in the pattern.
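As a rough illustration (the function name backslashes is hypothetical),
each shell-quoted backslash in the pattern stands for exactly one literal
backslash in the string being matched:

```shell
# Hypothetical helper: reports how many backslashes the argument consists of.
backslashes() {
    case $1 in
        \\)    printf '%s\n' one ;;     # backslash-quoted backslash: one literal
        '\\')  printf '%s\n' two ;;     # single-quoted pair: two literals
        *)     printf '%s\n' other ;;
    esac
}

backslashes '\'      # one
backslashes '\\'     # two
backslashes 'x'      # other
```

Writing the second pattern as \\\\ instead of '\\' should behave the same
way, since both forms produce two quoted backslashes.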
> Case 2:
> *******
> A string like:
> [.*^$[\]
> should end up (after quote removal) as the pattern:
> [.*^$[]
> and AFAIU be valid (but of course not match the literal \), but bash
> complains about a missing matching ].
> That construct executes e.g. in (d)ash, though it never matches (or at
> least not with the plain single characters).
It looks like \] is being treated as a literal ] in both cases.
The difference seems to be in the parsing: dash gives up on the
bracket expression, while bash consumes the rest of the script
trying to close it.
    % cat ex1.sh
    case $1 in
        [.*^$[\]) printf '%s matched\n' "$1" ;;
        *) printf "%s didn't match\\n" "$1" ;;
    esac
    % bash ex1.sh .
    ex1.sh: line 2: unexpected EOF while looking for matching `]'
    ex1.sh: line 5: syntax error: unexpected end of file
    % dash ex1.sh .
    . didn't match
    % dash ex1.sh '[.xxxxxxxxxxxx^$[]'
    [.xxxxxxxxxxxx^$[] matched
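A smaller sketch of the same quoting effect, deliberately leaving out the
$[ sequence so that only the \] is in play. The function name in_set and
the set contents are made up for illustration. Because \] is a literal ],
a second, unquoted ] is needed to close the bracket expression:

```shell
# Hypothetical helper: is the argument one of a, b, or ] ?
in_set() {
    case $1 in
        [ab\]]) printf '%s\n' 'matched' ;;       # \] is a member; the last ] closes
        *)      printf '%s\n' "didn't match" ;;
    esac
}

in_set ']'     # matched
in_set 'a'     # matched
in_set 'c'     # didn't match
```

The backslash itself is not a member of the set here; it only quotes the ].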
> Case 3:
> *******
> When I however remove the \ the string:
> [.*^$[]
> should also end up as the pattern:
> [.*^$[]
> and AFAIU be valid, but while that executes, neither of these characters
> match and I always end up in the (*) case.
>
> Any ideas why?
>
> I stumbled over the old $[] form of arithmetic expression, but I guess
> that cannot be it, cause then one would again have a missing matching
> ].
That is it, though. The empty $[] substitutes "0" into the pattern.
    % cat ex2.sh
    case $1 in
        [.*^$[]) printf '%s matched\n' "$1" ;;
        *) printf "%s didn't match\\n" "$1" ;;
    esac
    % bash ex2.sh '[.xxxxxxx^0'
    [.xxxxxxx^0 matched
However, as you noted, bash doesn't parse quite so greedily this
time. I don't know why not.
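To see the substitution in isolation, here is a small sketch. It assumes a
bash on PATH (the $[...] form is a bash extension, so it is run through
bash -c), and the names old_arith and effective_pattern are hypothetical.
The second function spells out the pattern that ex2.sh presumably ends up
with once $[] has become 0.

```shell
# Run under bash explicitly: $[...] is not POSIX and other shells may not expand it.
old_arith() {
    bash -c 'printf "%s\n" "$[]" "$[1+2]" "$(( 1 + 2 ))"'
}

# The ex2.sh pattern with the 0 written out by hand. The [ is never closed,
# so it is matched as an ordinary character and * acts as a wildcard.
effective_pattern() {
    case $1 in
        [.*^0) printf '%s\n' 'matched' ;;
        *)     printf '%s\n' "didn't match" ;;
    esac
}

old_arith                          # prints 0, 3 and 3 on separate lines
effective_pattern '[.xxxxxxx^0'    # matched
effective_pattern '.'              # didn't match
```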
> Case 4:
> *******
> Using the string:
> ['.*^$[\']
> which should end up as the pattern:
> [.*^$[\]
> works as expected, and \ which inside a bracket expression has no
> special meaning is also matched as it should
The ] is no longer quoted (since \ is itself quoted), so bash
successfully closes the bracket expression.
> Same for the string:
> ['.*^$[']
> ends up as the pattern:
> [.*^$[]
> and matches as it should (without the literal \)
You've broken up $[], and ] is unquoted, so bash successfully closes
the bracket expression.
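Both Case 4 forms can be wrapped up as follows. This is only a sketch, and
the function names is_special and is_special_or_backslash are made up.
Quoting the contents keeps $[ from starting an arithmetic expansion, and
leaving the final ] unquoted closes the bracket expression.

```shell
# Hypothetical helper: is the argument one of . * ^ $ [ ?
is_special() {
    case $1 in
        ['.*^$[']) printf '%s\n' special ;;
        *)         printf '%s\n' ordinary ;;
    esac
}

# Same set plus a literal backslash (the backslash is inside the quotes).
is_special_or_backslash() {
    case $1 in
        ['.*^$[\']) printf '%s\n' special ;;
        *)          printf '%s\n' ordinary ;;
    esac
}

is_special '$'                 # special
is_special '0'                 # ordinary
is_special '\'                 # ordinary
is_special_or_backslash '\'    # special
```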
--
vq