help-bash

Re: [Help-bash] Does the parser backtrack?


From: Eric Blake
Subject: Re: [Help-bash] Does the parser backtrack?
Date: Tue, 4 Oct 2016 14:33:03 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0

On 10/04/2016 09:22 AM, Daniel Martí wrote:
> Telling the difference between $(( - arithmetic expansion - and $( ( -
> subshell inside a command substitution - should be easy:

No, it is absolutely hard, because it is context-dependent.

POSIX itself says, at
http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html
C.2.6 Command Substitution:

"Arithmetic expansions have precedence over command substitutions. That
is, if the shell can parse an expansion beginning with "$((" as an
arithmetic expansion then it will do so. It will only parse the
expansion as a command substitution (that starts with a subshell) if it
determines that it cannot parse the expansion as an arithmetic expansion."
...
"The ambiguity is not restricted to the simple case of a single
subshell. More complicated ambiguous cases are possible (even with just
the standard shell syntax), such as:

$(( cat <<EOH
+ ( (
EOH
) && ( cat <<EOH
) ) + 1 +
EOH
))

This can be parsed as an arithmetic expansion, with cat and EOH as the
names of shell variables. Ambiguous cases also exist where the end of
the expansion is at a different location for the arithmetic expansion
and the command substitution:

$((cat <<EOF
+((((
EOF
) && (
cat <<EOF
+
EOF
))

This is an incomplete arithmetic expansion, but would have been a
(complete) command substitution if it could not have been parsed as an
arithmetic expansion. If this expansion occurs at the end of input then
the shell reports a syntax error; it does not parse it as a command
substitution."

So let's try it out:

$ script='echo $(( cat <<EOH
> + ( (
> EOH
> ) && ( cat <<EOH
> ) ) + 1 +
> EOH
> ))
> '
$ bash -c "$script"
0
$ bash -c "cat=1 EOH=2; $script"
64
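Where do 0 and 64 come from? The sketch below (assuming bash) joins the here-document lines of the script into one arithmetic expression. Inside $(( )) the line breaks are only whitespace, so this should be the same expression bash evaluated. eval_ambiguous is a hypothetical helper name used only for this demo. Because "+" binds tighter than "<<", the expression groups as cat << (EOH + ((EOH) && (cat << EOH)) + 1 + EOH). That gives 0 << 1 = 0 when both variables are 0 (as unset ones are), and 1 << 6 = 64 when cat=1 and EOH=2.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): the here-document script above, flattened onto one
# line. eval_ambiguous is a hypothetical helper, not part of any tool.
eval_ambiguous() {
  local cat=$1 EOH=$2
  # "+" binds tighter than "<<", so this groups as
  #   cat << (EOH + ((EOH) && (cat << EOH)) + 1 + EOH)
  echo "$(( cat << EOH + ( ( EOH ) && ( cat << EOH ) ) + 1 + EOH ))"
}

eval_ambiguous 0 0   # stands in for unset variables: 0 << 1
eval_ambiguous 1 2   # 1 << (2 + 1 + 1 + 2) = 1 << 6
```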

bash (and dash and ksh) unconditionally treat that as arithmetic.  But
if you tweak it slightly by adding a : at the end, the POSIX description
starts to differ from actual practice:

$ bash -c 'echo $(( cat <<EOH
+ ( (
EOH
) && ( cat <<EOH
) ) + 1 +
EOH
:))
'
bash: line 6: cat <<EOH
+ ( (
EOH
) && ( cat <<EOH
) ) + 1 +
EOH
:: syntax error in expression (error token is ":")

where the shells have decided that it is arithmetic with an error,
rather than attempting to fall back to a command substitution.  And to
show that it is a valid command substitution, force the issue with a
space up front:

$ bash -c 'echo $( ( cat <<EOH
+ ( (
EOH
) && ( cat <<EOH
) ) + 1 +
EOH
:))
'
+ ( ( ) ) + 1 +
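A smaller reproduction of the same effect follows, as a sketch that assumes bash is on PATH. The text $(( echo hi )) could also be read as a command substitution containing the subshell ( echo hi ). However, it is well-formed as $(( ... )), so bash commits to arithmetic and fails on "echo hi" rather than falling back. Adding the space after "$(" makes it unambiguous. The variable names arith_status and comsub_out are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash on PATH): the only difference between the two
# commands is the space between "$(" and "(".

# No space: well-formed $(( ... )), so bash treats it as arithmetic and
# reports a syntax error in the expression instead of running "echo hi".
if bash -c 'echo $(( echo hi ))' >/dev/null 2>&1; then
  arith_status=0
else
  arith_status=$?   # exit status of the failed bash -c
fi

# With a space: a command substitution whose first command is a subshell.
comsub_out=$(bash -c 'echo $( ( echo hi ) )')

echo "arithmetic attempt exit status: $arith_status"
echo "command substitution output: $comsub_out"
```

This is also why POSIX asks conforming applications to separate "$(" and "(" with white space when a command substitution consists of a subshell.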


> I'm especially confused by this because so far I've written an almost
> complete bash parser as a recursive descent parser without backtracking,
> i.e. treating bash as LL(k). But if indeed I need to backtrack to retry
> $(( as if it were $( (, the number of lookahead tokens is unbounded and
> I would be looking at LL(*). I wonder if bash was designed this way on
> purpose.

Sadly, because the parse IS context-dependent, there is no context-free
grammar that can parse it.

> 
> I've filed a bug on my side about this too:
> https://github.com/mvdan/sh/issues/30
> 
> Any help will be appreciated :)

Sorry to disappoint you, but shell is NOT an easy language to parse.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
