Re: [Help-bash] Posix: 2.3 Token Recognition & 2.10 Shell Grammar
From: Eric Blake
Subject: Re: [Help-bash] Posix: 2.3 Token Recognition & 2.10 Shell Grammar
Date: Tue, 14 Jul 2015 07:01:36 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1

On 07/13/2015 09:12 AM, Michael Convey wrote:
> I've read these two sections -- more than once actually, but my
> understanding of them is still unsatisfactory. Here are the sources:
>
> - Token Recognition:
>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
> - Shell Grammar:
>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10
>
> I find the following shell grammar excerpt particularly confusing:
>
> "When a TOKEN is seen where one of those annotated productions could be
> used to reduce the symbol, the applicable rule shall be applied to convert
> the token identifier type of the TOKEN to a token identifier acceptable at
> that point in the grammar. The reduction shall then proceed based upon the
> token identifier type yielded by the rule applied."
>
> Is there a book or some other source that provides a layman's exposition
> of these two sections?

Not that I'm aware of. But I can at least take a layman's shot at
explaining the intent:

The shell allows:

  case in in in ) echo yes;; esac

which means that the tokenizer cannot blindly treat 'in' as a keyword
everywhere, but only in the places where the keyword is expected (the
third token after seeing 'case' as the first token). So, reading the
grammar, we see (among others):

  case_clause : Case WORD linebreak in linebreak case_list Esac

  in          : In                       /* Apply rule 6 */

  %token  In
  /*      'in'   */

  6. [Third word of for and case]
     a. [case only]
        When the TOKEN is exactly the reserved word in, the token
        identifier for in shall result. Otherwise, the token WORD
        shall be returned.
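
Rule 6a boils down to a string comparison made only at the third token
of a case header. Here is a minimal shell sketch of that idea, assuming
each token arrives as its raw text with any quoting characters still
attached (a simplification; a real tokenizer records quoting separately).
The function names rule6a and classify_case_header are hypothetical and
come from neither POSIX nor bash.

```sh
#!/bin/sh
# Hypothetical sketch of how the first three tokens of a case header are
# classified. Arguments are raw token texts, quoting characters included.

rule6a() {
  # Rule 6a: exactly the reserved word 'in' -> In; anything else -> WORD.
  # A quoted form such as \in or "in" is not the reserved word.
  if [ "$1" = in ]; then echo In; else echo WORD; fi
}

classify_case_header() {
  # Token 1 must be 'case'; token 2 (the subject) is never a reserved
  # word, so it is always WORD; token 3 goes through rule 6a.
  [ "$1" = case ] || { echo "not a case clause" >&2; return 1; }
  printf '%s:Case %s:WORD %s:%s\n' "$1" "$2" "$3" "$(rule6a "$3")"
}

classify_case_header case in in      # prints: case:Case in:WORD in:In
classify_case_header case in '\in'   # prints: case:Case in:WORD \in:WORD
```

Only the position decides whether the comparison happens at all; the
same text 'in' as the subject (token 2) stays a plain WORD.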

So the parser has seen 'case' as Case and the first 'in' as WORD, and is
trying to determine whether the second 'in' fits the rules for
"case_clause". Initially, that 'in' is classified only as a TOKEN, and
we are at the point where the "in" production applies, which says to
use rule 6 to disambiguate the token. Rule 6 says that the string "in"
is recognized as a reserved word in this context, so the tokenizer
reclassifies it from TOKEN to In. The grammar then continues with the
clause (the third 'in' is simply the case pattern, an ordinary WORD) and
accepts it as a valid sequence of tokens. If you do anything else, like:

  case in \in in ) echo yes;; esac

you'll get "bash: syntax error near unexpected token `\in'". In terms of
the analysis above, the "in" production applies rule 6 to the TOKEN
'\in', but since it is not exactly the string 'in' (the backslash quotes
it), it is not recognized as a reserved word and is not reclassified.
It stays a WORD, so the "case_clause" rule is not satisfied and you get
a syntax error.
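
Both examples can be reproduced directly. This sketch assumes bash is
installed and on PATH; since the exact wording of the error message may
vary between bash versions, it discards stderr and only checks the exit
status of the rejected command.

```sh
#!/bin/sh
# Reproduce the two examples from this message (assumes bash is on PATH).

# Accepted: the second 'in' is reclassified to In by rule 6, and the
# third 'in' is the case pattern, which matches the subject word 'in'.
bash -c 'case in in in ) echo yes;; esac'    # prints: yes

# Rejected: '\in' is quoted, so it is never the reserved word in.
# bash reports a syntax error on stderr and exits with a non-zero status.
if bash -c 'case in \in in ) echo yes;; esac' 2>/dev/null; then
  echo "accepted (unexpected)"
else
  echo "rejected with a syntax error"
fi
```
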
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org