Re: ECMAScript: Automatic Semicolon Insertion
From: Ron Burk
Subject: Re: ECMAScript: Automatic Semicolon Insertion
Date: Wed, 7 Dec 2016 16:28:37 -0800
Top-of-head musings that are guaranteed to be confused, incomplete, and
wrong:
Seems like, roughly speaking, semicolons are optional, but where one is
missing a newline must stand in for it. The primary complication is that we
otherwise want to ignore newlines. So, if I augment the token value to
include an "is preceded by newline" flag (which the lexer must obligingly
set), does that provide enough information to handle the basic problem?
Something along the lines of:
SemiOpt
    : ';'
    /* expand test to handle '}' and EOF? */
    /* yylval here is the value of the lookahead token */
    | { if(!yylval.PrecededByNewLine) Error("Missing semi-colon or newline!"); }
    ;
ExprStmt
    : Expr SemiOpt
    ;
VarStmt
    : "var" <other stuff> SemiOpt
    ;
....
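For what it's worth, the lexer side of that flag could be sketched roughly as below. This is a hand-written scanner rather than flex output, and the names Token and NextToken are made up for illustration; only PrecededByNewLine comes from the grammar fragment above. It treats only '\n' as a line terminator, which is a simplification (ECMAScript also counts a bare CR, U+2028 and U+2029, and a multi-line comment containing a line terminator).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token record; the flag mirrors yylval.PrecededByNewLine. */
typedef struct {
    const char *start;             /* first character of the token */
    size_t      length;            /* 0 means end of input */
    int         PrecededByNewLine; /* 1 if a '\n' was skipped before it */
} Token;

/* Scan one token starting at *cursor and advance *cursor past it. */
Token NextToken(const char **cursor)
{
    const char *p;
    Token tok = { NULL, 0, 0 };

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;

    /* Newlines are skipped like other whitespace, but remembered. */
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n') {
        if (*p == '\n')
            tok.PrecededByNewLine = 1;
        ++p;
    }

    tok.start = p;
    if (isalnum((unsigned char)*p) || *p == '_') {
        /* identifier, keyword, or number: a run of word characters */
        while (isalnum((unsigned char)*p) || *p == '_')
            ++p;
    } else if (*p != '\0') {
        ++p; /* anything else is treated as one-character punctuation */
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

The parser never sees a newline token this way; the information only rides along on whichever token comes next, which is what the empty SemiOpt alternative inspects.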
Given this wording of even more outside-the-grammar rules:
The practical effect of these restricted productions is as follows:
When a *++* or *--* token is encountered where the parser would treat it as
a postfix operator, and at least one *LineTerminator* occurred between the
preceding token and the *++* or *--* token, then a semicolon is
automatically inserted before the *++* or *--* token.
When a *continue*, *break*, *return*, or *throw* token is encountered and a
*LineTerminator* is encountered before the next token, a semicolon is
automatically inserted after the *continue*, *break*, *return*, or *throw*
token.
My first thought is:
1. Have the lexer return a distinct token type for '++' or '--' when at
least one LineTerminator separates it from the previous token
2. Grammar positions for postfix '++' or '--' would require the "normal"
token
3. Grammar positions for prefix '++' or '--' would accept either the
"normal" token or the alternate type that indicated preceding LineTerminator
4. Have the lexer likewise return distinct token types for
continue/break/return/throw if followed by LineTerminator
5. Split each of these statements into two rules in the grammar, one
that can reduce immediately, the other that expects further input
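A rough sketch of how the lexer might pick between the "normal" and alternate token types in steps 1 and 4. Everything here is hypothetical naming (the TOK_* values, Variant, ClassifyToken), and it assumes the lexer has already worked out whether a LineTerminator sits before and after the token text:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token types: each restricted token gets an _NL twin. */
enum TokenType {
    TOK_OTHER,
    TOK_INCR,     TOK_INCR_NL,     /* _NL: LineTerminator before the token */
    TOK_DECR,     TOK_DECR_NL,
    TOK_CONTINUE, TOK_CONTINUE_NL, /* _NL: LineTerminator after the token */
    TOK_BREAK,    TOK_BREAK_NL,
    TOK_RETURN,   TOK_RETURN_NL,
    TOK_THROW,    TOK_THROW_NL
};

typedef struct {
    const char    *text;
    enum TokenType normal;
    enum TokenType with_newline;
    int            check_before; /* 1: newline before matters; 0: newline after */
} Variant;

static const Variant kVariants[] = {
    { "++",       TOK_INCR,     TOK_INCR_NL,     1 },
    { "--",       TOK_DECR,     TOK_DECR_NL,     1 },
    { "continue", TOK_CONTINUE, TOK_CONTINUE_NL, 0 },
    { "break",    TOK_BREAK,    TOK_BREAK_NL,    0 },
    { "return",   TOK_RETURN,   TOK_RETURN_NL,   0 },
    { "throw",    TOK_THROW,    TOK_THROW_NL,    0 }
};

/* Map token text plus its newline context to a token type. */
enum TokenType ClassifyToken(const char *text,
                             int newline_before, int newline_after)
{
    size_t i;

    assert(text != NULL);
    for (i = 0; i < sizeof kVariants / sizeof kVariants[0]; ++i) {
        const Variant *v = &kVariants[i];
        if (strcmp(text, v->text) == 0) {
            int nl = v->check_before ? newline_before : newline_after;
            return nl ? v->with_newline : v->normal;
        }
    }
    return TOK_OTHER;
}
```

On the grammar side, the postfix rules would then mention only TOK_INCR and TOK_DECR, the prefix rules would accept either spelling, and a statement such as return would get one rule starting with TOK_RETURN_NL that reduces at once and another starting with TOK_RETURN that goes on to expect an expression.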
On Wed, Dec 7, 2016 at 1:03 AM, Simon Richter <address@hidden>
wrote:
> Hi,
>
> On Tue, Dec 06, 2016 at 10:52:06PM -0500, Ricky wrote:
>
> > Your syntax implies that [\n] should be treated as [;]. So why not use
> > [\n] as alternative?
>
> Unfortunately, it's not that easy.
>
> var foo = 4
> + 5
>
> is also allowed. This can lead to interesting hidden bugs[1], but is valid.
> Automatic Semicolon Insertion[2] is defined as a last-resort mechanism
> during parsing.
>
> I'm currently trying to inline rules, e.g. creating rules like
>
> variable_statement:
> "var" variable_declaration_list ";" |
> "var" variable_declaration_list after_variable_statement;
>
> after_variable_statement:
> variable_statement |
> throw_statement |
> ...
>
> listing the alternatives that imply a semicolon there, but this doesn't
> account for closing a block with a right brace.
>
> The irony is that the parser does almost the right thing -- it is stuck in
> the state
>
> member_expression:
> primary_expression . |
> member_expression . "." identifier |
> member_expression . "[" assignment_expression_in "]" | ...
>
> where it needs a lookahead token to decide between reducing
> primary_expression to member_expression and shifting -- and the table
> lookup here gives the error. I've also tried
>
> member_expression:
> primary_expression |
> primary_expression "var" { YYBACKUP(KW_VAR, YYSTYPE()); } |
> ...
>
> which doesn't work, because I cannot back up there, and
>
> member_expression:
> primary_expression |
> primary_expression error |
> ...
>
> which discards the "var" token.
>
> Simon
>
> [1] https://github.com/GyrosGeier/test262/commit/e9a33d61ac725b4b353a7a20857f04c7d34fed3d
> [2] https://es5.github.io/#x7.9
>
> _______________________________________________
> address@hidden https://lists.gnu.org/mailman/listinfo/help-bison
>