help-bison

Re: Which lexer do people use?


From: Akim Demaille
Subject: Re: Which lexer do people use?
Date: Mon, 6 Jul 2020 07:01:37 +0200

Hi Christian,

> On 4 Jul 2020, at 12:46, Christian Schoenebeck <schoenebeck@crudebyte.com>
> wrote:
> 
> On Saturday, 4 July 2020 08:14:46 CEST Akim Demaille wrote:
> 
> For me, the exaggerated 'divide and conquer' philosophy applied decades ago
> by splitting scanner and parser was a much more painful decision, with
> clearly perceivable negative consequences in the real world for all users.

I agree.  I am happy to be able to test the scanner in isolation, but
compared to the cost of dealing with the "context" in the scanner by
hand, it's a meager benefit.


> AFAICS almost nobody is using anything other than Flex. Probably because its
> designated task of handling type-3 grammars is already fully covered by just
> having a correct RegEx implementation, and most of the examples, howtos,
> books and docs out there are based on Flex.

I agree.  The examples that ship with Bison have either a hand-written
scanner (C, C++, D, Java), or a Flex-generated one (C, C++).  I'd be
happy to replace one of the uses of Flex to demonstrate another tool.

> The only thing that people are missing once in a while on the scanner side
> is Unicode support, but there are ways to circumvent that, as you barely
> need Unicode in the actual RegEx patterns.

Yup.
https://github.com/akimd/bison/blob/3e6e51cf5c932453ce5614865c5729abac15ec39/src/scan-gram.l#L163
But it's tedious.
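
To illustrate why it is tedious: one common way to work around the missing
Unicode support is to match UTF-8 at the byte level, spelling out the allowed
byte ranges for every sequence length.  Here is a minimal sketch of the same
idea in plain C rather than in Flex patterns.  The names `in_range` and
`utf8_sequence_length` are made up for this example and are not taken from
Bison or Flex; the byte ranges are those of well-formed UTF-8 as defined by
the Unicode standard.

```c
#include <stddef.h>

/* Hypothetical helper: true if B lies in the inclusive range [LO, HI].  */
static int
in_range (unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Hypothetical helper: length in bytes (1 to 4) of the well-formed
   UTF-8 sequence at the start of S[0..N), or 0 if there is none.
   Overlong encodings and UTF-16 surrogates are rejected.  */
size_t
utf8_sequence_length (const unsigned char *s, size_t n)
{
  /* Allowed range for the second byte; narrowed for some lead bytes.  */
  unsigned char lo = 0x80, hi = 0xBF;
  size_t len;

  if (n == 0)
    return 0;
  if (s[0] <= 0x7F)
    return 1;                                   /* ASCII.  */
  else if (in_range (s[0], 0xC2, 0xDF))
    len = 2;
  else if (s[0] == 0xE0)
    { len = 3; lo = 0xA0; }                     /* No overlong forms.  */
  else if (s[0] == 0xED)
    { len = 3; hi = 0x9F; }                     /* No surrogates.  */
  else if (in_range (s[0], 0xE1, 0xEF))
    len = 3;
  else if (s[0] == 0xF0)
    { len = 4; lo = 0x90; }                     /* No overlong forms.  */
  else if (in_range (s[0], 0xF1, 0xF3))
    len = 4;
  else if (s[0] == 0xF4)
    { len = 4; hi = 0x8F; }                     /* At most U+10FFFF.  */
  else
    return 0;                                   /* Invalid lead byte.  */

  if (n < len || !in_range (s[1], lo, hi))
    return 0;
  for (size_t i = 2; i < len; ++i)
    if (!in_range (s[i], 0x80, 0xBF))
      return 0;
  return len;
}
```

A scanner could call such a function wherever it accepts "any character",
instead of assuming that one character is one byte.  Written as a regular
expression, each branch above becomes one alternative of byte ranges, which
is where the tedium comes from.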


> The obvious real improvement in the future will be finally getting rid of a
> separate scanner for good, combining the two things which actually belonged
> together from day one: having the scanner functionality directly in Bison
> instead, and saying goodbye to all those scanner state stack hacks which
> often end up in a huge mess that people can hardly read, and often lead to
> severe misbehaviours on edge cases of certain inputs.

+1.

> Akim, was there any progress in the IP discussion for that to become possible 
> one day or is that previously discussed merge off the table?

I can't comment too much about that.  It is still on the todo list (i.e.,
nothing has come up to compromise it), but other matters have drawn too much
attention.

I'm starting to see the end of the other features I meant to put into Bison 3
before moving to 4.  I believe Bison 3.8 will be about multiple start symbols
and the full rewrite of glr.cc, and hopefully 3.9 should be about eliminating
chain rules.

multistart is actually in good shape.  My running example looks like this
(https://github.com/akimd/bison/blob/2dbe3a005c0065bef5db54bc1da56c24dc0880d5/examples/c/lexcalc/parse.y#L63):

> %type <int> exp expression line
> 
> %start input expression
> 
> %%
> input:
>   %empty
> | input line
> ;
> 
> expression:
>   exp EOL  { $$ = $exp; }
> ;
> 
> ...
> %%
> ...
> 
>   if (...)
>     {
>       yyparse_expression_t res = yyparse_expression ();
>       if (res.yystatus == 0)
>         printf ("expression: %d\n", res.yyvalue);
>       else
>         printf ("expression: failure\n");
>     }
>   else
>     yyparse_input ();

Cheers!

