improvement

help-flex

From:	Nathan Moore
Subject:	improvement
Date:	Wed, 31 Aug 2005 21:45:44 -0400
User-agent:	Mozilla Thunderbird 1.0.6 (X11/20050716)

Hi,

I have recently been using flex (2.5.4) for a C-like language. I looked through the output and noticed that:


"+"   |
"-"   |
"!"      { SINGLE_CHAR_OP_ACTION }

will produce 2 fall through cases in the big switch/case, while

"+"|"-"|"!"   { SINGLE_CHAR_OP_ACTION }

will only produce one, which is obviously better. It will be smaller code without any loss in performance, which can't help but improve performance of the generated lexer b/c it will be more cache/swap friendly. (right?) This would not be an issue, except that I don't think it is possible for a compiler to optimize away the positions in the jump table for the switch/case, b/c the case values coming out of tables confuse it, and it would be very reluctant to go changing values in the tables anyway.
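To make the difference concrete, here is a hand-written sketch of the two dispatch shapes. It is not actual flex output: the function names, rule numbers, and token code are assumptions made up for illustration, and the real scanner wraps each action in YY_RULE_SETUP and YY_BREAK.

```c
/* Hypothetical token codes and rule numbers, for illustration only. */
enum { TOK_NONE = 0, TOK_SINGLE_CHAR_OP = 1 };

/* Three rules chained with "|" actions: three rule numbers, two of which
   are empty cases that fall through to the shared action. */
int dispatch_three_rules(int rule)
{
    switch (rule) {
    case 1:                     /* "+"   |                          */
    case 2:                     /* "-"   |                          */
    case 3:                     /* "!"   { SINGLE_CHAR_OP_ACTION }  */
        return TOK_SINGLE_CHAR_OP;
    default:
        return TOK_NONE;
    }
}

/* One rule using regex alternation: a single rule number. */
int dispatch_one_rule(int rule)
{
    switch (rule) {
    case 1:                     /* "+"|"-"|"!"   { SINGLE_CHAR_OP_ACTION } */
        return TOK_SINGLE_CHAR_OP;
    default:
        return TOK_NONE;
    }
}
```

Both functions run the same action for all three operators; the only difference is how many distinct rule numbers (and therefore jump-table slots) exist.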

While making flex recognize that the above examples are the same would make the best performing lexer have a more readable specification, it would also make

<X>foo   |
bar            |
<Y>"cookie monster" { WHATEVER }

come out better as well, and these cannot be transformed into a one-liner (can they?). But either way, it would be more readable to keep them on separate lines.

These could also improve equivalence class shrinkage (if I understand it right) b/c it would make it clear to flex that in the "+"|"-"|"!" example, if those chars were not used elsewhere, they could be in one equivalence class. This improvement would probably also propagate into other consolidating actions. (Based on the current output, I'm assuming that flex is not the best at determining that certain states are equivalent.)

Another improvement would be if flex could tell which rules are responsible for another rule not being able to accept any tokens. I really haven't applied my brain to this one that hard (though I have applied my brain to trying to figure out which one was blocking it), so I'm not sure how hard it would be to figure this out, but I think it should be pretty easy.

I ended up having to chop my specification down to just a few entries that I suspected might cause the problem and remove/swap them until I found it. Of course these were the hardest to read regexes in the whole specification, so debugging them was a bitch and a half.

Another improvement would be to have more than one action possible for each match.

I can't think up a good syntax for this, but with output like:

switch(matched_rule | flag) /* flag tells which of the multiple actions to do */
{
   case 0: DO_SOMETHING;  YY_BREAK;
   case 0|FLAG: DO_SOMETHING_ELSE;  YY_BREAK;
...
}

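In the above, FLAG might be one, so flag would be either 0 or 1, and matched_rule (not the right name for it, I know) would be << 1 what it would have been without this feature. Or you could use a high bit for it, i.e. the first bit above the highest set bit of FINAL_STATES (something like 1<<fls(FINAL_STATES) rather than ffs, since ffs finds the lowest set bit).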
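A minimal sketch of the low-bit variant, assuming flag is always 0 or 1. The RULE() macro, the ACT_* codes, and select_action() are hypothetical names introduced here; flex has no such feature, and here the shift is done by RULE() rather than being baked into the rule numbers.

```c
/* Assumed encoding: the low bit selects the alternative action,
   so rule numbers are shifted left by one to make room for it. */
enum { FLAG = 1 };
#define RULE(n) ((n) << 1)

/* Stand-ins for the user's action code. */
enum { ACT_NONE, ACT_SOMETHING, ACT_SOMETHING_ELSE };

/* matched_rule is the plain (unshifted) rule number; flag is 0 or 1. */
int select_action(int matched_rule, int flag)
{
    switch (RULE(matched_rule) | flag) {
    case RULE(0):        return ACT_SOMETHING;       /* DO_SOMETHING      */
    case RULE(0) | FLAG: return ACT_SOMETHING_ELSE;  /* DO_SOMETHING_ELSE */
    default:             return ACT_NONE;
    }
}
```

The point of the encoding is that rule and flag are resolved by one switch, so the user's action code needs no extra if on flag.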

This would save user code from having to do extra branches, which could be moved up into the branch that flex does anyway. It would also allow the same action to be taken on some matched rules no matter what flag was, by just generating a fall-through, or allow an optional prologue to the main action without much of a performance hit (this needs a goto and label in the action code, which the C compiler would probably eat and turn into a fall-through).
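The shared-action and optional-prologue cases might look roughly like this. Again this is only a sketch under the same assumed encoding (flag in the low bit); run_rule(), the return values, and the prologue counter are made up for illustration, and a plain fall-through is used in place of the goto/label pair.

```c
enum { FLAG = 1 };            /* assumed: low bit carries the flag */
#define RULE(n) ((n) << 1)    /* rule numbers shifted to make room */

/* Returns a made-up result code per rule; *prologue_runs counts how
   often the optional prologue executed. */
int run_rule(int matched_rule, int flag, int *prologue_runs)
{
    switch (RULE(matched_rule) | flag) {
    case RULE(0):             /* same action whatever the flag is: */
    case RULE(0) | FLAG:      /* generated as a plain fall-through */
        return 10;

    case RULE(1) | FLAG:      /* flag set: run the prologue first  */
        ++*prologue_runs;
        /* fall through into the main action */
    case RULE(1):
        return 20;

    default:
        return -1;
    }
}
```

Calling run_rule(1, 1, &n) runs the prologue and then the main action, while run_rule(1, 0, &n) runs only the main action.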


Nathan
