help-flex
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

improvement


From: Nathan Moore
Subject: improvement
Date: Wed, 31 Aug 2005 21:45:44 -0400
User-agent: Mozilla Thunderbird 1.0.6 (X11/20050716)

Hi,
I have recently been using flex (2.5.4) for a C type language. I looked through the output and noticed that:

"+"   |
"-"   |
"!"      { SINGLE_CHAR_OP_ACTION }

will produce 2 fall through cases in the big switch/case, while

"+"|"-"|"!"   { SINGLE_CHAR_OP_ACTION }

will only produce one which is obviously better. It will be smaller code without any losses in performance, which can't help put improve performance of the generated lexer b/c it will be more cache/swap friendly. (right?) This would not be an issue, except that I don't think it is possible for a compiler to optimize away the positions in the jump table for the switch/case b/c the cases coming out of tables confuse it and it would be very reluctant to
go changing values in the tables anyway.

While making flex recognize that the above examples are the same would make the best performing lexer have
a more readable specification, it would also make

<X>foo   |
bar            |
<Y>"cookie monster" { WHATEVER }

come out better as well, and these cannot be transformed into a one liner (can they?). But either way, it would
be more readable to keep them on separate lines.
These could also improve equivalence class shrinkage (if I understand it right) b/c it would make it clear to flex that in the "+"|"-"|"!" example, if those chars were not used elsewhere, that they could be in an equivalence class. This improvement would probably also propagate into other consolidating actions. (based on the current output, I'm assuming that FLEX is not the best at determining that certain states are equivalent.)

Another improvement would be if FLEX could tell which rules are responsible for another rule not being able to accept any tokens. I really haven't applied my brain to this one that hard (though I have applied my brain to trying to figure out which one was blocking it) so I'm not sure how hard it would be figure this out, but I think it should be
pretty easy.
I ended up having to chop my specification down to just a few entries that I suspected might cause the problem and remove/swap them until I found it. Of course these were the hardest to read regexes in the whole specification, so
debugging them was a bitch and a half.

Another improvement would be to have more than one action possible for each match.
I can't think up a good syntax for this, but with output like:

switch(matched_rule | flag) /* flag tells which of the multiple action to do */
{
   case 0: DO_SOMETHING;  YY_BREAK;
   case 0|FLAG: DO_SOMETHING_ELSE;  YY_BREAK;
...
}
In the above, the FLAG might be one, so flag would either be 0 or 1, and matched_rule (not the right name for it, I know) would be << 1 what it would have been without this
feature.  Or you could use a high bit for it (1<<(ffs(FINAL_STATES)+1)).
This would save user code from having to do extra branches which could be moved up into the branch that flex does anyway. It would also allow for the same action to be taken on some matched rules no matter what flag was by just generating a fall through or set an optional prologue to the main action without that much of a performance hit (needs goto and label in action code which the C compiler would probably eat and turn into a fall through).

Nathan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]