
Re: %option header (Was Re: flex 2.5.13 released)


From: Bruce Lilly
Subject: Re: %option header (Was Re: flex 2.5.13 released)
Date: Fri, 16 Aug 2002 16:03:07 -0400
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529

John Millaway wrote:
Thanks Bruce. Before I address each issue, let me explain the reasoning behind
the header generation. The header is meant to export only the EXTERNAL API, not
the internal guts. Flex generates many undocumented macros, but that doesn't
mean we're committing to their existence in future versions. We may rename them
or remove them altogether as the skeleton evolves. The fact that the internal
macros appear in the header file is an undesirable side effect of the header
generation process. In order to undo that side effect, we squash those internal
macros by #undefing them at the end of the header. This is to ensure that
users don't rely on those macros.

Protecting the namespace is just as important as protecting the internals.
Flex should not export unprefixed symbols. The start conditions are not
prefixed in the scanner because they were never exported. If a user wants to
export start conditions in the header, then prefixes are a fact of life. As you
said, there is always `sed'.
[...]
2. start conditions are unconditionally #undef'ed.

The unprefixed ones are undef'd.

Those *are* the start condition names, exactly as the programmer
specified them in the .l file.  Sed doesn't help here because
while sed can be used to generate a separate header with the start
condition names as I described, the flex-generated header #undefs
the same start condition names.  Therefore if the flex-generated
header is included after the sed-generated one, the start condition
names are unusable.  One could include the sed-generated header
*after* the flex-generated one, but header inclusion order dependencies
are A Bad Idea.
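
The ordering problem can be reproduced in a single file. This is only a
sketch: the two marked regions are stand-ins for the sed-generated header
and for the tail of the flex-generated header, the start condition names
COMMENT and STRING and their values are made up, and
start_conditions_visible() is a hypothetical helper.

```c
/* Stand-in for the sed-generated header: start condition names exactly
   as written in the .l file (names and values are made up). */
#define COMMENT 1
#define STRING  2

/* Stand-in for the end of the flex-generated header, included later:
   it unconditionally removes the unprefixed names. */
#undef COMMENT
#undef STRING

/* Hypothetical helper: reports whether the names survived. */
int start_conditions_visible(void)
{
#if defined(COMMENT) && defined(STRING)
    return 1;   /* names still usable */
#else
    return 0;   /* names were wiped out by the later #undef lines */
#endif
}
```

Swapping the two regions makes the helper return 1, which is exactly the
inclusion order dependency described above.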

[...]
5. There are munged versions of start condition names...
   a. I don't need munged versions; it would be nice to be able to
      avoid them to prevent namespace clashes
   b. I do need the real start condition definitions (currently
      I extract them from the .c file via sed, but see #2 above).
      These could be conditionally included by an #ifndef wrapper...
   c. The munging is inconsistent w.r.t. case (FOOSC_INITIAL vs.
      foosc_Bar).


They are prefixed in order to prevent namespace clashes. I already have several
scanners that use start conditions such as "string", "comment", "dquote", etc.
They would immediately clash if those names weren't prefixed. Changing the
case was to adjust to programmer's preference. I've seen flex scanners that use
all UPPERCASE START CONDITIONS, all lowercase start conditions, and First
Letter Capitalization. I tried to make the "munged" symbols follow the
programmer's preference. e.g.,

<DQUOTE> becomes FOOSC_DQUOTE  (prefix uppercase)
<dquote> becomes foosc_dquote  (prefix lowercase)
<DQuote> becomes foosc_DQuote  (prefix unmodified)
<yydquote> becomes foosc_dquote  (prefix replaced)
<YYDQUOTE> becomes FOOSC_DQUOTE   (prefix replaced)

I'm open to other namespace-friendly suggestions. The "SC_" was added because
people use start conditions like "text", which would be prefixed to become
"yytext" -- clearly undesirable!
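
For reference, the munging rule can be written down as a small function.
This is a sketch inferred only from the five examples above, not taken
from the flex source. munge_sc_name() is a hypothetical name, and treating
a name with no lowercase letters as "uppercase" is an assumption.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not flex code): build the header name for a start
   condition from the scanner prefix, following the examples above. */
void munge_sc_name(const char *prefix, const char *name,
                   char *out, size_t outsz)
{
    int has_lower = 0, has_upper = 0;
    size_t i, n = 0;

    if (outsz == 0)
        return;

    /* A leading "yy" or "YY" is replaced by the prefix, not kept. */
    if (strncmp(name, "yy", 2) == 0 || strncmp(name, "YY", 2) == 0)
        name += 2;

    for (i = 0; name[i] != '\0'; i++) {
        if (islower((unsigned char)name[i])) has_lower = 1;
        if (isupper((unsigned char)name[i])) has_upper = 1;
    }

    /* Copy the prefix, adjusting its case to match the name's style. */
    for (i = 0; prefix[i] != '\0' && n + 1 < outsz; i++) {
        int c = (unsigned char)prefix[i];
        if (!has_lower)      c = toupper(c);  /* all-uppercase name */
        else if (!has_upper) c = tolower(c);  /* all-lowercase name */
        out[n++] = (char)c;                   /* mixed case: unmodified */
    }
    out[n] = '\0';

    /* Append the "SC_" marker in matching case, then the name itself. */
    snprintf(out + n, outsz - n, "%s%s", has_lower ? "sc_" : "SC_", name);
}
```

With prefix "foo" this gives FOOSC_INITIAL for INITIAL and foosc_Bar for
Bar, the pair mentioned in item 5c.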

I don't quite understand why you say that the start conditions
which you list would clash; obviously they don't clash with
anything in the flex-generated .c file (and so wouldn't clash
with anything else in the header), they can't be C reserved
keywords, and the programmer had better make sure they don't
clash with variable or function names likely to be used in C
code for the application (as in the actions associated with
the patterns), system header macros and reserved parts of the
namespace -- what do you think they would clash with?

"text" is a poor choice for a start condition name ("text",
"data", "etext" are external location names in older versions
of C), but *N.B.* it can *only* clash with yytext *if it's
prefixed* -- it's safer *not* to munge the name with a prefix.
[Thank you for providing this example of why prefixing start
condition names is A Bad Idea...:-)]

N.B. in POSIX, macros beginning with "_SC_" are used for
sysconf() and are declared in unistd.h -- if a prefix of
"_" is specified, the name munging could easily cause
big problems...

There are other problems with the munged names:
1. That means that there are two names for the same start
   condition; the one given in the .l file and propagated
   to the .c file, and the munged one in the header.
2. Without access to the original start conditions, all code
   using the header would have to refer to the munged names.
   There may be a large number of such references. One
   problem with that is that the code becomes less legible
   precisely because there are two names for the same thing
   in different files (i.e. a given start condition has one
   name in the scanner .l/.c files, and a different name
   in the header and any file using that header).
3. Another problem with the munged names and the references
   to those names is that the names will change if/when the
   prefix is changed.  That's a maintenance headache since
   many references may have to be manually edited.
4. portability suffers; munged start conditions are new --
   that means that the munged names constitute yet another
   version dependency -- lex vs. flex as well as 2.5.11 and
   earlier vs. later versions of flex. [As mentioned, I'm
   already using start condition names external to the scanner
   (with both lex and flex), and I suspect I'm
   not the only one doing so.]
5. Because of the additional names for the start conditions,
   namespace collisions are made more likely. (e.g. your
   yytext example [under the presumption that "text" would
   be acceptable in the scanner .c file])  Without munging,
   the programmer only has to be concerned with clashes between
   the start condition name and variables, functions, types,
   and macros; with the munging he must consider both the
   actual start condition name and the munged name.
6. Specifically in the case of the first-letter-capitalized
   start condition naming convention, the munging destroys
   the consistency of that convention -- Bar can be
   recognized as a start condition, but what the heck is
   fribblesc_Bar? -- the first letter isn't capitalized, so
   it's not immediately recognizable as a start condition name.

6. A number of useful macros are unconditionally #undefed...
   FLEX_SCANNER
   FLEX_MAJOR_VERSION
   FLEX_MINOR_VERSION
   YY_REENTRANT
   YY_REENTRANT_BISON_PURE
   yylex


I agree the FLEX_ macros would be useful, but the others are for internal use
only and should not be exported in a header.

The YY_REENTRANT* ones are useful as the calling convention
for the scanner differs depending on the options specified
for the scanner, and those macros reflect those options
and can therefore be used to ensure that the caller uses the
appropriate convention.  So a flex run-time option can be
specified to generate (or not) a reentrant scanner, and the
calling code will do the right thing when compiled with the
header generated by flex -- iff the YY_* macros are visible
and the programmer uses them to conditionally compile the
correct call.  Without visibility of the YY_REENTRANT*
macros, some manual fiddling with the code is likely to be
required when a flex option is changed.  That is likely to
be important as the new version of flex propagates, as it
provides a mechanism for developers to account for different
versions of flex which might be available when a package is
built (obviously 2.5.4 and earlier won't set YY_REENTRANT*
no matter what %option is specified).
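
A sketch of the kind of caller I mean follows. It assumes YY_REENTRANT is
simply defined when the reentrant option is in effect and undefined
otherwise. The yylex stubs and the yyscan_t typedef here are stand-ins for
what the generated header would declare, and next_token() is a
hypothetical wrapper name.

```c
#include <stddef.h>

/* Stand-in for what the generated header would leave visible.
   Uncomment to model a scanner built with the reentrant option. */
/* #define YY_REENTRANT 1 */

#ifdef YY_REENTRANT
typedef void *yyscan_t;                 /* opaque scanner handle */
static int yylex(yyscan_t yyscanner)    /* stub, not a real scanner */
{
    (void)yyscanner;
    return 0;                           /* 0 means end of input */
}
#else
static int yylex(void)                  /* stub, not a real scanner */
{
    return 0;                           /* 0 means end of input */
}
#endif

/* Hypothetical wrapper: the one place that knows which calling
   convention the scanner was generated with. */
int next_token(void *scanner)
{
#ifdef YY_REENTRANT
    return yylex((yyscan_t)scanner);    /* reentrant: pass the handle */
#else
    (void)scanner;                      /* traditional: global state */
    return yylex();
#endif
}
```

If the header #undefs YY_REENTRANT, the #ifdef in next_token() always takes
the traditional branch, and the call has to be edited by hand whenever the
option changes.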

And surely the yylex declaration (which may be redefined
for a prefix) is part of the API, not a purely internal
macro.  Without it, one ends up with things like:
warning: implicit declaration of function `yylex'
and
undefined reference to `yylex'

Likewise for any of the other yy* items which might be
subject to macro replacement or expansion for prefixes
or reentrant code.  That would include those which use YY_G,
if the programmer always uses the variable name "yyscanner"
for the pointer to the scanner struct (or defines yyscanner
appropriately).  In turn, that means that with some care, it
would be possible to write parser code and support functions
which will work for reentrant or traditional scanners and
which use yyin, yyout, yytext, etc.  Currently that would
require a bunch of macros similar to the ones in the
flex-generated header, but without the #undefs.

In order to avoid manual editing if/when a prefix is changed,
and to enable developers to write code which is portable w.r.t.
traditional vs. reentrant scanners, those macros ought to be
visible after the header is included, IMHO.

Instead of putting the unconditional #undefs at the end of the
header, most could be moved to the beginning.  Then if multiple
scanners are referenced from the same compilation module (I'm
not sure why anybody would want to do that...) there won't be any
redefinition warnings.  So a .c file might look like:

===============================================
/* Section A */

#include "foo.h"      /* generated by flex */

/* Section B */
/* ... */
int i = yylex(blurfl);  /* yylex may expand to foolex */
/* ... */

#include "bar.h"      /* generated by flex for another scanner */

/* Section C */
/* ... */
int j = yylex(grimble);  /* yylex may expand to barlex */
/* ... */
/* etc. */
===============================================

Section A can't reference any flex items since it appears
before the header is included.  But it can include system
headers, set macros such as YY_NO_GET_LVAL, define functions,
etc.  If yy* macros remain in effect as described above at
the end of the header, Section B can use yylex, yy_get_out,
etc. regardless of what prefix is used when the scanner is
generated (just as in the generated scanner .c file) to refer
to the scanner that was generated with the header. That means
no changes are required if the prefix is changed.  Section B
can also redefine macros such as YY_NO_GET_LVAL as appropriate
prior to the inclusion of bar.h. If the header #undefs yylex,
etc. at the start of the file, there will be no redefinition
warnings when bar.h is included. Code in Section C can then
refer to the scanner which was generated with bar.h via yylex
etc. rather than hard-coding prefixed names [and I see no
reason why that couldn't also apply to start condition names].
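
The arrangement can be modelled in one file. Again this is only a sketch:
the marked regions stand in for foo.h and bar.h, whose real contents would
be much larger, the foolex/barlex stubs replace real scanners, and
first_scanner_token() and second_scanner_token() are hypothetical names.

```c
/* --- stand-in for foo.h: #undef first, then define, and leave the
       macro in effect at the end of the header --- */
#undef yylex
#define yylex foolex
int foolex(void);

/* Section B: refers to the first scanner without naming its prefix. */
int first_scanner_token(void)
{
    return yylex();     /* expands to foolex() */
}

/* --- stand-in for bar.h: the leading #undef avoids a redefinition
       warning even though foo.h left yylex defined --- */
#undef yylex
#define yylex barlex
int barlex(void);

/* Section C: same source text, now bound to the second scanner. */
int second_scanner_token(void)
{
    return yylex();     /* expands to barlex() */
}

/* Stub scanners so the sketch links; real ones come from flex. */
#undef yylex
int foolex(void) { return 1; }
int barlex(void) { return 2; }
```

Changing either prefix only changes the two #define lines, which in real
use would be generated by flex; neither section needs editing.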

Best regards,
  Bruce Lilly




