From: | oleg smolsky |
Subject: | Flex and bison howto for C++ users |
Date: | Tue, 18 Feb 2003 09:57:13 +1300 |
Hello all,

A while ago, after googling and bugging people on this list, I figured out how to get bison and a C++ compiler to talk to each other. It turns out that there is no documentation that covers this process, and the question is asked repeatedly on this list. So, I have written a howto on this.

Please find attached howto.html that describes the following:
-- creating a small grammar
-- creating a simple scanner
-- declaring a common class with attributes that are C++ objects
-- getting bison to use the new lalr C++ skeleton
-- getting flex to use objects allocated by bison
-- compiling all this stuff with cygwin under windows

All healthy constructive criticism is welcome :)

Best regards,
Oleg Smolsky
Software Design Authority
Allied Telesyn Research
This howto accumulates all the tweaks, fixes and hacks required to implement a bison/flex command parser in C++. It assumes some basic knowledge of grammars, parsing and regular expressions.
Note: the approach described here concentrates on building a parser within a Windows environment; however, all the ideas and tools are still applicable to unix/g++. One just needs to remove a few lines of code, such as #include <windows.h> :)
Written by Oleg Smolsky <address@hidden>, February 2003
Consider a command language with two commands: load <filename> and display <address>. So, let's implement a system in C++ in which all scanning and parsing is done by flex and bison respectively.
Make sure that the PATH environment variable contains cygwin's bin directory, e.g. c:\something\cygwin\bin. This way it is possible to call bash, flex and bison directly from a command prompt.
bash -c "bison -d -S lalr1.cc -o parser.cpp parser.y"
Call bash with a command like this from your makefile or VC++ project. Once bash is operational, it executes the given unix command within the cygwin environment. This is important for tools such as bison, because it needs to access its skeleton via a unix path: /usr/share/bison/lalr1.cc
bash -c "flex -oscanner.cpp scanner.l"
Let's consider the contents of scanner.l.
The first section specifies a block of code that goes almost at the very top of the generated .cpp file. It includes some standard C++ headers and declares a few macros. The following section defines a couple of keyword tokens, T_DISPLAY and T_LOAD, as well as a few value-carrying tokens: NT_DECNUMBER, NT_HEXNUMBER and NT_STRING. (Despite the NT_ prefix, these are terminals as far as the grammar is concerned, since the scanner returns them.)
The tricky part here is that yylex() is called with a pointer to an instance of a user-defined class declared in parser.y. This way we can store data in multiple formats within this object.
%{
#include <cstdlib>
#include <iostream>
#include <string>
#include "parser.hpp"

extern void yyerror(const char* s);

/* yylex() receives a pointer to the semantic value object owned by the parser */
#define YY_DECL int yylex(yystype *p)
/* MSVC-specific: break into the debugger when the condition fails */
#define ASSERT(condition) if (!(condition)) _asm int 3;
%}

%%

"display"   { return T_DISPLAY; }
"load"      { return T_LOAD; }
"\r\n"      { return T_CRLF; }

[0-9]+      {
                p->m_sVersion = yytext;
                char *pcLast = NULL;
                /* base 16, same as the hex rule: an all-digit token is still read as hex */
                p->m_dwVersion = strtoul(yytext, &pcLast, 16);
                ASSERT(pcLast != NULL && *pcLast == 0);
                return NT_DECNUMBER;
            }

[0-9a-fA-F]+ {
                p->m_sVersion = yytext;
                char *pcLast = NULL;
                p->m_dwVersion = strtoul(yytext, &pcLast, 16);
                ASSERT(pcLast != NULL && *pcLast == 0);
                return NT_HEXNUMBER;
            }

address@hidden&*()_+[\]{}?/.>,<'";:\\|]+ {
                /* the start of this pattern was obscured by the list archive */
                p->m_sVersion = yytext;
                return NT_STRING;
            }

%%
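To see what the two numeric rules do without running flex, here is a minimal standalone sketch of their action body. `Part` and `FillNumber` are hypothetical names invented for this illustration (the real class is `part` in parser.y); only `strtoul` and its end-pointer behaviour come from the scanner above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the `part` class declared in parser.y.
struct Part {
    std::string m_sVersion;     // token text exactly as typed
    unsigned long m_dwVersion;  // numeric value, when the token is a number
};

// Mirrors the body of the numeric rules: keep the text, convert it with
// strtoul, and report whether the whole token was consumed.
// The scanner above passes base 16 in both rules.
bool FillNumber(Part *p, const char *text, int base) {
    assert(p != NULL && text != NULL);
    p->m_sVersion = text;
    char *pcLast = NULL;
    p->m_dwVersion = std::strtoul(text, &pcLast, base);
    // strtoul leaves pcLast at the first character it could not convert,
    // so a fully converted token ends exactly at the terminating NUL.
    return pcLast != text && *pcLast == '\0';
}
```

Calling `FillNumber(&p, "10", 16)` stores the text "10" and the value 16, which is how the same object ends up holding the data in two formats at once.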
The first section of parser.y includes the appropriate C++ headers and defines the main class that is used for both scanning and parsing. A pointer to an instance of this class is passed into the scanning routine, so its members can be filled in according to the token type.
The second section defines the grammar required to parse our sophisticated commands, together with the handlers that call the appropriate engine routines.
%{
#include <cstdlib>
#include <string>
#include <vector>
#include <deque>
#include <stdarg.h>

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

typedef unsigned long dword;
typedef unsigned short word;
typedef unsigned char byte;

#include "EmEngine.h"

/* semantic value shared by the scanner and the parser */
class part
{
public:
    std::string m_sVersion;
    dword m_dwVersion;
    int last_line, last_column;
};

#define YYSTYPE part
typedef part yystype;
typedef part yyltype;
typedef char yysigned_char;

int yylex(yystype *p);
%}

%token T_DISPLAY T_LOAD T_CRLF NT_DECNUMBER NT_HEXNUMBER NT_STRING
%start command_list

%%

command_list:   /* empty */
    | command_list display_command
    | command_list load_command
    | command_list error
        {
            // explicit handler is not required -- Parser::error_() is called automatically
        };

display_command: T_DISPLAY address T_CRLF
        {
            g_engine.Display($2.m_dwVersion);
        }
    ;

load_command:   T_LOAD filename T_CRLF
        {
            g_engine.Load($2.m_sVersion);
        }
    ;

address:    NT_HEXNUMBER | NT_DECNUMBER;

filename:   NT_DECNUMBER | NT_HEXNUMBER | NT_STRING;

%%
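EmEngine.h is not part of this howto, so the grammar actions cannot be compiled as they stand. The following is only a guess at the smallest interface that would satisfy them: the class name `Engine` and its members are assumptions, and just the two calls `Display()` and `Load()` on `g_engine` are taken from the grammar.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned long dword;

// Hypothetical engine: it only records the calls made by the grammar actions.
class Engine {
public:
    // Called by the display_command action with the parsed address.
    void Display(dword dwAddress) { m_displayed.push_back(dwAddress); }

    // Called by the load_command action with the filename text.
    void Load(const std::string &sFilename) {
        // Every filename token comes from a "+" pattern, so it is never empty.
        assert(!sFilename.empty());
        m_loaded.push_back(sFilename);
    }

    std::vector<dword> m_displayed;      // addresses passed to Display()
    std::vector<std::string> m_loaded;   // filenames passed to Load()
};

Engine g_engine;
```

A stub like this is handy while debugging the grammar, because one can inspect m_displayed and m_loaded after a parse instead of wiring up the real engine.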
Once you have managed to set up the build, the above commands produce the following files: parser.cpp, parser.hpp, location.hh, stack.hh and scanner.cpp.
These files need to be compiled as part of your project. If you are using a VC++ project or a makefile, make sure that the compiler option called "precompiled headers" is switched off for these files.
Here is a block of code that declares a parser instance and implements a simple error handler. error_() is called when a given string is not part of the specified language.
#include "precompiledpp.h"
#include "EmEngine.h"
#include "EmMonitor.h"
#include "parser.hpp"

yy::Parser parser(true);

namespace yy
{
    void Parser::error_()
    {
        g_monitor.AddToOutput("Unrecognised command.");
    }

    void Parser::print_()
    {
    }
}

int isatty(int i)
{
    return 0;
}
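EmMonitor.h is not shown either. As a rough sketch, the error handler only needs an object with an `AddToOutput()` method; the class name `Monitor`, the `Last()` helper and the storage are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical monitor: collects output lines so that a UI can show them later.
class Monitor {
public:
    // Called by Parser::error_() with the message to display.
    void AddToOutput(const std::string &sLine) { m_lines.push_back(sLine); }

    // Returns the most recently added line; requires at least one line.
    const std::string &Last() const {
        assert(!m_lines.empty());
        return m_lines.back();
    }

    std::vector<std::string> m_lines;
};

Monitor g_monitor;
```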
Now, this block defines a function that feeds a new command to the scanner. Imagine that this routine is executed immediately after the user has typed a command.
void ParseMessage(std::string sCommand)
{
    static yy_buffer_state *pBuffer;

    // hand the command text to flex as an in-memory buffer
    pBuffer = yy_scan_bytes(sCommand.c_str(), sCommand.size());
    pBuffer->yy_at_bol = 1;
    yy_switch_to_buffer(pBuffer);

    parser.parse();

    yy_delete_buffer(pBuffer);
}
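One thing to watch for: the grammar requires T_CRLF at the end of every command, and the scanner only produces that token for "\r\n". So the string handed to ParseMessage() has to end with that terminator. Assuming the caller may pass text with a bare "\n" or with no terminator at all, a small helper along these lines could normalise it first (`WithCrlf` is a hypothetical name, not part of the original code):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the command with exactly one trailing "\r\n",
// which is the only line ending the scanner turns into T_CRLF.
std::string WithCrlf(const std::string &sCommand) {
    std::string s = sCommand;
    // Strip any trailing CR/LF characters the caller may have left in place.
    while (!s.empty() && (s[s.size() - 1] == '\r' || s[s.size() - 1] == '\n'))
        s.erase(s.size() - 1);
    s += "\r\n";
    assert(s.size() >= 2);
    return s;
}
```

The call site would then read ParseMessage(WithCrlf(sInput)), where sInput is whatever the user typed.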
There is another block of code that one needs in order to compile against the scanner.
The hack described above depends on a structure called yy_buffer_state that is copied from the flex-generated code. Note that it might vary from version to version.
Put this block of code into scanner.h and #include it:
#include <cstdio>   /* FILE */

typedef unsigned int yy_size_t;

struct yy_buffer_state
{
    FILE *yy_input_file;
    char *yy_ch_buf;        /* input buffer */
    char *yy_buf_pos;       /* current position in input buffer */
    yy_size_t yy_buf_size;
    int yy_n_chars;
    int yy_is_our_buffer;
    int yy_is_interactive;
    int yy_at_bol;
    int yy_fill_buffer;
    int yy_buffer_status;
};

yy_buffer_state *yy_scan_bytes(const char *bytes, int len);
void yy_switch_to_buffer(yy_buffer_state *new_buffer);
void yy_delete_buffer(yy_buffer_state *buffer);