mit-scheme-devel

[MIT-Scheme-devel] A design question


From: Joe Marshall
Subject: [MIT-Scheme-devel] A design question
Date: Tue, 23 Mar 2010 08:11:56 -0700

I'm writing the code to parse the ``file attributes line'' in a Scheme
source file.  (That's the line with the -*- Mode: Scheme -*-)
The format of that line seems fairly loose.  I've seen a lot of
variants.  The basic format is a series of key/value pairs which
are delimited by semicolons.  The key is separated from the value
by a colon.
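
As a rough illustration of that basic format only (Python, not the
MIT Scheme reader; the function name and the choice to lower-case
keys are assumptions made for this sketch), a naive splitter might
look like this:

```python
def parse_attributes_line(line):
    """Return a dict of attributes found between the two -*- markers.

    Hypothetical helper: splits on ';' and ':' without any knowledge
    of the value syntax, so it is only adequate for simple values.
    """
    start = line.find("-*-")
    if start < 0:
        return {}
    end = line.find("-*-", start + 3)
    if end < 0:
        return {}
    body = line[start + 3:end]
    attrs = {}
    for field in body.split(";"):
        key, sep, value = field.partition(":")
        if sep and key.strip():
            # Lower-casing the key is an assumption, not a documented rule.
            attrs[key.strip().lower()] = value.strip()
    return attrs


print(parse_attributes_line(";;; -*- Mode: Scheme; Package: foo -*-"))
```

A splitter like this breaks as soon as a value itself contains a
semicolon or a colon, which is one reason to prefer a real reader
over string splitting.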

On the Lisp Machine, the file attributes line could span multiple
lines, but in GNU Emacs, it must not.  It seems that the value of
an attribute should be an object readable by GNU Emacs Lisp.  It
is usually a symbol or number, but sometimes a quoted string.
On the Lisp Machine it could be a list.  The key appears to be
a symbol.

Here's my thinking:  Since the key is a symbol, and the value can
be an object, it makes sense to re-use the parser rather than write
a new parser just for the file attributes.  (A new parser would look
very similar to the old anyway.)  However, there are a couple of
things that are dramatically different about reading the file attributes
line.  The first is that semicolons do not indicate the start of a comment.
This is easily dealt with by making a special parser table for the
file-attribute line and arranging for semicolons to behave differently.

The second problem is thornier.  Some people don't like to put spaces
in the file-attribute line.  When you attempt to use parse-atom to read
a key from the file-attribute line, it can gobble up too much.  For example,
when we try to read the key from this line: "#|| -*-mode:scheme-*-", we
get back the symbol 'mode:scheme-*- instead of 'mode
The problem is that `colon' is not recognized as a symbol delimiter.
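
The effect can be reproduced with a toy scanner.  This is a Python
sketch, not parse-atom itself; read_atom and DEFAULT_DELIMITERS are
hypothetical names, and the delimiter set is an assumption rather
than the set the real reader uses:

```python
# Assumed delimiter set for ordinary source text (illustrative only).
DEFAULT_DELIMITERS = frozenset(" \t\n()\";'")


def read_atom(text, start, delimiters):
    """Consume characters from `start` until a delimiter or end of text.

    Returns the token and the index just past it.
    """
    end = start
    while end < len(text) and text[end] not in delimiters:
        end += 1
    return text[start:end], end


line = "#|| -*-mode:scheme-*-"
start = line.index("-*-") + 3  # first character after the opening -*-

# With the ordinary set, the scan runs through the colon to end of line.
print(read_atom(line, start, DEFAULT_DELIMITERS))        # ('mode:scheme-*-', 21)

# Treating the colon as a delimiter stops the scan at the key.
print(read_atom(line, start, DEFAULT_DELIMITERS | {":"}))  # ('mode', 11)
```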

It's actually fairly easy to make the set of symbol delimiters dynamic,
but here's where my design question comes in.  Should the relevant char-sets
be attached to the parser table, or to the `db' structure?  That depends on
the abstraction that the parser table is supposed to provide.

    Option 1:  The parser table not only knows the handlers for the initial
        characters in tokens, but also knows the delimiter and constituent
        sets so that it encapsulates full knowledge of how to tokenize.  In this
        case, the parser table should contain the delimiters and constituents.

    Option 2:  The parser table *only* knows about the handlers for the
        initial characters and encapsulates the knowledge needed for
        dispatch only.  Each handler knows how much of the input stream
        it should consume.  In this case, the `db' should contain the delimiters
        and constituents (and the parser table, of course).
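
To make the difference concrete, here is a Python sketch of where
the delimiter set would live under each option.  Every class and
function name is hypothetical, and the reader-state classes are
only stand-ins for the `db' structure, not a description of the
existing code:

```python
from dataclasses import dataclass


def scan(text, start, delimiters):
    """Return (token, index past token), stopping at the first delimiter."""
    end = start
    while end < len(text) and text[end] not in delimiters:
        end += 1
    return text[start:end], end


# Option 1: the table carries the handlers *and* the character sets.
@dataclass
class TokenizingTable:
    handlers: dict
    delimiters: frozenset


@dataclass
class ReaderStateA:
    text: str
    table: TokenizingTable


def read_atom_a(db, pos):
    # The handler asks the table how far a token extends.
    return scan(db.text, pos, db.table.delimiters)


# Option 2: the table is dispatch-only; the reader state holds the sets.
@dataclass
class DispatchTable:
    handlers: dict


@dataclass
class ReaderStateB:
    text: str
    table: DispatchTable
    delimiters: frozenset


def read_atom_b(db, pos):
    # The handler takes the sets from the reader state, not the table.
    return scan(db.text, pos, db.delimiters)
```

Under Option 1, swapping in a file-attribute table changes both
dispatch and token boundaries in one step.  Under Option 2, the
caller has to set up the table and the character sets separately.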

So which option follows the intent of the existing code?

-- 
~jrm



