lynx-dev

Re: lynx-dev Please help me!


From: Dick Wesseling
Subject: Re: lynx-dev Please help me!
Date: Tue, 24 Nov 1998 19:46:32 +0100

address@hidden said:
> ello, All developers of lynx:   i am  a chinese computer programer,
> now i read your source code of lynx and want learn something from
> it, but i meet many problem, one is there are too many states in the
> function SGML_character() (--in SGML.c file) , i can't get all their
> means, if you can give me more detail of these states, you will help
> me more


I don't think anybody has a list of details of SGML_character(), but
maybe it will help if you understand the general structure of the
program.

In Lynx - as in other applications using libWWW - data may come from
many different sources. A few of the possible sources are: a local
disk file, an HTTP server, an FTP server and a Gopher server.

Also, the incoming data may be in several formats, for example HTML
and plain text.

Finally, data may go to different destinations: it may be rendered to
the screen, or stored in a file.

The number of possible combinations of source, format and destination
is very high. Too high to deal with them in - say - a case statement.
Even worse, there may be more steps involved than the three mentioned
above, for instance data may be compressed or encrypted.

LibWWW solves this complexity by having modules for each of the
subtasks - fetching data, parsing data and rendering data. When Lynx
opens a connection it determines which steps are required to process
the data and it then constructs a chain of modules. LibWWW calls this
a "stream".
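
To make the idea of a chain of modules concrete, here is a minimal
sketch in C. The names (Stream, put_char, upcase_put, sink_put, feed)
are invented for this illustration and are not the real libWWW types;
the sketch only assumes that each stage is an object with a function
pointer for accepting one character and a pointer to the next stage.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stream object: one stage in a chain (not the libWWW API). */
typedef struct Stream Stream;
struct Stream {
    void (*put_char)(Stream *self, char c); /* accept one character */
    Stream *next;                           /* downstream stage, or NULL */
    char *buf;                              /* used by the sink stage only */
    size_t len, cap;
};

/* Middle stage: transform the character and hand it downstream. */
static void upcase_put(Stream *self, char c)
{
    self->next->put_char(self->next, (char)toupper((unsigned char)c));
}

/* Final stage: store the character in a caller-supplied buffer. */
static void sink_put(Stream *self, char c)
{
    if (self->len + 1 < self->cap) {
        self->buf[self->len++] = c;
        self->buf[self->len] = '\0';
    }
}

/* Source stage: push every character of `text` into the chain. */
static void feed(Stream *head, const char *text)
{
    while (*text)
        head->put_char(head, *text++);
}
```

A chain is built back to front: create the sink with its buffer, point
the middle stage's `next` at it, then call feed() on the head. Adding
another step (say, decompression) would mean linking in one more stage
rather than touching the existing ones.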

A stream is similar to a Unix pipeline. For example, assume we have a
.html file sitting on an FTP server which must be rendered to the
screen. If we had a separate program for each step then the
following Unix pipeline would do the job:

        ftp_prog | html_parser | screen_renderer


If each of the modules were indeed a separate program then it
could just use standard IO routines to communicate with the other
programs in the pipeline. E.g. "html_parser" could use getc() to read
data from "ftp_prog" and putc() to write its output to
"screen_renderer".

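As a rough sketch of what such a stand-alone filter could look like,
the function below pulls characters with getc() and writes them with
putc(). It is a toy, not a real HTML parser: the name strip_tags and
the rule "drop everything between '<' and '>'" are assumptions made up
for this example. Note that the filter needs no state variable; it
remembers that it is inside a tag simply by being in the inner loop.

```c
#include <stdio.h>

/* Hypothetical pull-style filter: copy text, dropping <...> runs. */
void strip_tags(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '<') {
            /* Inside a tag: keep pulling until the closing '>'. */
            while ((c = getc(in)) != EOF && c != '>')
                ;
        } else {
            putc(c, out);
        }
    }
}
```

Calling strip_tags(stdin, stdout) from main() would turn it into a
program that could sit in the middle of a pipeline.
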
But libwww streams are not separate processes. They are implemented as
objects that call one another. Now we have what is known as a
"producer consumer" problem; from html_parser's point of view it would
be easy if it just could call ftp_prog as a subroutine whenever it
needs to PULL a new character from the stream. On the other hand, from
ftp_prog's point of view it would be better if it could call
html_parser as a subroutine whenever it wants to PUSH a character into
the stream.

What we really need here are co-routines, not sub-routines. C does not
have co-routines, which makes things very messy. Since we cannot push
and pull at the same time, libWWW favors the data producers: the
producer object - ftp in our example - can call the consumer - html -
but not the other way around.

Instead, the HTML or SGML module must keep track of where it was in the
input stream. This is what all those state variables are for. The
reason there are so many of them is that there are a lot of different
SGML elements that must be recognized and most of those elements are
made up of more than one character.
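
Here is a small sketch of what that looks like in C. It is not code
from SGML.c; the names (State, Parser, parser_put_char) and the toy
rule "drop everything between '<' and '>'" are assumptions for
illustration only. The point is that the consumer is called once per
character and returns immediately, so it has to store "where was I?"
in a variable and switch on it at the start of every call.

```c
#include <stddef.h>

/* Hypothetical states; the real parser has many more. */
typedef enum { S_TEXT, S_TAG } State;

typedef struct {
    State state;     /* where we were when the last call returned */
    char out[64];    /* text seen outside of tags */
    size_t len;
} Parser;

void parser_init(Parser *p)
{
    p->state = S_TEXT;
    p->len = 0;
    p->out[0] = '\0';
}

/* Called by the producer for every character it wants to PUSH. */
void parser_put_char(Parser *p, char c)
{
    switch (p->state) {
    case S_TEXT:
        if (c == '<') {
            p->state = S_TAG;           /* a tag starts here */
        } else if (p->len + 1 < sizeof p->out) {
            p->out[p->len++] = c;       /* ordinary text: keep it */
            p->out[p->len] = '\0';
        }
        break;
    case S_TAG:
        if (c == '>')
            p->state = S_TEXT;          /* tag finished */
        break;
    }
}
```

Because the state lives in the Parser object, it does not matter how
the producer splits the input: a tag may start in one network packet
and end in the next. Every additional construct that spans more than
one character (comments, entities, attribute values, and so on) would
need further states of this kind.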

So, if you want to understand what the state variables in SGML.c are
for, the best place to start is learning the syntax of SGML and HTML.
Once you understand what SGML.c is trying to parse you may be able to
figure out the details.



> another problem is i can not understand how you deal with Unicode,
> it seems very complex and difficult, could you explain it for me.

I don't understand it either... It's been a long time since I seriously
looked into SGML.c and all the Unicode handling has been added in
recent years.


