lynx-dev

Re: lynx-dev Special Hyphen


From: Al Gilman
Subject: Re: lynx-dev Special Hyphen
Date: Thu, 20 Aug 1998 18:37:23 -0400 (EDT)

to follow up on what Michael Warner said:

> On Tue, Aug 18, 1998, Al Gilman <address@hidden> wrote:
> 
> > to follow up on what Jason F. McBrayer said:
> >
> > > I don't think it's supposed to line up. "&shy;" is a soft
> > > hyphen; it's only supposed to be a hint as to whether a word
> > > can be hyphenated at a particular point.  If a program needs
> > > to hyphenate a word, it should do it at the "&shy;" and print
> > > a regular hyphen there.
> >
> > Let's see if I understand you.  It sounds as though, if Lynx is
> > not going to try to break words at syllable boundaries, that it
> > can safely render "&shy;" as "" without regard for the context?
> 
> I'm not sure I understand your question, but I won't let that
> stop me...
> 
> My understanding is that &shy; functions as a
> pseudo-word-boundary, for purposes of breaking lines too long for
> the screen width, thus avoiding a short line before a long word
> (as in the first two lines of this paragraph).  So as far as lynx
> is concerned, breaking at a &shy; wouldn't be breaking the word
> at a syllable boundary, but at a word boundary.
> 
> So, if you're asking if, since lynx doesn't break lines at
> syllable boundaries, it can ignore &shy;, I would say no.
> 
> Am I close?

Yes, you are close.  But no cigar.  I believe, as you said, that
&shy; exists to support smart word-wrapping.  The trick is that
we have to get more technical than "word" to deal with hyphenation
and smart word-wrapping.

To a simple-minded lexical analyzer, "pseudo-word-boundary" is a
single token; it lacks any interior whitespace.  This is what I
was casually calling a "word" that might get segmented if the
word-wrapping algorithm has the option to apply hyphenation to
even up the length of lines when the too-many-eth word of a line
is a long one.

The fly in the ointment for your example is that if we put
pseudo-&shy;word-&shy;boundary in the HTML it would get mapped
to 

> My understanding is that &shy; functions as a pseudo-word--
> boundary, for purposes of breaking lines too long for the screen
> ...

with one hyphen too many, but if we put pseudo&shy;word&shy;boundary
in the HTML it would format as

> My understanding is that &shy; functions as a pseudoword-
> boundary, for purposes of breaking lines too long for the screen
> ...

with one hyphen too few.

Or we have to start looking at the previous character when
processing &shy; and collapsing -- (but you don't want to go there...).

For a better example, take anti&shy;disestablishment&shy;arianism
where the customary spelling has done away with hyphens but you
want to steer the hyphenation away from some awkward
syllable-boundaries.

I was not trying to keep someone from adding smart hyphenation
[breaking at &shy; if need be] but rather trying to say that
in the absence of this function, all &shy; occurrences could
get mapped to null strings.

Let me state the algorithm more generally:

First: Do whatever smart word-breaking you are going to do
in the process of breaking a text into lines for display/printing.
If you break a word at an interior point where the author has
placed an &shy; SGML entity, then apply a hyphen as the last
character of the first line and make the first character after
the &shy; entity the first character of the next line [modulo
margining].

Then: For all remaining &shy; entities not consumed by the above
processing, replace each one with a null string.

Then I can restate what I said earlier as:

Until somebody implements the first part of this algorithm,
all &shy; entities are going to be caught by the second step.
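
The two steps above can be sketched roughly as follows.  This is
only an illustration in Python, not Lynx code: the function name
wrap_with_soft_hyphens, the greedy line-filling strategy, and the
use of U+00AD as the already-decoded form of &shy; are all my
assumptions.  It deliberately does not look at the previous
character, so pseudo-&shy;word would still come out with "--".

```python
SHY = "\u00ad"  # assumed: &shy; has already been decoded to U+00AD


def wrap_with_soft_hyphens(text, width):
    """Greedy line filling (hypothetical helper, not part of Lynx)."""
    lines = []
    current = ""
    for word in text.split():
        pieces = word.split(SHY)  # candidate break points inside the token
        while True:
            whole = "".join(pieces)
            prefix = current + " " if current else ""
            if len(prefix) + len(whole) <= width:
                # Second step: soft hyphens that were not used become "".
                current = prefix + whole
                break
            # First step: longest head that fits together with a hyphen.
            cut = 0
            for k in range(len(pieces) - 1, 0, -1):
                head = "".join(pieces[:k])
                if len(prefix) + len(head) + 1 <= width:
                    cut = k
                    break
            if cut:
                lines.append(prefix + "".join(pieces[:cut]) + "-")
                current = ""
                pieces = pieces[cut:]
            elif current:
                # No usable break point: start a fresh line and retry.
                lines.append(current)
                current = ""
            else:
                # Nothing fits even on an empty line: let it overflow.
                current = whole
                break
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    sample = "an anti\u00addisestablishment\u00adarianism tract"
    for line in wrap_with_soft_hyphens(sample, 24):
        print(line)
```

With a width wide enough to hold the whole token, no break is
taken and every soft hyphen falls through to the second step,
which is the behavior described above for a Lynx without smart
hyphenation.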

- Al
