Re: lynx-dev Special Hyphen
From: Al Gilman
Subject: Re: lynx-dev Special Hyphen
Date: Thu, 20 Aug 1998 18:37:23 -0400 (EDT)

to follow up on what Michael Warner said:
> On Tue, Aug 18, 1998, Al Gilman <address@hidden> wrote:
>
> > to follow up on what Jason F. McBrayer said:
> >
> > > I don't think it's supposed to line up. "&shy;" is a soft
> > > hyphen; it's only supposed to be a hint as to whether a word
> > > can be hyphenated at a particular point. If a program needs
> > > to hyphenate a word, it should do it at the "&shy;" and print
> > > a regular hyphen there.
> >
> > Let's see if I understand you. It sounds as though, if Lynx is
> > not going to try to break words at syllable boundaries, that it
> > can safely render "&shy;" as "" without regard for the context?
>
> I'm not sure I understand your question, but I won't let that
> stop me...
>
> My understanding is that &shy; functions as a
> pseudo-word-boundary, for purposes of breaking lines too long for
> the screen width, thus avoiding a short line before a long word
> (as in the first two lines of this paragraph). So as far as lynx
> is concerned, breaking at a &shy; wouldn't be breaking the word
> at a syllable boundary, but at a word boundary.
>
> So, if you're asking if, since lynx doesn't break lines at
> syllable boundaries, it can ignore &shy;, I would say no.
>
> Am I close?
Yes, you are close. But no cigar. I believe, as you said, that
&shy; exists to support smart word-wrapping. The trick is that
we have to get more technical than "word" to deal with hyphenation
and smart word-wrapping.

To a simple-minded lexical analyzer, "pseudo-word-boundary" is a
single token; it lacks any interior whitespace. This is what I
was casually calling a "word" that might get segmented if the
word-wrapping algorithm has the option to apply hyphenation to
even up the length of lines when the too-many-eth word of a line
is a long one.

The fly in the ointment for your example is that if we put
pseudo-&shy;word-&shy;boundary in the HTML it would get mapped
to

> My understanding is that &shy; functions as a pseudo-word--
> boundary, for purposes of breaking lines too long for the screen
> ...

with one hyphen too many but if we put pseudo&shy;word&shy;boundary
in the HTML it would format as

> My understanding is that &shy; functions as a pseudoword-
> boundary, for purposes of breaking lines too long for the screen
> ...

with one hyphen too few.

Or we have to start looking at the previous character when
processing &shy; and collapsing -- (but you don't want to go there...).

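To make the two outcomes above concrete, here is a rough sketch in
Python. It is not Lynx code; the function name break_at_shy and the
collapse flag are made up for illustration, and I am assuming the
parser has already turned each &shy; entity into the character U+00AD.

```python
SHY = "\u00ad"  # assumed in-memory form of the &shy; entity


def break_at_shy(token, which, collapse=False):
    """Break token at its which-th soft hyphen (0-based).

    Returns (end_of_first_line, start_of_next_line). Soft hyphens that
    are not used for the break are simply dropped.
    """
    parts = token.split(SHY)
    head = "".join(parts[: which + 1])
    tail = "".join(parts[which + 1:])
    # With collapse=True, peek at the previous character and skip the
    # extra hyphen if the author already wrote a literal one there.
    if not (collapse and head.endswith("-")):
        head += "-"
    return head, tail


# Literal hyphen plus soft hyphen: one hyphen too many.
print(break_at_shy("pseudo-\u00adword-\u00adboundary", 1))
# Soft hyphen only: the literal hyphen after "pseudo" is lost.
print(break_at_shy("pseudo\u00adword\u00adboundary", 1))
# Previous-character check collapses the doubled hyphen.
print(break_at_shy("pseudo-\u00adword-\u00adboundary", 1, collapse=True))
```

The collapse flag is the "look at the previous character" hack; it
fixes this one case but adds a context-dependent rule to what is
otherwise a simple substitution.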
For a better example, take anti&shy;disestablishment&shy;arianism
where the customary spelling has done away with hyphens but you
want to steer the hyphenation away from some awkward
syllable-boundaries.

I was not trying to keep someone from adding smart hyphenation
[breaking at &shy; if need be] but rather trying to say that
in the absence of this function, all &shy; occurrences could
get mapped to null strings.

Let me state the algorithm more generally:

First: Do whatever smart word-breaking you are going to do
in the process of breaking a text into lines for display/printing.
If you break a word at an interior point where the author has
placed an &shy; SGML entity, then apply a hyphen as the last
character of the first line and make the first character after
the &shy; entity the first character of the next line [modulo
margining].

Then: For all remaining &shy; entities not consumed by the above
processing, replace each one with a null string.

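The two steps above could be sketched roughly as follows. This is
only an illustration in Python, not anything from the Lynx sources:
the function name wrap, the greedy fill strategy, and the assumption
that each &shy; has already been decoded to the character U+00AD are
all mine.

```python
SHY = "\u00ad"  # assumed in-memory form of the &shy; entity


def wrap(text, width):
    """Greedy line filling that may break words only at soft hyphens."""
    lines = []
    current = ""
    for word in text.split():
        while True:
            prefix = current + " " if current else ""
            # Second step: soft hyphens not used for a break vanish.
            plain = word.replace(SHY, "")
            if len(prefix) + len(plain) <= width:
                current = prefix + plain
                break
            # First step: look for the latest soft hyphen such that
            # the head of the word plus a real hyphen still fits.
            parts = word.split(SHY)
            cut = None
            for i in range(len(parts) - 1, 0, -1):
                head = "".join(parts[:i])
                if head and len(prefix) + len(head) + 1 <= width:
                    cut = i
                    break
            if cut is not None:
                lines.append(prefix + "".join(parts[:cut]) + "-")
                current = ""
                word = SHY.join(parts[cut:])  # rest starts the next line
                continue
            if current:
                # No usable break point: push the word to a fresh line.
                lines.append(current)
                current = ""
                continue
            # Word is too long even alone; emit it unbroken.
            current = plain
            break
    if current:
        lines.append(current)
    return lines


sample = "an anti\u00addisestablishment\u00adarianism debate"
for line in wrap(sample, 24):
    print(line)
```

At width 24 the sample breaks after "antidisestablishment-" and the
unused soft hyphen after "anti" disappears; at a width wide enough
for the whole sentence, both soft hyphens disappear and no hyphen is
printed at all.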
Then I can restate what I said earlier as:

Until somebody implements the first part of this algorithm,
all &shy; entities are going to be caught by the second step.

- Al