lynx-dev

Re: lynx-dev hyphenation


From: Klaus Weide
Subject: Re: lynx-dev hyphenation
Date: Sat, 31 Jul 1999 04:20:00 -0500 (CDT)

On Sat, 31 Jul 1999, Vlad Harchev wrote:
>  It seems it will be impossible to use the logic you propose, since given
> HTML text and markup it's impossible (or very difficult) to decide whether
> a given string is a URL or not. But since most of the time URLs are the
> content of an <a> element, hyphenating a URL won't be so harmful, since the
> target URL (href= value) will survive.

Why not some heuristics, like this:
"words", for the purpose of hyphenation, consist of a sequence of
letters.  Plus maybe some language-dependent special characters, but
not general punctuation, numbers, etc.  Words that are part of a URL
will normally be surrounded differently by non-word characters than
those in normal text.  E.g. preceded directly by a separator
character, while words in normal (English) text normally are not.
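A rough sketch of that heuristic in C follows.  It is only an
illustration, not lynx code: the function names and the set of
"separator" characters are assumptions made up for this example, and a
real version would have to cope with non-ASCII letters.  A run of
letters is rejected when it is directly preceded by a separator, or
followed by a separator that is itself glued to more text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of characters that glue URL parts together. */
static const char url_separators[] = "/:.@?&=#%~_+";

static int is_url_separator(char c)
{
    return c != '\0' && strchr(url_separators, c) != NULL;
}

/*
 * Returns 1 if the letter run text[start..end) looks like a word in
 * normal text, 0 if its surroundings suggest it is part of a URL.
 */
int may_hyphenate(const char *text, size_t start, size_t end)
{
    assert(text != NULL);
    assert(start < end && end <= strlen(text));

    /* Preceded directly by a separator: "//www", ".example", "/path". */
    if (start > 0 && is_url_separator(text[start - 1]))
        return 0;

    /* Followed by a separator that is glued to more text, as in
       "www.example" or "http://", unlike "end. Next" or a final "end." */
    if (is_url_separator(text[end]) && text[end + 1] != '\0'
        && !isspace((unsigned char) text[end + 1]))
        return 0;

    return 1;
}

/* Counts the letter runs in text that may_hyphenate() accepts. */
size_t count_hyphenatable(const char *text)
{
    size_t i = 0, n = 0;

    assert(text != NULL);
    while (text[i] != '\0') {
        if (isalpha((unsigned char) text[i])) {
            size_t start = i;

            while (isalpha((unsigned char) text[i]))
                i++;
            n += (size_t) may_hyphenate(text, start, i);
        } else {
            i++;
        }
    }
    return n;
}
```

With this, the words of "http://www.example.com/index" are all
rejected, while a word followed by a sentence-ending period and a space
is still accepted.  Abbreviations such as "e.g." get rejected too,
which is harmless since they are too short to hyphenate anyway.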

One way to do hyphenation in the lynx code would be to insert
LY_SOFT_HYPHEN into the data stream at an earlier stage, and then let
GridText.c functions deal with that as they already do.  This has
several advantages.  One is reusing an already existing mechanism that
is (somewhat) tested.  Another: if a break gets introduced within the
<a href=...>anchorTextWhichMayBeALongURL</a>, the soft hyphen will
actually be displayed in a different way when the anchor is made
current (the '-' is not being highlighted).  At least it used to be
that way.
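To illustrate the "insert at an earlier stage" idea, here is a minimal
sketch that copies a word and puts a marker character at the break
offsets a hyphenation algorithm has chosen.  SOFT_HYPHEN_MARK is a
stand-in for LY_SOFT_HYPHEN (its actual value in the lynx sources is
not assumed here), and insert_soft_hyphens() is a hypothetical helper,
not an existing lynx function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for lynx's LY_SOFT_HYPHEN; the real definition lives in the
   lynx sources and its value is not assumed here. */
#define SOFT_HYPHEN_MARK '\001'

/*
 * Copies word[0..len) into out, inserting SOFT_HYPHEN_MARK before each
 * offset listed in breaks (expected ascending, each in 1..len-1).
 * Returns the number of bytes written, not counting the terminating
 * NUL, or 0 if out is too small.
 */
size_t insert_soft_hyphens(const char *word, size_t len,
                           const size_t *breaks, size_t nbreaks,
                           char *out, size_t outsize)
{
    size_t i, b = 0, o = 0;

    assert(word != NULL && out != NULL);
    assert(nbreaks == 0 || breaks != NULL);

    /* Worst case: every break is used, plus the terminating NUL. */
    if (outsize < len + nbreaks + 1)
        return 0;

    for (i = 0; i < len; i++) {
        if (b < nbreaks && breaks[b] == i) {
            out[o++] = SOFT_HYPHEN_MARK;
            b++;
        }
        out[o++] = word[i];
    }
    out[o] = '\0';
    return o;
}
```

The later stage (GridText.c in lynx) would then decide for each marker
whether a line break actually happens there, and show a '-' only in
that case.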

You should also parse and honor <NOBR>...</NOBR>.

Any use of a hyphenation algorithm should really derive needed language
information in the standard-prescribed way.  (Whatever that is exactly
- Content-Language:, <LANG>, LANG= attributes).  I wouldn't like to
see yet another feature (like your "justify" thing) that derives its
parameters not from where it should take them, but only from some
global default because it was too inconvenient to do the right thing.
If you don't want to or cannot keep track of LANG specified by the
HTML (which may be nested etc.), at least think about it and try to
make it easier to do the right thing later.
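As a sketch of what keeping track of nested LANG could look like: a
small stack with one entry per open element, falling back to a
document-wide default (for instance from Content-Language:).  The
LangStack type and the lang_* functions are invented for this example
and do not exist in lynx.

```c
#include <assert.h>
#include <stddef.h>

#define LANG_STACK_MAX 32   /* arbitrary nesting limit for this sketch */

typedef struct {
    const char *lang[LANG_STACK_MAX]; /* NULL: element had no LANG= */
    size_t depth;
    const char *doc_default;          /* e.g. from Content-Language: */
} LangStack;

void lang_init(LangStack *s, const char *doc_default)
{
    assert(s != NULL);
    s->depth = 0;
    s->doc_default = doc_default;
}

/* Call when an element opens; lang is its LANG= value or NULL.
   Returns 1 on success, 0 if the nesting limit is reached. */
int lang_push(LangStack *s, const char *lang)
{
    assert(s != NULL);
    if (s->depth >= LANG_STACK_MAX)
        return 0;
    s->lang[s->depth++] = lang;
    return 1;
}

/* Call when the matching element closes. */
void lang_pop(LangStack *s)
{
    assert(s != NULL);
    if (s->depth > 0)
        s->depth--;
}

/* Language in effect: the innermost explicit LANG=, else the default. */
const char *lang_current(const LangStack *s)
{
    size_t i;

    assert(s != NULL);
    for (i = s->depth; i > 0; i--) {
        if (s->lang[i - 1] != NULL)
            return s->lang[i - 1];
    }
    return s->doc_default;
}
```

A hyphenation routine would ask lang_current() for each word instead
of reading a global setting, so that honoring LANG properly later does
not require changing its interface.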

I would like to see a third choice for text justification, one that
lets the HTML control whether justification is applied, *at least* for
those cases where ALIGN attributes are already being parsed, before
any hyphenation is added.

   Klaus

