Re: lynx-dev hyphenation


From: Vlad Harchev
Subject: Re: lynx-dev hyphenation
Date: Sat, 31 Jul 1999 17:02:40 +0500 (SAMST)

On Sat, 31 Jul 1999, Klaus Weide wrote:

> On Sat, 31 Jul 1999, Vlad Harchev wrote:
> >  It seems it will be impossible to use the logic you propose, since given
> > HTML text and markup it's impossible (or very difficult) to decide whether a
> > given string is a URL or not. But since most of the time URLs are the content
> > of an <a> element, hyphenating a URL won't be so harmful, since the target
> > URL (the href= value) will survive.
> 
> Why not some heuristics, like this:
> "words", for the purpose of hyphenation, consist of a sequence of
> letters.  Plus maybe some language-dependent special characters, but
> not general punctuation, numbers, etc.  Words that are part of a URL
> will normally be surrounded differently by non-word characters than
> those in normal text.  E.g. preceded directly by a separator
> character, while words in normal (English) text normally are not.
> 
> One way to do hyphenation in the lynx code would be to insert
> LY_SOFT_HYPHEN into the data stream at an earlier stage, and then let
> GridText.c functions deal with that as they already do.  This has
> several advantages.  One is reusing an already existing mechanism that
> is (somewhat) tested.  Another: if a break gets introduced within the
> <a href=...>anchorTextWhichMayBeALongURL</a>, the soft hyphen will
> actually be displayed in a different way when the anchor is made
> current (the '-' is not being highlighted).  At least it used to be
> that way.
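
The word heuristic quoted above can be sketched in C roughly as follows. This is only an illustration, not lynx code: the function name looks_hyphenatable and the separator set are assumptions made up for the example, and a real version would also have to allow language-dependent letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters that, when they touch a word, suggest URL-like text.
 * This set is an assumption chosen for illustration. */
static const char url_separators[] = "/:.@?=&#_~%";

/* Return 1 if text[start .. start+len) looks like an ordinary word that
 * may be hyphenated, 0 if it looks like part of a URL or is not a word.
 * The caller must ensure start + len <= strlen(text). */
int looks_hyphenatable(const char *text, size_t start, size_t len)
{
    size_t i, end = start + len;

    if (len == 0)
        return 0;
    /* A "word" is a sequence of letters only. */
    for (i = start; i < end; i++)
        if (!isalpha((unsigned char)text[i]))
            return 0;
    /* Preceded directly by a separator: typical inside a URL. */
    if (start > 0 && strchr(url_separators, text[start - 1]) != NULL)
        return 0;
    /* Followed by a separator that is itself followed by more text
     * ("www.example"), as opposed to sentence punctuation ("end. "). */
    if (text[end] != '\0' && strchr(url_separators, text[end]) != NULL
        && text[end + 1] != '\0' && !isspace((unsigned char)text[end + 1]))
        return 0;
    return 1;
}
```

With the text "See http://www.example.org/hyphenation for details.", this rejects "http" and the "hyphenation" that follows the slash, but accepts "details" even though a full stop follows it.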

 I plan to do something similar to what you propose: add hyphenation logic to
split_line, find the word after the requested break position, hyphenate it
(adjusting the corresponding structures, if any) by inserting LY_SOFT_HYPHEN,
and update the requested split position if hyphenation helped. Control then
passes to the old code, which will use that hyphen.
 This approach avoids hyphenating the entire text; only the last "word" on
each line is hyphenated.
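
 A minimal sketch of this plan in C, on a plain character buffer rather than on the real GridText.c structures. Everything here is hypothetical: split_with_hyphen, toy_break and SKETCH_SOFT_HYPHEN are names made up for the example, the marker value is a placeholder standing in for LY_SOFT_HYPHEN, and the vowel/consonant rule is only a toy substitute for a real hyphenation algorithm.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Placeholder marker standing in for LY_SOFT_HYPHEN (value assumed). */
#define SKETCH_SOFT_HYPHEN '\x07'

static int is_vowel(int c)
{
    return c != '\0' && strchr("aeiouyAEIOUY", c) != NULL;
}

/* Toy stand-in for a real hyphenator: allow a break after a vowel that is
 * followed by a consonant, keeping at least 2 letters on each side.
 * Returns the longest allowed prefix length <= max_prefix, or 0 if none. */
static size_t toy_break(const char *word, size_t len, size_t max_prefix)
{
    size_t k, best = 0;

    for (k = 2; k + 2 <= len && k <= max_prefix; k++)
        if (is_vowel(word[k - 1]) && !is_vowel(word[k]))
            best = k;
    return best;
}

/* Split `line` (a NUL-terminated buffer of capacity `cap`) so the first
 * part fits in `width` columns. Returns how many characters stay on the
 * first line; a soft hyphen is inserted only if hyphenation helped. */
size_t split_with_hyphen(char *line, size_t cap, size_t width)
{
    size_t len = strlen(line), split, old, ws, we, k;

    if (len <= width)
        return len;                      /* nothing to split */

    /* Old behaviour: break at the last space at or before `width`,
     * or hard-break at `width` if there is no space at all. */
    split = width;
    while (split > 0 && line[split] != ' ')
        split--;
    old = (line[split] == ' ') ? split : width;

    /* The word after the requested break position. */
    ws = (line[split] == ' ') ? split + 1 : 0;
    we = ws;
    while (isalpha((unsigned char)line[we]))
        we++;

    /* Need room for 2 letters plus the hyphen, and for one more byte. */
    if (ws + 3 > width || len + 2 > cap)
        return old;

    k = toy_break(line + ws, we - ws, width - ws - 1);
    if (k == 0)
        return old;                      /* hyphenation did not help */

    /* Insert the marker and move the split position past it. */
    memmove(line + ws + k + 1, line + ws + k, len - (ws + k) + 1);
    line[ws + k] = SKETCH_SOFT_HYPHEN;
    return ws + k + 1;
}
```

 Only the word that straddles the break is examined, so the cost per line stays small, and the caller can hand the returned position to the existing line-breaking code unchanged.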
 
> You should also parse and honor <NOBR>...</NOBR>.

 Can they be nested?

> Any use of a hyphenation algorithm should really derive needed language
> information in the standard-prescribed way.  (Whatever that is exactly
> - Content-Language:, <LANG>, LANG= attributes).  I wouldn't like to
> see yet another feature (like your "justify" thing) that derives its
> parameters not from where it should take them, but only from some
> global default because it was too inconvenient to do the right thing.
> If you don't want to or cannot keep track of LANG specified by the
> HTML (which may be nested etc.), at least think about it and try to
> make it easier to do the right thing later.

 I don't know how much information from the HTML document and HTTP headers
the patch will use, but this seems similar to the situation with
justification: additional logic can be added later. (Remember, reasonable
control over justification can be provided once lynx style sheet support is
implemented, which will require a lot of time.) For this patch, though,
implementing perfect logic and control should require much less effort than
for justification.

> I would like to see a third choice for text justification, that gives
> control over applying text justification to the HTML *at least* for
> those cases where ALIGN attributes are already being parsed, before
> any adding of hyphenation.

 I didn't understand what this paragraph means; please explain (perhaps with
examples). It seems ALIGN attributes are already parsed before any content of
the element gets rendered, and justification will always be invoked after
hyphenation.

>    Klaus
> 

 Best regards,
  -Vlad

