lynx-dev

Re: lynx-dev hyphenation


From: Klaus Weide
Subject: Re: lynx-dev hyphenation
Date: Sat, 31 Jul 1999 09:25:27 -0500 (CDT)

On Sat, 31 Jul 1999, Vlad Harchev wrote:
>  I plan to do similar to what you propose - add hyphenation logic to
> split_line, find the word after the break position requested, hyphenate it
> (adjusting corresponding structures if any) by inserting LY_SOFT_HYPHEN,
> update requested split position if hyphenation helped, and then the control
> will be passed to the old code that will use that hyphen.
>  Such approach allows not to hyphenate entire text, but only last "word" on
> each line.

I would have thought (1) this should be controlled in HTML.c
(i.e. that's where LY_SOFT_HYPHENs get inserted), and (2) this would
require some buffering for lookahead (at least one word).

Maybe you can avoid attempting hyphenation for most words.  But the
simple way you outline has problems.

You cannot just do it in split_line without changes in earlier
processing.  When split_line gets called to insert a break in a line,
the line typically looks like this:

   xxxxxxx xxxxxx xxxxxxxxxxx xxxxxxxxx xxxxxxxxxxx xxxxxx xxxxxx??????
                                                          ^      ^
                                                          |      |
                                                          S      M

The first mark (S) is text->permissible_split (passed as 'split')
pointing to a space.  The second mark (M) is the current maximal line
length, which would be exceeded if the line was not split now.  The
'??????' is the remainder of the last word, which we haven't seen yet.
HText_appendCharacter is processing the first of those characters
right now, but hasn't yet appended them to the line.  The rest are
probably still stuck in HTML.c or SGML.c.

Currently split_line splits the line at S.  You want to make it split
(possibly) later, somewhere between S and M, but you can't determine
the possible hyphenation points because the word is incomplete.
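
To make the constraint concrete, here is a minimal sketch of the choice
split_line would have to make once the word after S is known in full.
The function name, its parameters, and the assumption that a visible
hyphen costs one column are all hypothetical and not taken from the
Lynx source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Lynx.
 *
 * split:   index of the space at S
 * max_len: maximal number of columns on the line (M)
 * hyph:    offsets into the word that follows S; offset k means "a
 *          hyphen may go after the first k characters of the word"
 * n:       number of entries in hyph
 *
 * Returns the largest offset whose hyphenated line still fits, or 0 if
 * none fits (i.e. keep splitting at S as before).
 */
static size_t pick_hyphen_offset(size_t split, size_t max_len,
                                 const size_t *hyph, size_t n)
{
    size_t best = 0;
    size_t i;

    assert(hyph != NULL || n == 0);

    for (i = 0; i < n; i++) {
        /* text up to and including the space, k word chars, one '-' */
        size_t used = split + 1 + hyph[i] + 1;

        if (used <= max_len && hyph[i] > best)
            best = hyph[i];
    }
    return best;
}
```

The hyph array can only be filled in when the whole word is available,
which is exactly what is missing at the time split_line is called.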

One way out of this would be to allow HText_appendCharacter some
"overdraft", i.e. let the line grow longer than it ever should be (up
to the next space or word delimiter, probably), but - apart from the
question where to stop - this will violate an invariant that has so
far always been true (except for SOURCE mode before LY_SOFT_NEWLINE
splitting was introduced), and may be assumed in various places, so
it would likely create problems (possibly very obscure ones).
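
The lookahead from point (2) above avoids the overdraft by holding
characters back until the word is complete.  The following is only a
sketch of that idea under stated assumptions: WordBuf, wordbuf_feed and
WORD_MAX are made-up names, only whitespace counts as a delimiter, and
nothing here reflects how HTML.c or GridText.c actually buffer text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WORD_MAX 128            /* hypothetical cap on a buffered word */

/* Hypothetical one-word lookahead buffer; not a Lynx structure. */
typedef struct {
    char word[WORD_MAX];
    size_t len;
} WordBuf;

/*
 * Feed one character.  Returns 1 if a complete word was copied to out
 * (NUL-terminated), 0 if the word is still being collected.  A word is
 * complete when a whitespace delimiter arrives or the buffer is full.
 */
static int wordbuf_feed(WordBuf *wb, char c, char *out, size_t outsize)
{
    int is_delim = (c == ' ' || c == '\t' || c == '\n');
    int have_word;

    assert(outsize > WORD_MAX);  /* room for the word plus the NUL */

    if (!is_delim && wb->len < WORD_MAX) {
        wb->word[wb->len++] = c;
        return 0;
    }

    /* Delimiter seen, or buffer full: hand over what we have. */
    have_word = (wb->len > 0);
    if (have_word) {
        memcpy(out, wb->word, wb->len);
        out[wb->len] = '\0';
    }
    wb->len = 0;

    /* On overflow the current character starts the next chunk. */
    if (!is_delim)
        wb->word[wb->len++] = c;

    return have_word;
}
```

With something like this in front of HText_appendCharacter, the line
would never need to grow past M: the word is hyphenated (or not) before
any of its characters are appended.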

> On Sat, 31 Jul 1999, Klaus Weide wrote:
> > You should also parse and honor <NOBR>...</NOBR>.
> 
>  Can they be nested?

Well, I couldn't find NOBR in an official HTML spec, so I don't know.
Since it doesn't officially exist, you might ignore it, although I
seem to see it a lot.  Handling it anyway may help to reduce the
problem you were already concerned with, hyphenating in inappropriate
strings.

Keeping a counter (me->Nobr_Level) wouldn't be much more effort than a
simple flag (me->inNOBR), should it be necessary to keep track of
nesting.
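
A rough sketch of the counter variant follows.  Only the field name
Nobr_Level comes from the text above; the struct and the three helper
functions are hypothetical and stand in for whatever the real
start/end element handlers would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser state; only the one field is modelled here. */
typedef struct {
    int Nobr_Level;             /* 0 means "not inside any <NOBR>" */
} ParserState;

/* Would be called when <NOBR> is seen. */
static void nobr_start(ParserState *me)
{
    assert(me != NULL);
    me->Nobr_Level++;
}

/* Would be called when </NOBR> is seen; a stray end tag is ignored. */
static void nobr_end(ParserState *me)
{
    assert(me != NULL);
    if (me->Nobr_Level > 0)
        me->Nobr_Level--;
}

/* Hyphenation is allowed only outside all <NOBR> elements. */
static int may_hyphenate(const ParserState *me)
{
    assert(me != NULL);
    return me->Nobr_Level == 0;
}
```

If nesting turns out not to matter, the same three functions work with
the counter never exceeding 1, so nothing is lost compared to a flag.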

> 
>  I don't know how much information from HTML document and HTTP headers that
> patch will use, but seems this is situation similar to justification -
> additional logic can be added later (remember, reasonable control for
> justification can be provided if lynx style sheets support is implemented -

See below - we don't need to wait for extended style sheet support to
do (3).  Part of what you call additional logic - what I call more correct
logic - should be implementable now.

> that will require a lot of time) - but seems for this patch implementing
> perfect logic and control will require much less efforts than with
> justification.

It should at least be written in a way that it can deal with changes
of the current language (which may come from HTML parsing).

> > I would like to see a third choice for text justification, that gives
> > control over applying text justification to the HTML *at least* for
> > those cases where ALIGN attributes are already being parsed, before
> > any adding of hyphenation.
> 
>  I didn't understand what this paragraph meant, please explain (probably with
> examples) - seems ALIGN attributes are already parsed before any content of the
> element gets rendered. And seems that justification will always be invoked
> after hyphenation.

Sorry, I tried to put too much in one sentence.

Let me rephrase: I would like to see patches to make "text
justification" more correct, before patches for "hyphenation" start
getting applied (if at all).

The patches I would like to see for justification should handle
ALIGN="justify" at least for those elements where HTML_start_element
currently already reacts to an ALIGN attribute.  Those are not many
elements.  A third mode should be provided, in addition to what is
currently the case:
  (1) treat all ALIGN="left"[*] and ALIGN="justify" as ALIGN="left" 
      (traditional)
  (2) treat all ALIGN="left"[*] and ALIGN="justify" as ALIGN="justify"
      except for some hardcoded tags (your mode, as I understand it)
  (3) treat ALIGN="left"[*] as ALIGN="left", and treat ALIGN="justify"
      as ALIGN="justify" (at least for those tags where HTML.c already
      interprets ALIGN; new)

[*] the ALIGN="left" may not be explicit but just the unstated default.

Or maybe (per user preference)
  (3a) treat explicit ALIGN="left" as ALIGN="left", and treat
       ALIGN="justify" and missing ALIGN as ALIGN="justify".
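
The four modes above can be summarized as a small decision function.
This is an illustrative sketch only: all type and function names are
invented here, other ALIGN values (center, right) are left out, and
the hardcoded tag exceptions of mode (2) are not modelled.

```c
#include <assert.h>

/* What the element's ALIGN attribute said (hypothetical names). */
typedef enum { ALIGN_UNSPECIFIED, ALIGN_LEFT, ALIGN_JUSTIFY } AlignAttr;

/* The user-selectable modes (1), (2), (3) and (3a) from the list. */
typedef enum {
    MODE_TRADITIONAL,           /* (1)  everything rendered as left    */
    MODE_JUSTIFY_ALL,           /* (2)  everything rendered justified  */
    MODE_HONOR_ALIGN,           /* (3)  do what the attribute says     */
    MODE_JUSTIFY_DEFAULT        /* (3a) only explicit "left" is left   */
} JustifyMode;

typedef enum { RENDER_LEFT, RENDER_JUSTIFY } Rendering;

static Rendering resolve_alignment(JustifyMode mode, AlignAttr attr)
{
    switch (mode) {
    case MODE_TRADITIONAL:
        return RENDER_LEFT;
    case MODE_JUSTIFY_ALL:
        return RENDER_JUSTIFY;  /* per-tag exceptions not modelled */
    case MODE_HONOR_ALIGN:
        /* missing ALIGN counts as the unstated default, i.e. left */
        return attr == ALIGN_JUSTIFY ? RENDER_JUSTIFY : RENDER_LEFT;
    case MODE_JUSTIFY_DEFAULT:
        return attr == ALIGN_LEFT ? RENDER_LEFT : RENDER_JUSTIFY;
    }
    assert(0 && "unknown JustifyMode");
    return RENDER_LEFT;
}
```

Modes (3) and (3a) differ only in how a missing ALIGN is treated, which
is why the attribute needs three states rather than two.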


   Klaus

