lynx-dev hyphenation (was tech. question: translating strings)


From: Klaus Weide
Subject: lynx-dev hyphenation (was tech. question: translating strings)
Date: Mon, 6 Sep 1999 06:49:45 -0500 (CDT)

[ reply split into several parts ]

On Sun, 5 Sep 1999, Vlad Harchev wrote:
> On Fri, 3 Sep 1999, Klaus Weide wrote:
> 
>  OK, as I reported, hyphenation already works. I had to slightly change
> approach - now each word is hyphenated and LY_SOFT_HYPHEN is inserted at the
> rightmost possible hyphen position of each word, updating
> text->permissible_split (since last word on the line can be unhyphenatable).
> 
>  Here is more basic information on hyphenation rules:
> they are patterns that specify possible hyphen positions in the part of that
> pattern. Such patterns are built by running special programs over native text
> (I don't know the algorithm exactly). The order of patterns in the dictionary
> is not significant. Hyphenation exceptions are expressed in terms of patterns
> too (at least in libhnj) - (using plain hydict, linux is hyphenated as lin-ux
> - is this correct?). Pattern matching is implemented as finite-state machine
> in libhnj (the transitions are calculated when reading hydict). Apparently, if
> two languages use different keycodes, it's possible to concatenate hydicts to
                              ^^^^^^^^
> get the hyrules that will hyphenate two languages at the same time - 

Why are you talking about keycodes?  That doesn't seem to make much sense,
what kind of "keycodes" do you mean?  X11 keycodes?  Is the program bound
to *that*?  (I hope not.) 

That sentence would make more sense to me if you replaced "keycode" with
"character", with the understanding that "character" is meant in the
ISO10646/Unicode sense.

>                                                                       so I'm
> afraid, English phrases like StarDivision will be hyphenated incorrectly if
> hydict for French is loaded since AFAIK French and English use Latin-1
> encoding (at least the keycodes of both languages are not disjoint).

Well of course phrases in one language will be hyphenated incorrectly if
patterns for a different language are used.  That should be no surprise!
(The only way around wrong hyphenation of proper names etc. from another
language that I can think of would be to have specific exceptions, in your
case in the French patterns.)

The fact that the dictionary allows you to combine patterns for two (or
more) languages IF AND ONLY IF their letter repertoires are completely
non-overlapping should be viewed as a hack that can be useful in some
situations, nothing more.  You can use one set of (combined) patterns for
Russian+English, but it won't work for Russian+German or Russian+Ukrainian
(assuming the two languages need different patterns), and you cannot use
the same combined set for Ukrainian+English (under the same assumption) or
for German+English or for Russian+German.
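
The "if and only if" condition above can be checked mechanically.  What
follows is a minimal sketch in Python, not code from libhnj or Lynx.  It
assumes TeX-style patterns in which digits mark hyphenation priorities and
"." marks a word boundary; the function names and the tiny pattern lists
are made-up placeholders, not entries from any real hydict.

```python
def repertoire(patterns):
    """Collect the letters used by a list of TeX-style patterns.

    Digits (priorities) and '.' (word boundary) are ignored.
    """
    return {ch.lower() for p in patterns for ch in p if ch.isalpha()}


def can_concatenate(patterns_a, patterns_b):
    """True only if the two pattern sets share no letter at all."""
    return repertoire(patterns_a).isdisjoint(repertoire(patterns_b))


# Hypothetical placeholder patterns, for illustration only.
english = ["1na", "n2at", ".hy3ph"]
german = ["1ne", "sch2", "ü1b"]
russian = ["1жи", "2ды."]
ukrainian = ["1жи", "є2д"]

print(can_concatenate(russian, english))    # Cyrillic vs. Latin letters
print(can_concatenate(german, english))     # both use Latin letters
print(can_concatenate(russian, ukrainian))  # both use Cyrillic letters
```

Only the first pair passes the test.  Passing it says nothing about
whether a single 8-bit charset can hold both repertoires.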

In fact, you cannot express that last one at all, not to mention
combinations like Russian+Greek, unless you transform the patterns into
some representation of UCS and apply them there (since no 8-bit charset I
know of has all the necessary letters combined).  That combining effect is
only useful for X+English (and one or two other languages of all human
languages in place of English), since only for English (and those other
one or two) is the necessary character repertoire completely contained in
the common 8-bit charsets.  (Well, not even that; if the hyphenation
patterns remain bound to a specific charset, you cannot put patterns for
the correct spelling of such useful words as naïve or blasé or brassière
in a combined Russian+English dictionary.)
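
The charset limitation is easy to demonstrate.  The sketch below (Python
again, using only the standard codecs; the helper name and the sample
strings are placeholders of my own) asks whether a string can be encoded
at all in a few common 8-bit charsets.  Characters are written as escapes
so that the example does not depend on how this message is encoded.

```python
def encodable(text, charset):
    """True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


naive_word = "na\u00efve"  # "naive" spelled with i-diaeresis (U+00EF)
# A Russian word followed by a German word with u-umlaut (U+00FC).
mixed = "\u0436\u0438\u0437\u043d\u044c \u00fcber"

for charset in ("koi8_r", "iso8859_5", "latin_1", "utf-8"):
    print(charset, encodable(naive_word, charset), encodable(mixed, charset))
```

Of the three 8-bit charsets tried here, only Latin-1 can hold the
i-diaeresis, and none can hold the mixed Russian+German string; a UCS
encoding such as UTF-8 holds both.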

[ to be continued ]

   Klaus

