lynx-dev

Re: lynx-dev tech. question: translating strings to different charsets


From: Vlad Harchev
Subject: Re: lynx-dev tech. question: translating strings to different charsets
Date: Tue, 7 Sep 1999 08:41:18 +0500 (SAMST)

On Sun, 5 Sep 1999, Vlad Harchev wrote:

> 
>  But lynx has no information about lowercase/uppercase mapping.
>  Due to the syntax chosen, it will be somewhat difficult to handle a
> utf8-encoded d.c.s and hyrules, so I won't add support for it right now (so
> the byte-to-byte mapping for "human letters" will still be mandatory, since
> chars that render into "(c)" are not "human letters"). What will be left to
> do is to write utf8 character gathering (in case of a utf8 d.c.s),
> converting the characters to lowercase and then to the hyrules charset.
>  I don't have time to implement the complete thing (hacking libhnj will be
> necessary, shipping unicode tables will be required ...)
>  Anyway, I'll try to help people solve their problems with hyphenation.
> English-speaking-or-reading-only people won't have any problems. However,
> people who use documents with several (say) latin-1 encoded languages will be
> unable to use hyphenation at all (since the hydict for only one of those
> languages can be loaded, because the charsets are not disjoint), so they'll
> get incorrect hyphenation for words in the other languages. To solve this
> problem, <span lang=x> must be used. It's hard to convince a German writer to
> surround "debian" with <span lang=en></span>, though such words can be added
> to the hyphenation exceptions. My experience tells me that collisions will be
> unlikely, since hyphenation patterns are built by scanning a bunch of
> native-language documents, so "debian" and other English words probably
> won't be hyphenated at all with German hyrules.
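
 The remaining step described in the quoted text (gather a utf8 word, convert
it to lowercase, then convert it to the hyrules charset) could look roughly
like the sketch below. It is Python rather than lynx's C, only for brevity;
the function name and the default charset are assumptions made for
illustration, not anything that exists in lynx or libhnj.

```python
def word_to_hyrules_bytes(raw, hyrules_charset="iso-8859-1"):
    """Convert a UTF-8 encoded word into lowercase bytes in the charset
    used by the hyphenation rules.

    Returns None when the word is not valid UTF-8 or contains a character
    that the hyrules charset cannot represent; such a word would simply be
    left unhyphenated.
    """
    try:
        text = raw.decode("utf-8")       # "character gathering"
    except UnicodeDecodeError:
        return None
    lowered = text.lower()               # needs Unicode case tables
    try:
        return lowered.encode(hyrules_charset)
    except UnicodeEncodeError:
        return None
```

 The lowercasing step is the one that needs the unicode tables mentioned
above; here Python's built-in tables stand in for them.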

 I can add that using unicode won't solve the problem of hyrules collisions
when trying to hyphenate multilingual documents (say, a document with German
and English text), since unicode doesn't have separate codes for 'german
letter f' and 'english letter f', just "latin small letter f". So using
unicode-encoded documents or a unicode d.c.s. does nothing to improve
hyphenation quality or avoid collisions. Only markup like
<span lang=de>Debian</span> will help here, but such constructs are not used
on the present-day web.
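
 To illustrate the point, here is a small Python sketch. The character name
comes from the standard unicodedata module; the table of hyrules names and
the select_hyrules function are hypothetical and serve only to show that the
choice of rules has to come from the lang attribute, not from the characters.

```python
import unicodedata

# The code point carries no language information: the same 'f' is used
# in German and in English text.
F_NAME = unicodedata.name("f")   # 'LATIN SMALL LETTER F'

# Hypothetical mapping from language tag to a set of hyphenation rules.
HYRULES_BY_LANG = {"de": "hyph_de", "en": "hyph_en"}

def select_hyrules(span_lang, default_lang):
    """Pick hyphenation rules from the lang attribute of the enclosing
    element, falling back to the document's default language.

    Returns None when no rules are available for that language.
    """
    lang = (span_lang or default_lang).lower().split("-")[0]
    return HYRULES_BY_LANG.get(lang)
```

 With this, a word inside <span lang=en> in a German document would get the
English rules, and everything else would get the German ones.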

 Best regards,
  -Vlad

