Re: lynx-dev hyphenation


From: Vlad Harchev
Subject: Re: lynx-dev hyphenation
Date: Fri, 30 Jul 1999 21:59:34 +0500 (SAMST)

On Thu, 29 Jul 1999, Leonid Pauzner wrote:

> 29-Jul-99 23:13 Vlad Harchev wrote:
> > On Thu, 29 Jul 1999, Klaus Peter Wegge wrote:
> 
> >> > 1) how to get information about the language of the current html file 
> >> > (based
> >> > on the charset name of the current document or user setups).
> 
> No language <--> charset mapping is possible:
> ISO-8859-1 covers a dozen Western European languages,
> ISO-8859-2 covers several languages,
> windows-1251 covers ALL Cyrillic-based languages
> while ISO-8859-5 covers Russian only, etc.

 That is what I was afraid of (wrt ISO-8859-1).
 
> There is a `Content-Language' HTTP/1.0/1.1 header which could be set by the
> server. (I assume AltaVista guesses the document's language from this parameter.)
> 
> [Just for completeness: the document may contain text in different languages,
> say, English and French etc. In theory, there is a language attribute in
> HTML 4.0 which could be set for each individual section, but I have never seen
> such attributes in the real world.]
> 
> Another problem with an implementation of hyphenation may be charsets:
> (1) document charset,
> (2) display charset, and
> (3) charset of the hyphenation rules.
> What to do when (2) != (3), and especially when ((2)!=(3) && (1)!=(3))?

 I don't think this is a problem, thanks to the good charset translation
support already present in lynx.
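
 To illustrate why the three charsets need not conflict, here is a minimal
sketch in Python (not lynx code). It assumes that the document text (1) and
the hyphenation rules (3) are both decoded to one internal representation
(Unicode here) before matching, and that the result is encoded to the display
charset (2) only at the end. All function names are hypothetical, the two
patterns are toy data rather than real Russian rules, and the matcher is a
bare Liang-style one without TeX's minimum-fragment limits.

```python
import re

def parse_patterns(text):
    """Map each pattern's letters to its list of inter-letter weights."""
    table = {}
    for pat in text.split():
        letters = re.sub(r"[0-9]", "", pat)
        weights = [0] * (len(letters) + 1)
        pos = 0
        for ch in pat:
            if ch in "0123456789":
                weights[pos] = int(ch)   # weight of the gap before letter `pos`
            else:
                pos += 1
        table[letters] = weights
    return table

def hyphenate(word, table):
    """Liang-style matching: an odd maximum weight marks a permitted break."""
    padded = "." + word.lower() + "."
    scores = [0] * (len(padded) + 1)     # scores[p] = gap before padded[p]
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            for k, w in enumerate(table.get(padded[i:j], ())):
                scores[i + k] = max(scores[i + k], w)
    parts, start = [], 0
    for q in range(1, len(word)):        # gap before word[q] is scores[q + 1]
        if scores[q + 1] % 2 == 1:
            parts.append(word[start:q])
            start = q
    parts.append(word[start:])
    return parts

def hyphenate_bytes(word_bytes, doc_charset, rules_bytes, rules_charset,
                    display_charset):
    """Decode (1) and (3) to Unicode, hyphenate, then encode for (2)."""
    word = word_bytes.decode(doc_charset)
    table = parse_patterns(rules_bytes.decode(rules_charset))
    return "-".join(hyphenate(word, table)).encode(display_charset)

# Toy data: document in windows-1251, rules in KOI8-R, display in ISO-8859-5.
doc_word = "молоко".encode("cp1251")
rules = "о1л о1к".encode("koi8_r")
shown = hyphenate_bytes(doc_word, "cp1251", rules, "koi8_r", "iso8859_5")
```

 With this arrangement the case (2) != (3) && (1) != (3) needs no special
handling, because the rules never see bytes in either foreign charset.
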
 It seems that, to start with, we can allow only one language per document
(ignoring the rarely used <span> element and the lang= attribute of other
elements). But we can create hybrid languages (like English-Russian) if the
character codes of the letters in the two languages don't intersect: TeX
hyphenation tables for such hybrid languages can be produced by simply
concatenating the hyphenation rules files.
 The to-be-introduced setting "Assumed Document Language" will probably be
the main way of controlling the language of the current document.
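
 The hybrid-language idea can be sketched as follows (Python, illustrative
only). The sketch assumes both rule files are in the same 8-bit charset and
hold one TeX-style pattern per whitespace-separated token, with no TeX
wrapper or comments; `letter_codes` and `make_hybrid` are hypothetical names,
and the patterns are toy data.

```python
def letter_codes(pattern_data: bytes) -> set:
    """Collect the byte codes used as letters in a pattern list."""
    ignore = set(b"0123456789.")         # weights and the word-boundary dot
    codes = set()
    for pat in pattern_data.split():
        codes.update(b for b in pat if b not in ignore)
    return codes

def make_hybrid(first: bytes, second: bytes) -> bytes:
    """Concatenate two pattern lists if their letter codes don't intersect."""
    overlap = letter_codes(first) & letter_codes(second)
    if overlap:
        raise ValueError("letter codes shared by both languages: %s"
                         % sorted(overlap))
    return first.rstrip(b"\n") + b"\n" + second

# In KOI8-R, Latin letters are ASCII and Cyrillic letters use the upper half,
# so an English and a Russian pattern list have disjoint letter codes.
english = b"n1t\nt1r\n"
russian = "о1л\nо1к\n".encode("koi8_r")
hybrid = make_hybrid(english, russian)
```

 The check matters because a shared letter code would let a pattern from one
language fire on words of the other.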

>[...] 

 Best regards,
  -Vlad

