Re: lynx-dev hyphenation

lynx-dev

From:	Leonid Pauzner
Subject:	Re: lynx-dev hyphenation
Date:	Thu, 29 Jul 1999 21:58:07 +0400 (MSD)

29-Jul-99 23:13 Vlad Harchev wrote:
> On Thu, 29 Jul 1999, Klaus Peter Wegge wrote:

>> > 1) how to get information about the language of the current HTML file
>> > (based on the charset name of the current document or user setups).

No language <--> charset mapping is possible:
ISO-8859-1 covers a dozen Western European languages,
ISO-8859-2 covers several languages,
windows-1251 covers ALL Cyrillic-based languages
while ISO-8859-5 covers Russian only, etc.
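
To make the one-to-many problem concrete, here is a minimal sketch in Python. The table is illustrative and far from complete, and the names CHARSET_LANGUAGES, candidate_languages and unique_language are made up for this example; nothing like this exists in Lynx.

```python
# Illustrative, non-exhaustive table: charset name -> some languages
# commonly written in it. The entries are assumptions for this example.
CHARSET_LANGUAGES = {
    "iso-8859-1": ["en", "fr", "de", "es", "it", "pt"],
    "iso-8859-2": ["pl", "cs", "hu", "hr"],
    "windows-1251": ["ru", "uk", "bg", "sr"],
}


def candidate_languages(charset):
    """Return every language the table lists for this charset."""
    return CHARSET_LANGUAGES.get(charset.strip().lower(), [])


def unique_language(charset):
    """Return a language only when the charset implies exactly one."""
    candidates = candidate_languages(charset)
    return candidates[0] if len(candidates) == 1 else None
```

With a table like this, unique_language() returns None for every charset listed, so the charset can at best narrow the choice of hyphenation rules, not make it.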

There is a `Content-Language:' HTTP/1.0/1.1 header which could be set by the
server. (I assume AltaVista guesses the document's language from this header.)
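
As a rough sketch of how such a header value could be used, the Python below splits a Content-Language value into its language tags and reports a single primary language only when the server names exactly one. The function names are hypothetical, and this is not how Lynx handles the header.

```python
def parse_content_language(header_value):
    """Split a Content-Language value such as 'de, en-US' into lowercase tags."""
    tags = []
    for part in header_value.split(","):
        tag = part.strip().lower()
        if tag:
            tags.append(tag)
    return tags


def primary_language(header_value):
    """Return the primary subtag if exactly one language is declared, else None."""
    tags = parse_content_language(header_value)
    if len(tags) != 1:
        return None  # missing or ambiguous: fall back to asking the user
    return tags[0].split("-")[0]
```

For example, primary_language("de-CH") gives "de", while "de, en" or an empty value gives None.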

[Just for completeness: the document may contain text in different languages,
say, English and French. In theory, there is a language attribute (`lang') in
HTML 4.0 which could be set for each individual section, but I have never seen
such tags in the real world.]
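
For what it is worth, collecting those attributes is cheap when they are present. The sketch below uses Python's standard html.parser module; the class name LangCollector and the helper collect_langs are invented for this example.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Record (tag, lang) for every start tag that carries a lang attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for name, value in attrs:
            if name == "lang" and value:
                self.found.append((tag, value.lower()))


def collect_langs(html_text):
    """Return the list of (tag, lang) pairs found in the markup."""
    parser = LangCollector()
    parser.feed(html_text)
    parser.close()
    return parser.found
```

A real implementation would also have to track where each element ends, so that the language applies only to the text inside it; this sketch only shows where the declarations are.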

Another problem with implementing hyphenation may be charsets:
(1) document charset,
(2) display charset, and
(3) charset of the hyphenation rules.
What to do when (2) != (3), and especially when ((2) != (3) && (1) != (3))?
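
One possible answer, sketched in Python under the assumption that the rules can be transcoded once at load time: decode each pattern from the rules charset and re-encode it in the display charset, and treat the rules as unusable if that fails. The function name rules_to_display is hypothetical.

```python
def rules_to_display(pattern_bytes, rules_charset, display_charset):
    """Transcode one hyphenation pattern from the rules charset to the
    display charset. Return None if it cannot be represented there."""
    try:
        text = pattern_bytes.decode(rules_charset)
        return text.encode(display_charset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


# A Russian pattern stored in KOI8-R, shown on a windows-1251 display.
koi8_pattern = "про".encode("koi8_r")
as_cp1251 = rules_to_display(koi8_pattern, "koi8_r", "cp1251")

# The same pattern has no representation in ISO-8859-1.
as_latin1 = rules_to_display(koi8_pattern, "koi8_r", "iso-8859-1")
```

Here as_cp1251 holds the pattern in windows-1251 bytes and as_latin1 is None. The same step would apply between (1) and (3) if hyphenation were done on the document text before translation to the display charset.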

>> Most specs on German sites are wrong. I tried to use this mechanism
>> for choosing the right speech synthesizer for reading the site to a
>> multitasking user. I think the wrong specs come from the common usage
>> of generators for HTML files, which are not configured very well.
>> I think it's the same for other languages.
>> A colleague of mine played around with a small word statistic tool:
>> very fast, heuristic, and good detection for a lot of languages.
>> As I remember, the implementation was done in about 500 lines of Pascal.
>> If you are interested I'll give you more details.
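
No details of that tool are given in this thread, so the following is only a guess at the general idea, not a description of it: count how many words of the text appear in a small list of very common words for each language and pick the language with the most hits. The word lists and the name detect_language are assumptions made for this Python sketch.

```python
import re

# Tiny illustrative lists of very common words; a real tool would need
# larger lists and one list per language it should recognize.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
    "fr": {"le", "la", "les", "et", "est", "des"},
}


def detect_language(text):
    """Return the language whose common words occur most often, or None."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for word in words if word in common)
        for lang, common in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Ties go to whichever language is listed first, which is one of several places where a real tool would need something better.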

>  Please provide the details about the word statistic tool (how big are the
> dictionary files it needs, is there a URL for this tool, is it OpenSource,
> does it handle multiple charsets for a given language...).
>  And it seems that we need a mapping from charset name to language name (if
> a mapping in the strict sense is possible, i.e. the given charset name is used
> for encoding only one language) - otherwise the user will have to select the
> right language for the current document manually.

>> Klaus
>>

>  Best regards,
>   -Vlad
