Re: lynx-dev hyphenation
From: Leonid Pauzner
Subject: Re: lynx-dev hyphenation
Date: Thu, 29 Jul 1999 21:58:07 +0400 (MSD)
29-Jul-99 23:13 Vlad Harchev wrote:
> On Thu, 29 Jul 1999, Klaus Peter Wegge wrote:
>> > 1) how to get information about the language of the current html file
>> > (based on the charset name of the current document or on user settings).
No language <--> charset mapping is possible:
ISO-8859-1 covers a dozen Western European languages,
ISO-8859-2 covers several languages,
windows-1251 covers ALL Cyrillic-based languages,
while ISO-8859-5 covers Russian only, etc.
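To make the one-to-many relationship concrete, here is a minimal Python sketch of a charset-to-candidate-languages table. The table contents are illustrative and incomplete, and the helper names `candidate_languages` and `unique_language` are hypothetical (nothing like this is claimed to exist in Lynx); the point is that a lookup can only yield a language when exactly one candidate remains.

```python
from typing import Dict, FrozenSet, Optional

# Illustrative, incomplete table: charset name -> languages it is used for.
# Codes are ISO 639-1; a real table would be larger and need careful review.
CHARSET_LANGUAGES: Dict[str, FrozenSet[str]] = {
    "iso-8859-1": frozenset({"da", "de", "en", "es", "fr", "it", "nl", "no", "pt", "sv"}),
    "iso-8859-2": frozenset({"cs", "hr", "hu", "pl", "sk", "sl"}),
    "windows-1251": frozenset({"be", "bg", "mk", "ru", "sr", "uk"}),
}


def candidate_languages(charset: str) -> FrozenSet[str]:
    """Return every language the charset is known to encode (empty if unknown)."""
    return CHARSET_LANGUAGES.get(charset.strip().lower(), frozenset())


def unique_language(charset: str) -> Optional[str]:
    """Return a language only if the charset encodes exactly one; otherwise None."""
    langs = candidate_languages(charset)
    return next(iter(langs)) if len(langs) == 1 else None
```

For example, `unique_language("windows-1251")` returns `None`, because six languages share that charset.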
There is a `Content-Language' HTTP header (HTTP/1.0 and 1.1) which could be set
by the server. (I assume AltaVista guesses the document's language from it.)
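A minimal sketch of reading that header, assuming its value has already been extracted from the HTTP response. In HTTP/1.1 the value is a comma-separated list of language tags (e.g. "de, en-GB"); the function names here are hypothetical.

```python
from typing import List, Optional


def parse_content_language(value: Optional[str]) -> List[str]:
    """Split a Content-Language value such as "de, en-GB" into
    lower-cased language tags; a missing or empty header yields []."""
    if not value:
        return []
    return [tag.strip().lower() for tag in value.split(",") if tag.strip()]


def primary_subtag(tag: str) -> str:
    """Reduce a tag like "en-gb" to its primary language subtag "en"."""
    return tag.split("-", 1)[0]
```

Since servers may send a wrong value, the result is better treated as a hint than as a fact.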
[Just for completeness: a document may contain text in several languages,
say, English and French. In theory, HTML 4.0 has a `lang' attribute which can
be set on each individual element, but I have never seen it used in the real
world.]
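If pages did carry per-element `lang' attributes, a renderer could track the inherited language while parsing. A minimal sketch using Python's standard `html.parser` module; the class name `LangTracker` and the `(language, text)` run format are assumptions for illustration, not how Lynx's own parser works.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class LangTracker(HTMLParser):
    """Collect text runs tagged with the language inherited from `lang` attributes."""

    def __init__(self, default: Optional[str] = None) -> None:
        super().__init__()
        # Stack of (tag, language); the root entry holds the document default.
        self.stack: List[Tuple[str, Optional[str]]] = [("#root", default)]
        self.runs: List[Tuple[Optional[str], str]] = []

    def handle_starttag(self, tag, attrs):
        # An element without lang inherits the language of its parent.
        lang = dict(attrs).get("lang") or self.stack[-1][1]
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1][1], text))
```

The sketch does not model implied end tags (such as an omitted `</p>`), which real-world pages rely on heavily.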
Another problem with implementing hyphenation may be charsets:
(1) document charset,
(2) display charset, and
(3) charset of the hyphenation rules.
What to do when (2) != (3), and especially when ((2) != (3) && (1) != (3))?
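One way to sidestep the mismatch is to convert everything to Unicode first: decode the word with the document charset (1), decode the rules with their own charset (3), find the break points in Unicode, and encode the pieces for the display charset (2) only at the end, giving up on hyphenation when the display cannot show them. A minimal Python sketch; the one-word-per-line rules format (`hy-phen-ation`) and both function names are hypothetical stand-ins for a real pattern-based hyphenator.

```python
from typing import Dict, List, Optional


def load_rules(raw: bytes, rules_charset: str) -> Dict[str, List[int]]:
    """Parse a hypothetical rules file in charset (3): one pre-hyphenated
    word per line, e.g. "hy-phen-ation". Returns word -> piece lengths."""
    rules: Dict[str, List[int]] = {}
    for line in raw.decode(rules_charset).split():
        pieces = line.split("-")
        rules["".join(pieces).lower()] = [len(p) for p in pieces]
    return rules


def split_for_display(word_bytes: bytes, doc_charset: str,
                      rules: Dict[str, List[int]],
                      display_charset: str) -> Optional[List[bytes]]:
    """Return the word's hyphenation pieces encoded in charset (2),
    or None if there is no rule or the display cannot show the word."""
    word = word_bytes.decode(doc_charset)           # (1) -> Unicode
    lengths = rules.get(word.lower())
    if lengths is None:
        return None                                  # no rule: leave the word whole
    pieces, pos = [], 0
    for n in lengths:                                # slice the original to keep its case
        pieces.append(word[pos:pos + n])
        pos += n
    try:
        return [p.encode(display_charset) for p in pieces]  # Unicode -> (2)
    except UnicodeEncodeError:
        return None                                  # not displayable: don't hyphenate
```

With this design the three charsets never have to agree with each other; each is only used at its own boundary.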
>> Most specs on German sites are wrong. I tried to use this mechanism
>> for choosing the right speech synthesizer for reading the site to a
>> multitasking user. I think the wrong specs come from the common use
>> of poorly configured HTML generators.
>> I think it's the same for other languages.
>> A colleague of mine played around with a small word-statistics tool:
>> very fast, heuristic, and good at detecting a lot of languages.
>> As I remember, the implementation was about 500 lines of Pascal.
>> If you are interested, I'll give you more details.
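The tool mentioned above is not available here, so the following is only a minimal sketch of the general word-statistics idea, not a reconstruction of it: count how often each language's most frequent short words occur and pick the language with the highest score. The tiny stop-word lists, the function name and the `min_hits` threshold are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Optional

# Tiny illustrative stop-word lists; a real tool would derive much larger
# frequency tables from a corpus for each supported language.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "zu", "ein"},
    "fr": {"le", "la", "et", "les", "des", "est", "un", "une"},
}


def guess_language(text: str, min_hits: int = 2) -> Optional[str]:
    """Return the language whose stop words occur most often in `text`,
    or None if even the best language scores fewer than `min_hits`."""
    words = Counter(re.findall(r"\w+", text.lower()))
    scores = {lang: sum(words[w] for w in sw) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None
```

Note that this works on decoded (Unicode) text, so it does not by itself answer the question of multiple charsets per language.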
> Please provide the details about the word-statistics tool (how big are the
> dictionary files it needs, is there a URL for this tool, is it open source,
> does it handle multiple charsets for a given language...).
> And it seems that we need a mapping from charset name to language name (if
> a mapping in the strict sense is possible, i.e. the given charset name is used
> for encoding only one language); otherwise the user will have to select the
> right language for the current document manually.
>> Klaus
>>
> Best regards,
> -Vlad
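The proposal to use a charset-to-language mapping only where it is strict, and otherwise let the user choose, reduces to a short decision. A minimal Python sketch; the function name and the convention that `None` means "do not hyphenate" are assumptions.

```python
from typing import Iterable, Optional


def resolve_language(charset_candidates: Iterable[str],
                     user_choice: Optional[str] = None) -> Optional[str]:
    """Pick the hyphenation language for a document.

    Use the charset mapping only when it is strict (exactly one candidate);
    otherwise fall back to the language the user selected, which may be
    None, meaning "do not hyphenate"."""
    candidates = set(charset_candidates)
    if len(candidates) == 1:
        return candidates.pop()
    return user_choice
```

Keeping the user's choice as the fallback (rather than guessing) avoids hyphenating with the wrong rules when the charset is shared by several languages.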