lynx-dev

Re: lynx-dev hyphenation (was tech. question: translating strings)


From: Klaus Weide
Subject: Re: lynx-dev hyphenation (was tech. question: translating strings)
Date: Tue, 7 Sep 1999 02:54:57 -0500 (CDT)

On Tue, 7 Sep 1999, Vlad Harchev wrote:
> On Mon, 6 Sep 1999, Klaus Weide wrote:
>
> > Yes, that's one possibility.  And "wide characters" could be either
> > 2 or 4 bytes.  And the format for passing text around (SGML.c ->
> > HTML.c -> GridText.c) doesn't have to be the same as what's used for
> > "storing it" where memory usage is important (i.e. mostly HTLine).
> 
>  Then you'd spend a month redesigning lynx in this fashion (if you were
> serious).

And if it takes a month, that would be a month better spent than by
adding user features that no user but one wants.

> > There are lots of things that could become simpler if a Unicode
> > representation were used throughout.
> 
>  They could be done more simply (i.e. they are already done). Why do you
> plan to spend precious time on unnecessary internal redesigns (be pragmatic,
> not paranoid) when it could be spent on more useful things?

If everyone had been thinking that way, lynx would long ago have collapsed
completely under the weight of arbitrary features.  I don't find the
idea that I am contributing to that satisfying (although in practice
I probably am).

Your idea of "useful" is obviously different from mine.  I find the
code more useful if I can understand better what it does.  Lots of
ad hoc features for a very limited purpose don't help there.


> > >  I'm glad that you understand that UTF-8 (and UCS*) doesn't have anything
> > > to do with "mixing several languages that use the same repertoire in one
> > > document" (I thought you thought that this was a solution).
> > 
> > Huh?  It was you who seemed to somehow see a connection between "UTF8
> > in documents" (i.e. externally) and "mixing languages".  Now you seem
> > to change the topic to something else completely.
> 
>  Maybe it's my bad English. But I tried to convince you that the use of
> Unicode can't prevent hyrules collisions (or incorrect hyphenation) for
> documents mixing languages with non-disjoint repertoires.
> 
> > > The 'lang=' attribute is for solving
> > > this. Why do you push "unicode" everywhere?
> > 
> > It is already used in lynx for the character translations.  Whether you
> > know it or not, when you view a cp<something> Russian text with KOI8-R
> > you are using it.  Using it as a common lingua franca allows translation
> > between N charsets with O(N) instead of O(N**2) tables.  That alone
> > should be a good enough reason for using it internally.
> 
>  But conversion between 2 given charsets would take much more time if
> Unicode is used (and libhnj would have to be rewritten).

You still don't get it that it *already is being used* for exactly that, for
conversion between 2 given charsets.  Since we are already using it, we might
as well use it everywhere where it makes sense.

> > The point was that *there is no 8-bit charset* that has them both.
> 
>  This makes a difference. Your example is a good illustration of why UTF-8
> d.c.s. are needed. Thanks.

Rather, it was a good example of why a Unicode representation is good
for all kinds of processing, not just display.

>  Well, lynx without hyphenation doesn't look too bad :)
>  But it seems Russian is one of the very few languages that don't use Latin
> letters; Hebrew, Arabic, Greek, Turkish and Ukrainian are others. So such a
> problem is very rare.

Wrong about Turkish (at least in Turkey).  Wrong about "Very few languages".
And probably wrong about the conclusion (although I have lost track of which
"such problem" you are talking about now).

Basically you are saying you don't care because it's not worth your time,
but apparently you expect your stuff to be added to everyone's lynx. Right?


> > > As for UTF-8-encoded hyrules: the hyphenation simply won't work, or
> > > libhnj won't load the dictionary. In other words, each single byte in
> > > hyrules denotes a single "human letter", and each single byte in the
> > > d.c.s. denotes a single "human letter" (and not part of a letter), to
> > > make direct table-driven translation possible.
> > 
> > You could change it to operate on shorts instead of bytes, right?
> 
>  Of course, but this will take a lot of my time (five 8-hour days of hacking
> to implement exactly what you want: hacking libhnj, gathering SGML tables,
> etc.). I can't spend that much time (remember, I have to implement lynx.cfg
> settings too, which is 3 more days). So I prefer not to deal with Unicode; I
> will describe to interested people how to add support for UTF-8 d.c.s.
> hyphenation in lynx. Currently, hyphenation will never take place if the
> d.c.s. is UTF-8 or

That's decidedly half-assed.  And if you can describe to people how to do
it, you could also do it!

It just doesn't make sense to me to add hyphenation that works only
in some display character sets when it *could* be done in a more general
way.  Well that, and I still think adding hyphenation at all makes little
sense except for hack value.

> HTCJK != NOCJK, so no crashes, just silent rejection. You won't use it, so you
> won't suffer.

But I *will* suffer, if you get Tom to include the code in the general
lynx, by having to wade through confusing #ifdefs and so on.

> I don't set utf8 d.c.s., so I won't suffer. IMO very few people use utf8
> d.c.s.

More will.

>  I'm afraid that if I try to implement UTF-8 in a limited period of time,
> I'll be fired.

Nobody is imposing a limit.

   Klaus

