Re: LYNX-DEV Chartrans patches impressions..


From: Klaus Weide
Subject: Re: LYNX-DEV Chartrans patches impressions..
Date: Sun, 2 Mar 1997 21:43:26 -0600 (CST)

Thank you for the more detailed description.

On Sun, 2 Mar 1997, Hynek Med wrote:
> On Sat, 1 Mar 1997, Klaus Weide wrote:
> 
> > I am not particularly eager to claim space on the limited real estate of
> > the Options screen. (until someone rewrites the whole thing...)
> 
> What about another Option screen just for character set things? :-)

Uhmm, no thanks, not me.

> > So maybe you should
> > explain to me how you use the code.  With some real examples.  Maybe then
> > I can better understand why the toggling that the '@' key provides,
> > together with -assume_charset, is not enough.
> 
> Many Czech pages (the 95% I write about above) - and I guess this is the
> same with most other non-US-ASCII pages - are in various Czech/Central
> European character sets, but they don't have their character-set marked.
> (This is wrong, but that's how life goes. Marked pages (mostly as
> Windows-1250) would be unreadable in other lynxes, anyway.)  The character

If you mean that the other lynxes would refuse to display the document and
offer a download prompt instead - they don't do this if the charset is
specified in a META tag (instead of the real HTTP headers).  Just to
clarify.

> set is usually determined by users' choice (there's a link to something
> like "select your encoding" on the page) or by a hint of the browser - the
> recoding is done by cgi-scripts or modules to Apache httpd. So, when I
> want to see such a page, I must either select the raw mode or use the new
> assume_charset command line switch to get the character set right
> (otherwise lynx would try to translate between ISO-8859-1 and my display
> character set, which would produce incorrect results). When I select the
> raw mode/assume_charset, it works right - 

The following should currently work (using -assume_charset and '@' in
combination):  
Your display (C)haracter set is set to "ISO Latin 2".
Start lynx with  lynx -assume_charset=windows-1250 ..., 
then when you use '@' it should toggle between "assume unlabelled is iso-8859-1"
and "assume unlabelled is windows-1250".  So you can see docs in those
two charsets correctly, with some inconvenience of reloading (which is
not lynx's fault), but without having to restart lynx.  Docs which are
explicitly labelled should of course also work, if lynx understands
the charset.

Tell me if that doesn't work as described.

> now the only thing needed is to
> put these options to lynx.cfg for the system administrators, which is not
> possible now, if I understand it right. 

I don't think putting them in lynx.cfg for system administrators would be
a good idea.  The system administrator has no business setting charset
defaults (and thereby, indirectly, language defaults) for what his/her
users browse on *remote* systems.  (OTOH -assume_local_charset could
belong there.)  Of course this is only theoretical for single-user
machines where the sys admin is the only user.

> Oh, I'm trying that as I write, and it looks like -assume_charset
> doesn't work. 
> 
> A real example. Load an ISO-8859-2 font (setfont lat2-16.psf on your Linux
> console). 

Actually, I don't have to explicitly load the font; that's what
-DEXP_CHARTRANS_AUTOSWITCH does for me :)   (linux only)

> Then select this in your Options screen, save your options
> and quit lynx. (You have to select the ISO-8859-2 preferred document
> charset to make the httpd send you ISO-8859-2 encoded document - it
> sends Windows-1250 and unmarked as default.)
> 
>      display (C)haracter set      : ISO Latin 2
>      Raw 8-bit or CJK m(O)de      : OFF
>      preferred document lan(G)uage: cz,en
>      preferred document c(H)arset : ISO-8859-2
>  
> Then run lynx http://pes.eunet.cz. What you see is wrong, for example the
> first option in the form reads [Dne^1ni eislo_____] (with acutes above the
> i's). When you run lynx -raw pes.eunet.cz, it's right, [Dnesni
> cislo______], with acutes above the i's and a caron above the s.
> 
> All in all, lynx -raw works.
> 
> On the other hand, lynx -assume_charset=ISO-8859-2 or lynx -assume_charset
> ISO-8859-2 (which of these is correct, btw?) 

Both forms are recognized the same way (as with all other options that
take a value).

> doesn't work, it produces the
> same output as without the raw switch, which shouldn't, and it's even in
> the document info wrong:
> 
> Charset: iso-8859-1 (assumed)

Ok, this needs an explanation...

What I have written in the README.chartrans file is

 - The "Raw" toggle (from -raw flag, '@' key, or Options screen)
   o  [...]
   o  otherwise toggles the assumption "Default remote charset is same
      as Display Character Set" on or off.

What I haven't documented anywhere is the following:

   o  IF a document's charset is unlabelled, and the charset to assume
      for unlabelled documents (via -assume_.. flag) is already the
      same as the selected display (C)haracter set (so that toggling "Raw"
      as described above wouldn't make any difference),
      THEN toggling "Raw" means switch between
      - assume unlabelled docs are what -assume_... and display (C).s. says
      - assume unlabelled docs are the default of defaults, i.e. iso-8859-1.

You see, there was this '@' key that would otherwise have no effect under
those circumstances.  And I thought it would be useful to toggle
between those two states, without leaving lynx or changing -assume_...
The result is that
 - if you have -assume_charset=iso-8859-2 AND display (C).s. = ISO Latin 2,
   you should also have -raw for unlabelled iso-8859-2 docs (or use '@').
 - the behaviour w.r.t. "Raw" on/off is then the same as it was without
   chartrans code.

Whether this is a good idea is up for discussion...
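
As a reading aid, the undocumented special case above can be modelled in a
few lines of Python.  This is only a sketch of the behaviour as described in
this message, not code taken from lynx: the function name and the use of
lower-case MIME names for both settings are assumptions, and the case where
the two charsets differ is deliberately left out.

```python
# Sketch only: models the special case described above, not lynx's real code.
DEFAULT_OF_DEFAULTS = "iso-8859-1"


def assumed_for_unlabelled(assume_charset, display_charset, raw):
    """Return the charset assumed for unlabelled documents.

    Only handles the case where -assume_charset equals the display
    character set; both arguments are MIME names (an assumption made
    here to keep the comparison simple).
    """
    assume = assume_charset.lower()
    display = display_charset.lower()
    if assume != display:
        # The message does not fully specify this case, so it is not modelled.
        raise NotImplementedError("only the assume == display case is modelled")
    # "Raw" on: trust -assume_charset / display charset.
    # "Raw" off: fall back to the default of defaults.
    return assume if raw else DEFAULT_OF_DEFAULTS


if __name__ == "__main__":
    for raw in (False, True):
        print(raw, assumed_for_unlabelled("iso-8859-2", "ISO-8859-2", raw))
    # False iso-8859-1
    # True iso-8859-2
```

With Raw off this gives iso-8859-1, which matches the "Charset: iso-8859-1
(assumed)" line reported above for -assume_charset=ISO-8859-2 without -raw.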

> If you want to see an example that works, try http://modrysvet.codalan.cz
> or http://www.atlas.cz. These are Windows-1250 and marked, when you have
> Display charset ISO Latin 2, translation is done right.
> 
> As I look on it in detail.. there's one minor thing, though - the Linkname
> in the info screen (=) is still wrong, both for the current document and
> for the link you are on (try to see the "slozitejsi dotaz" link on
> www.atlas.cz - when you are on it and press = key, on the info screen you
> see "slo 3/4itij^1i dotaz" instead), it looks like it didn't undergo the
> translation. And so didn't the Title on the top of screen - it reads
> "ATLAS: vyhledavani v Eeskem Internetu" instead of "ATLAS: vyhledavani v
> Ceskem Internetu", with some accents. (The problem is that C with caron
> changed to E in the word "Ceskem".)
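
The garbled strings reported in this thread are what one would expect when
ISO-8859-2 bytes are interpreted as ISO-8859-1.  A minimal Python sketch of
that effect follows; the accented spellings are reconstructed here rather
than quoted from the pages, and the helper name is hypothetical.

```python
# Sketch only: reproduces the mis-decoding, not lynx's translation code.
def misread_latin2_as_latin1(text):
    """Encode text as ISO-8859-2, then decode the bytes as ISO-8859-1."""
    return text.encode("iso-8859-2").decode("iso-8859-1")


# Reconstructed spellings (assumed) of the strings mentioned in the thread.
SAMPLES = ["Dnešní číslo", "složitější dotaz", "Českém"]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample} -> {misread_latin2_as_latin1(sample)}")
    # Dnešní číslo -> Dne¹ní èíslo
    # složitější dotaz -> slo¾itìj¹í dotaz
    # Českém -> Èeském
```

The characters ¹, ¾, è, ì and È do not exist in ISO Latin 2, which would
explain why they appear approximated as ^1, 3/4, e, i and E on a Latin 2
display, while í and é survive unchanged.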

Yes, translation of all those strings where it is needed is not in place.
But some testing reveals:
flags                                    TITLE OK    '=',history,'V' etc. OK
(nothing)                                    NO                NO
-raw                                         YES               YES
-assume_charset=iso-8859-2                   NO                NO
-assume_charset=iso-8859-2 -raw              YES               YES
-assume_charset=windows-1250                 YES               NO
-assume_charset=windows-1250 -raw            YES               YES
-assume_local_charset=windows-1250           YES               NO 
-assume_local_charset=windows-1250 -raw      YES               YES
-assume_local_charset=iso-8859-2             NO                YES
-assume_local_charset=iso-8859-2 -raw        YES               YES

Confused now?  Well I am..

The -assume_local_charset comes into play because lynx creates
temporary files for '=', history list, 'V' etc. screens and then reads
them in.

> > (I am not even sure that the '@' key, together with the new
> > -assume_charset etc. options, work.  I am sure there are situations where
> > they don't.  More feedback requested.  I also don't know whether I have
> > messed up the CJK charset handling.)  
> 
> Well, as you see even I didn't know. At first I thought everything
> worked right and only needs to be saveable, but I learned that the
> -assume_charset switch doesn't work at all and that some things don't get 
> translated.. :-) 

As you see, by combining several options one can often get the
"correct" result but that isn't quite optimal yet..

  Klaus

