lynx-dev

Re: LYNX-DEV Chartrans patches impressions..


From: Klaus Weide
Subject: Re: LYNX-DEV Chartrans patches impressions..
Date: Tue, 4 Mar 1997 20:25:54 -0600 (CST)

On Tue, 4 Mar 1997, Hynek Med wrote:
> On Sun, 2 Mar 1997, Klaus Weide wrote:
> 
> > The following should currently work (using -assume_charset and '@' in
> > combination):  Your display (C)haracter set is set to "ISO Latin 2". 
> > Start lynx with lynx -assume_charset=windows-1250 ..., then when you use
> > '@' it should toggle between "assume unlabelled is iso-8859-1"  and
> > "assume unlabelled is windows-1250". 
> 
> It works fine with -assume_charset=windows-1250. 
> 
> I tried -assume_charset=ISO-8859-2, which didn't work, ISO-8859-1 was
> assumed, so I wrote that -assume_charset doesn't work.. After some
> experiments I found out that the -assume_charset flag works only when the
> Display and Assumed charsets differ. If they don't, ISO-8859-1 is assumed.
> Was this meant to be so? 
[ See below ]
 
> > I don't think putting them in lynx.cfg for system administrators would be
> > a good idea.  The system administrator should not have any business of
> > setting charset defaults (and thereby, indirectly, language defaults)
> > for what his/her users browse on *remote* systems. 
> 
> I don't agree with you here. If the sysadmins don't set there anything
> reasonable, the users won't be able to read the documents with right
> accents (because the documents aren't marked etc.). They aren't able to
> read cyrillic/chinese/hebrew/whatever by now anyway (we don't have the
> fonts etc), so it wouldn't limit them any more than they are limited now -
> it would just help them to see the documents in Eastern european (read: 
> local) encodings.. 

I am apparently thinking of different situations than you are.
For example: a sysadmin who provides dialup access to Lynx, let's say in the
US, and who doesn't know anything about charsets (because he/she usually
doesn't need to).  Some of his/her clients like to read Web pages in a
foreign (maybe their own) language, and they have the necessary fonts (if
such are needed for that language).

> > [ self-editing snips ]
> > What I haven't documented anywhere is the following:
> > 
> >    o  IF a document's charset is unlabelled, and the charset to assume
> >       for unlabelled documents (via -assume_.. flag) is already the
> >       same as the selected display (C)haracter set (so that toggling "Raw"
> >       as described above wouldn't make any difference),
> >       THEN toggling "Raw" means switch between
> >       - assume unlabelled docs are what -assume_... and display (C).s. says
> >       - assume unlabelled docs are the default of defaults, i.e. iso-8859-1.
> > 
> > The result is that
> >  - if you have -assume_charset=iso-8859-2 AND display (C).s. = ISO Latin 2,
> >    you should also have -raw for unlabelled iso-8859-2 docs (or use '@').
> >  - the behaviour w.r.t. "Raw" on/off is then the same as it was without
> >    chartrans code.
> > 
> > Whether this is a good idea is up for discussion...
> 
> Well.. I don't think so. I use the -assume_charset flag to override the
> assumption that the document is in ISO-8859-1, because of the many
> unmarked documents that are in ISO-8859-2 or Windows-1250. Why this
> shouldn't work when the assumed charset is the same as my display Charset? 

But it does work, as long as you also have 'raw' enabled...
I agree that this is not very intuitive.
OTOH we have that precious '@' key, which we are already conditioned to
use to "get the character set right".  Should it do nothing in this case?

Maybe I should reverse the sense of "raw" here, but that would be even
less intuitive: "raw" enabled would then mean "DO translate".  Probably not
good.  Then again, for the ISO Latin 1 and CJK display Character sets, `-raw'
is _already_ used to turn raw mode OFF rather than ON (see the comments in
lynx.cfg).
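
To make the rule under discussion concrete, here is a minimal sketch in
Python (not the actual Lynx code).  The function name is hypothetical, and
both charsets are written as MIME names for simplicity.  The coinciding
case follows the description quoted above.  For the differing case, which
state of the '@' toggle selects which assumption is not spelled out in this
thread, so that mapping is an assumption of the sketch.

```python
DEFAULT_CHARSET = "iso-8859-1"  # the "default of defaults"


def effective_assumed_charset(assume_charset, display_charset, raw):
    """Charset assumed for an UNLABELLED document (hypothetical helper).

    assume_charset  -- value given via -assume_charset, or None if absent
    display_charset -- MIME name of the display (C)haracter set
    raw             -- current state of the "Raw" toggle ('@' key / -raw)
    """
    assume = (assume_charset or DEFAULT_CHARSET).lower()
    display = display_charset.lower()

    if assume == display:
        # Coinciding case, as described in the quoted text: raw ON keeps
        # the assumed/display charset, raw OFF falls back to iso-8859-1.
        return assume if raw else DEFAULT_CHARSET

    # Differing case.  ASSUMPTION: raw OFF uses the -assume_charset value
    # and raw ON falls back to iso-8859-1; the thread only says that '@'
    # toggles between these two assumptions.
    return DEFAULT_CHARSET if raw else assume


# Display set is ISO Latin 2 in all three calls.
print(effective_assumed_charset("iso-8859-2", "iso-8859-2", raw=False))
print(effective_assumed_charset("iso-8859-2", "iso-8859-2", raw=True))
print(effective_assumed_charset("windows-1250", "iso-8859-2", raw=False))
```

The first call shows the surprise reported above: with the assumed and
display charsets equal and raw off, the sketch falls back to iso-8859-1.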

> > Yes, translation of all those strings where it is needed is not in place.
> > But some testing reveals:
> > flags                                    TITLE OK    '=',history,'V' etc. OK
> > (nothing)                                NO          NO
> > -raw                                     YES         YES
> > -assume_charset=iso-8859-2               NO          NO
> > -assume_charset=iso-8859-2 -raw          YES         YES
> > -assume_charset=windows-1250             YES         NO
> > -assume_charset=windows-1250 -raw        YES         YES
> > -assume_local_charset=windows-1250       YES         NO
> > -assume_local_charset=windows-1250 -raw  YES         YES
> > -assume_local_charset=iso-8859-2         NO          YES
> > -assume_local_charset=iso-8859-2 -raw    YES         YES
> > 
> > Confused now?  Well I am..
> 
> Who wouldn't be. :-)
> 
> > The -assume_local_charset comes into play because lynx creates
> > temporary files for '=', history list, 'V' etc. screens and then reads
> > them in.
> 
> OK, it works fine with -assume_local_charset.

Note that it also seems to work fine (according to the table above) in all
cases as long as `-raw' is among the options.
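
That observation can be checked mechanically.  The following small Python
sketch only transcribes the table quoted above into a dictionary (the
variable and function names are made up for illustration) and tests the
claim about `-raw'.

```python
# (TITLE OK, '=' / history / 'V' screens OK) for each flag combination,
# transcribed from the table quoted above.
RESULTS = {
    "(nothing)": (False, False),
    "-raw": (True, True),
    "-assume_charset=iso-8859-2": (False, False),
    "-assume_charset=iso-8859-2 -raw": (True, True),
    "-assume_charset=windows-1250": (True, False),
    "-assume_charset=windows-1250 -raw": (True, True),
    "-assume_local_charset=windows-1250": (True, False),
    "-assume_local_charset=windows-1250 -raw": (True, True),
    "-assume_local_charset=iso-8859-2": (False, True),
    "-assume_local_charset=iso-8859-2 -raw": (True, True),
}


def has_raw(flags):
    """True if -raw is one of the whitespace-separated options."""
    return "-raw" in flags.split()


def fully_ok(flags):
    """True if both columns of the table say YES for this combination."""
    return all(RESULTS[flags])


with_raw = [f for f in RESULTS if has_raw(f)]
without_raw = [f for f in RESULTS if not has_raw(f)]

print(all(fully_ok(f) for f in with_raw))      # every -raw row is YES/YES
print([f for f in without_raw if fully_ok(f)])  # rows fully OK without -raw
```

By the table, the first line printed is True and the second is an empty
list: no combination without `-raw' gets both columns right.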

Of course whether titles are displayed correctly for remote documents (in
history lists etc.) should not depend on -assume_local_charset at all, so
that is something to fix.

> BTW, why not set -assume_local_charset to the one we got from
> -assume_charset? 

That is how I had it at first, I think.  But distinguishing the two seems
very useful (and obvious) to me.  Typically most text in local files on a
machine will be in one specific charset [[unwarranted assumption?]],
depending on installation / locale.  That choice is not logically
connected to what assumption to make about remote documents that
Webmasters haven't bothered to label correctly.  
The `-assume_local_charset' is kind of like labelling files in the local
filesystem, which is otherwise not possible in general (no HTTP headers;
HTML files are an exception because they allow a META tag).
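
As a rough illustration of why the two options are kept apart, here is a
sketch in Python (not the Lynx implementation; the function and parameter
names are hypothetical).  It assumes that an explicit label (HTTP header
or META tag) always wins, and that iso-8859-1 is the fallback when the
relevant option is not given.  The latter is an assumption for the local
case.

```python
DEFAULT_CHARSET = "iso-8859-1"


def charset_for(label, is_local, assume_charset=None,
                assume_local_charset=None):
    """Pick the charset to interpret a document in (hypothetical helper).

    label    -- charset from an HTTP header or META tag, or None
    is_local -- True for files in the local filesystem
    """
    if label:
        # A labelled document needs no assumption at all.
        return label.lower()
    # Unlabelled: local files and remote documents use separate options.
    fallback = assume_local_charset if is_local else assume_charset
    return (fallback or DEFAULT_CHARSET).lower()


# Local files are iso-8859-2, unlabelled remote pages are mostly windows-1250.
opts = dict(assume_charset="windows-1250", assume_local_charset="iso-8859-2")
print(charset_for(None, is_local=True, **opts))    # iso-8859-2
print(charset_for(None, is_local=False, **opts))   # windows-1250
print(charset_for("KOI8-R", is_local=False, **opts))  # koi8-r
```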

Of course if your local files use iso-8859-2 character encoding, your Lynx
display Character set is set to ISO Latin 2, and you only browse sites
that use the iso-8859-2 charset, then everything coincides for you, and
you may well wonder why there is all this mess of different options.

> Do we even need a local charset in this stage, when
> without correct local_charset we don't have right "global" documents? 

Well it's a bug (or an incomplete implementation..), and I need to fix it.

 Klaus
