
Re: lynx-dev 0x9A bug


From: Klaus Weide
Subject: Re: lynx-dev 0x9A bug
Date: Tue, 5 Oct 1999 08:10:13 -0500 (CDT)

On Tue, 5 Oct 1999, Karel Kulhavy wrote:

[ Reformatted for quoting - watch your line length! ]

> I've found out that when I run lynx in -dump -raw mode, Lynx removes
> characters 0x9A from the original source.
 
> This bug is in version 2.7.1 as well as in version 2.8.2rel.1.
 
> I have a html file containing Czech text in cp1250 encoding. Some
> Czech words contain char 0x9A which is small letter 's' with
> caron. After running lynx -dump -raw on this local html file, the
> 0x9A character is left out in the output, although the characters
> around this character forming the word are left untouched.

Depending on circumstances this may be expected, as a precaution
against having this byte (and others in the range 0x80..0x9F)
act as a control character.  It depends on your environment whether
that makes sense or not; but if you want lynx to spit out such
bytes as if they were normal displayable characters, you have to
*tell it* that your Display Character Set is one where these characters
are allowed.
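
To illustrate why the same byte can be harmless in one character set and
a control character in another, here is a minimal Python sketch. It is
not lynx code; the function name describe_byte is made up for this
example, and it only relies on Python's standard "latin-1" and "cp1250"
codecs.

```python
import unicodedata


def describe_byte(value: int) -> dict:
    """Show how one byte is classified under two different charsets.

    describe_byte is a hypothetical helper for illustration only.
    """
    raw = bytes([value])
    # In ISO-8859-1, bytes 0x80..0x9F map to the C1 control code points.
    latin1_char = raw.decode("latin-1")
    # In cp1250 (windows-1250), most of that range holds printable letters.
    # A few positions are undefined there, hence errors="replace".
    cp1250_char = raw.decode("cp1250", errors="replace")
    return {
        "latin1_category": unicodedata.category(latin1_char),
        "cp1250_char": cp1250_char,
        "cp1250_category": unicodedata.category(cp1250_char),
    }


info = describe_byte(0x9A)
print(info)
```

For 0x9A the ISO-8859-1 reading is in Unicode category "Cc" (control),
while the cp1250 reading is a lowercase letter, which is why the
Display Character Set matters here.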

For this use of -dump, lynx uses basically the same logic as for
normal interactive display.  So you should see the same effect.
With -dump, lynx should use the Display Character Set saved from the
Options Screen (in .lynxrc) or set in lynx.cfg (where it is called
simply CHARACTER_SET).


What OS are you using?

Are you *sure* that it is lynx that is removing the character?
Just echoing the file to the screen may not be enough to check -
since the byte may actually act as a control character.

Does this happen only with 0x9A, or also with other characters in the
range 0x80..0x9F?
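
One way to check this without trusting the terminal is to count the
bytes directly. The following Python sketch compares how often each
byte in 0x80..0x9F occurs in the source and in the dump output. The
function names are made up for this example, and the comparison is only
rough, since the dump is rendered text rather than the HTML source.

```python
from collections import Counter


def count_high_control_bytes(data: bytes) -> Counter:
    """Count occurrences of each byte in the range 0x80..0x9F."""
    return Counter(b for b in data if 0x80 <= b <= 0x9F)


def dropped_bytes(source: bytes, dumped: bytes) -> dict:
    """Report bytes in 0x80..0x9F that occur less often in the dump.

    Returns a mapping from hex byte value to the number of lost
    occurrences. Both names here are hypothetical helpers.
    """
    before = count_high_control_bytes(source)
    after = count_high_control_bytes(dumped)
    return {
        hex(b): before[b] - after[b]
        for b in sorted(before)
        if after[b] < before[b]
    }


# Small in-memory example; with real files, read them in binary mode,
# e.g. open(path, "rb").read(), so that no decoding takes place.
example_source = b"<p>na\x9ae \x8eena</p>"
example_dump = b"nae \x8eena"
print(dropped_bytes(example_source, example_dump))
```

If the result is empty for your real files, the bytes are still in the
output and something later (terminal, pager) is swallowing them.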

So what is your effective Display Character Set?  Is it actually
what you want to get out of lynx?

Have you set an ASSUME_LOCAL_CHARSET and/or ASSUME_CHARSET in
lynx.cfg?  You should set e.g. the first one if lynx should assume
that local files are all in the windows-1250 charset.  Then the -raw
should not be needed for your local file example.  (Leave it out
when it isn't needed - it might actually confuse things.)

Does the file contain a META tag with charset specification?
(In that case, ASSUME_* would not be used.)
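
A quick way to look for such a META charset in a file is sketched
below in Python. This is a crude regular-expression check, not the
parsing that lynx itself does, and the name find_meta_charset is made
up for this example.

```python
import re
from typing import Optional

# Rough pattern: a <meta ...> tag containing charset=NAME, with or
# without quotes around the name. This does not handle every HTML form.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def find_meta_charset(html: bytes) -> Optional[str]:
    """Return the lowercased charset named in a META tag, or None."""
    match = META_CHARSET.search(html)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


sample = (
    b'<META HTTP-EQUIV="Content-Type" '
    b'CONTENT="text/html; charset=windows-1250">'
)
print(find_meta_charset(sample))
```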

With which screen handling library was lynx compiled? (curses/ncurses/
slang?)  There could be some relevant code differences.


    Klaus

