lynx-dev

Re: [Lynx-dev] Unicode-marking, &c


From: Thomas Dickey
Subject: Re: [Lynx-dev] Unicode-marking, &c
Date: Thu, 26 Feb 2009 15:53:18 -0500
User-agent: Mutt/1.5.18 (2008-05-17)

On Thu, Feb 26, 2009 at 06:49:02PM +0000, Thorsten Glaser wrote:
> Thomas Dickey dixit:
> 
> >> Here under Windows there are constant references to the character that
> >> begins a 16-bit-wide-character file (FF FE) or UTF-8 file (EF BB BF).
> 
> Note that this is not about Windows® though - the Byte Order Mark,
> Unicode FEFF, UCS-2BE 0xFE 0xFF, UCS-2LE 0xFF 0xFE, UTF-8 0xEF 0xBB 0xBF,
> is a standardised thing.
> 
> > Lynx handles _some_ cases - but a url would help, so we can see.
> 
> Attached.
> 
> Lynx handles all three poorly: the UTF-8 BOM isn't stripped, the UCS-2
> files end with an ampersand instead of the … (ellipsis).

Lynx assumes the document charset is ISO-8859-1 if it's not given.
(That was the rule for some time - for HTML - perhaps we're not
discussing HTML anymore).

Setting the assumed document charset to UTF-8 makes it display properly.

0xFE is a valid ISO-8859-1 code, as your terminal emulator shows...
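
To illustrate the point, here is a minimal sketch in Python. It is not
Lynx's code: the helper name sniff_and_decode and the ISO-8859-1 default
are assumptions made for this example. It strips a leading BOM when one
of the three byte sequences from this thread is present, and otherwise
decodes with the assumed charset. It does not try to recognise UTF-32.

```python
import codecs

# Byte order marks named in this thread, as raw bytes.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_and_decode(data, default="iso-8859-1"):
    """Hypothetical helper: return (charset, text) for raw bytes.

    A leading BOM selects the charset and is stripped from the text.
    Without a BOM, the assumed default charset is used.
    """
    for bom, charset in BOMS:
        if data.startswith(bom):
            return charset, data[len(bom):].decode(charset)
    return default, data.decode(default)


sample = "caf\u00e9\u2026"  # ends with U+2026, the ellipsis
utf8_file = codecs.BOM_UTF8 + sample.encode("utf-8")
ucs2le_file = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

print(sniff_and_decode(utf8_file))
print(sniff_and_decode(ucs2le_file))

# Every byte is a valid ISO-8859-1 code, so decoding without looking
# for a BOM never fails; the BOM bytes just become visible characters.
print(repr(utf8_file.decode("iso-8859-1")[:3]))
print(repr(b"\xfe\xff".decode("iso-8859-1")))
```

With the ISO-8859-1 assumption and no sniffing, the UTF-8 BOM shows up
as three ordinary characters at the start of the text, which matches
the "isn't stripped" symptom reported above.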

-- 
Thomas E. Dickey <address@hidden>
http://invisible-island.net
ftp://invisible-island.net


