lynx-dev

Re: [Lynx-dev] Unicode-marking, &c


From: David Woolley
Subject: Re: [Lynx-dev] Unicode-marking, &c
Date: Sun, 01 Mar 2009 11:27:58 +0000
User-agent: Thunderbird 2.0.0.19 (X11/20081209)

Thomas Dickey wrote:

yes - it has the meta tags after the title for UTF-8, but has a BOM right up front. Lynx isn't seeing the charset tag when it gets the page.

One possible factor here is that browsers aren't required to refetch the page when the charset in a meta element clashes with the one they had already assumed, so that meta element is supposed to be very near the front. I suspect the intent was that it should come immediately after <head>, and that browsers should perform a limited lookahead if there was no charset in the real HTTP headers, or if HTTP wasn't used. Unfortunately, authoring tool writers just know they have to create a lot of boilerplate, including boilerplate for the benefit of the authoring tool itself, and don't consider that some of it may need to take priority.
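As a rough illustration of that kind of limited lookahead, here is a minimal Python sketch. It is not Lynx's code: the function name, the regular expression, and the 1024-byte limit are all assumptions chosen for the example. It trusts a charset from the HTTP headers when there is one, and otherwise looks for a meta charset only in the first part of the document.

```python
import re

# Hypothetical lookahead limit; the 1024-byte figure is an assumption
# for illustration, not something any particular browser guarantees.
PRESCAN_LIMIT = 1024

# Matches both <meta charset="..."> and the http-equiv form, where
# "charset=" appears inside the content attribute.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(body, http_charset=None, limit=PRESCAN_LIMIT):
    """Return a lower-cased charset name, or None if nothing was found.

    A charset from the real HTTP headers wins; the meta element is only
    consulted when there is none, and only within the first `limit` bytes.
    """
    if http_charset:
        return http_charset.lower()
    match = _META_CHARSET.search(body[:limit])
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

With this sketch, a meta element pushed past the limit by other boilerplate is simply never seen, which is the failure mode described above.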

This one looks like a manual template adapted from authoring tool output: look at the unconfigured description and keywords, and also note the generator.

However, my version of Lynx does correctly identify this as UTF-8, so the real problem may be inappropriate error recovery, i.e. Lynx is assuming that the non-ASCII characters before the meta charset were real content that had been sent without at least a preceding title element.
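To make the BOM point concrete, here is a small Python sketch of treating a leading UTF-8 BOM as an encoding signal rather than as content. Again, this is only an illustration under my own assumptions, not what Lynx does; the function name is hypothetical.

```python
import codecs


def strip_utf8_bom(body):
    """Return (encoding_or_None, body_without_bom).

    The UTF-8 BOM is the three bytes EF BB BF. If they are left in the
    stream, a parser sees non-ASCII bytes before <html> and may treat
    them as document text.
    """
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8", body[len(codecs.BOM_UTF8):]
    return None, body
```

Removing the BOM first means the markup that follows starts at the first byte the parser sees, so nothing non-ASCII precedes the head.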


--
David Woolley
Emails are not formal business letters, whatever businesses may want.
RFC1855 says there should be an address here, but, in a world of spam,
that is no longer good advice, as archive address hiding may not work.



