Re: [Bug-wget] Problem with ÅÄÖ and wget


From: Tim Ruehsen
Subject: Re: [Bug-wget] Problem with ÅÄÖ and wget
Date: Tue, 24 Sep 2013 10:38:30 +0200
User-agent: KMail/4.10.5 (Linux/3.10-3-amd64; KDE/4.10.5; x86_64; ; )

On Monday 23 September 2013 23:32:39 Ángel González wrote:
> On 17/09/13 09:49, Tim Ruehsen wrote:
> > On Tuesday 17 September 2013 00:17:21 Ángel González wrote:
> >>> [1] http://nikitathespider.com/articles/EncodingDivination.html
> >> 
> >> Note that these steps are outdated now (that was written at most at
> >> 2008).
> > 
> > Outdated by exactly what ? RFC3986 is of 2005 and does not contradict to
> > [1]. See my explanation above.
> 
> By the HTML Living Standard (formerly known as HTML5)
> http://www.whatwg.org/specs/web-apps/current-work/multipage/
> 
> The Content-type header is sometimes overridden, ISO-8859-1 now means
> windows-1252,
> there are some well-defined guessing steps when there's such need...

Just for completeness: these guessing steps, called the "encoding sniffing
algorithm", are described in section 12.2.2.2.
But the standard only calls for them because "In some cases, it might be
impractical to unambiguously determine the encoding before parsing the
document.".
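
To make the idea of such guessing steps concrete, here is a minimal Python sketch. It is not the algorithm from the standard: the order of checks (BOM, then the HTTP charset, then a <meta> scan of the first 1024 bytes, then a default) is a simplification, and the names sniff_encoding, BOMS and META_RE are hypothetical, chosen only for this illustration.

```python
import re

# Byte order marks and the encodings they indicate.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# Rough pattern for a charset declared inside a <meta> tag (assumption:
# a real prescan is more careful about attributes and quoting).
META_RE = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_encoding(body, http_charset=None, default="windows-1252"):
    """Guess the encoding of an HTML body (simplified, illustrative only)."""
    # 1. A byte order mark at the start of the body.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the Content-Type header, if any.
    if http_charset:
        return http_charset.strip().lower()
    # 3. A <meta> declaration within the first 1024 bytes.
    match = META_RE.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing found: fall back to a default.
    return default
```

A caller would pass the raw response body and, if present, the charset from the Content-Type header, for example sniff_encoding(body, "ISO-8859-1").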

I found this iso-8859-1 / windows-1252 issue mentioned on the Wikipedia
'windows-1252' page, but couldn't find it in the HTML Living Standard pages.
Could you give me a pointer, please?

How do you think we can address the iso-8859-1 / windows-1252 encoding issue
(and should we)? As I understand it, it applies only to HTML5...
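
For reference, a small Python sketch of what the relabeling changes in practice. The two encodings differ only in the byte range 0x80-0x9F, so ÅÄÖ decode identically either way. The helper effective_encoding and its LABEL_OVERRIDES table are hypothetical and list only a few labels; they are not taken from wget.

```python
# Illustrative subset of labels that browsers treat as windows-1252.
LABEL_OVERRIDES = {
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "us-ascii": "windows-1252",
}


def effective_encoding(label):
    """Return the encoding to decode with for a declared charset label."""
    normalized = label.strip().lower()
    return LABEL_OVERRIDES.get(normalized, normalized)


# Curly quotes and a euro sign in windows-1252, followed by ÅÄÖ.
raw = b"\x93quoted\x94 \x80 \xc5\xc4\xd6"

# Strict ISO-8859-1 maps 0x80-0x9F to invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with the overridden label yields the intended punctuation.
as_cp1252 = raw.decode(effective_encoding("ISO-8859-1"))

print(repr(as_latin1))
print(repr(as_cp1252))
```

The letters ÅÄÖ come out the same in both results; only the quotes and the euro sign differ.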

Is there a practical need for the sniffing algorithm?
Do you know of any real web sites / pages where the encoding is ambiguous?

Tim



