Re: [Bug-wget] Unexpected character on a downloaded page

From: Ángel González
Subject: Re: [Bug-wget] Unexpected character on a downloaded page
Date: Mon, 16 Jun 2014 23:10:45 +0200
User-agent: Thunderbird

On 16/06/14 13:08, Angel Tsankov wrote:
> Indeed, the browser (Firefox 27.0.1) displays the original page in UTF-8 and the downloaded page in Windows-1252 (which turned out to be the fallback encoding for pages that do not declare their encoding). But if "wget is doing nothing here" why does the browser think that only the original page declares its encoding?
>
> Angel Tsankov

That's because it does so in the server headers:
  HTTP/1.1 200 OK
  Server: cloudflare-nginx
  Date: Mon, 16 Jun 2014 20:57:44 GMT
  Content-Type: text/html; charset=utf-8
  Transfer-Encoding: chunked
  Connection: keep-alive
  Set-Cookie: __cfduid=d013cee3c290f7e90e20da6d064d43b7b1402952264442; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.helloquizzy.com; HttpOnly
  Cache-control: private
  X-OKWS-Version: OKWS/
  P3P: CP="NOI CURa ADMa DEVa TAIa OUR BUS IND UNI COM NAV INT", policyref="http://www.helloquizzy.com/w3c/p3p.xml";
  X-XSS-Protection: 1; mode=block
  Set-Cookie: guest=13646369309274507826; Expires=Tue, 16 Jun 2015 20:57:44 GMT; Path=/; Domain=helloquizzy.com; HttpOnly
  CF-RAY: 13b9ebe4ce25024c-CDG

When you save the page contents, the headers are not available*. The page might additionally have declared its encoding in a meta tag in the <head>, in which case Firefox would have detected the encoding correctly from the local copy. Since it does not, Firefox (in the absence of other hints) uses a fallback that typically depends on the user's locale. See http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding
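
To illustrate the order of precedence described above, here is a rough Python sketch. It is a simplified approximation and not the full WHATWG algorithm (it ignores BOM sniffing and several other steps); the function name guess_encoding and the windows-1252 default are assumptions made for this example only.

```python
import re

# Hypothetical, simplified patterns; real browsers follow the WHATWG prescan rules.
CHARSET_IN_HEADER = re.compile(r'charset\s*=\s*"?([\w.:-]+)"?', re.IGNORECASE)
META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def guess_encoding(content_type, body, fallback="windows-1252"):
    """Return (encoding, source), roughly the way a browser would decide.

    content_type: value of the Content-Type header, or None for a local file.
    body: the raw bytes of the page.
    fallback: locale-dependent default (assumed to be windows-1252 here).
    """
    # 1. A charset in the HTTP Content-Type header wins when it is present.
    if content_type:
        match = CHARSET_IN_HEADER.search(content_type)
        if match:
            return match.group(1).lower(), "http-header"

    # 2. Otherwise look for a meta declaration near the start of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta-tag"

    # 3. Nothing declared: use the fallback encoding.
    return fallback, "fallback"
```

With the headers shown above, guess_encoding("text/html; charset=utf-8", page) takes the first branch. For the saved copy there is no header and no meta tag, so guess_encoding(None, page) ends up at the fallback, which matches what you are seeing.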

* Actually, you could store them in the file with --save-headers, but Firefox wouldn't know that they should be treated as HTTP headers, so that wouldn't help.
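
If you do want a local copy that opens correctly, one possible workaround (not something wget does for you) is to post-process a file saved with --save-headers: read the charset from the saved Content-Type line, drop the header block, and insert a meta tag into the <head>. The sketch below assumes the header block ends at the first empty line; the function names split_saved_headers and add_meta_charset are made up for this example.

```python
import re


def split_saved_headers(raw):
    """Split a file saved with --save-headers into (header_text, body_bytes)."""
    # Assumption: the header block ends at the first empty line (CRLF or LF).
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        return "", raw
    return raw[:match.start()].decode("iso-8859-1"), raw[match.end():]


def add_meta_charset(raw):
    """Return the page body with a meta charset tag taken from the saved headers."""
    headers, body = split_saved_headers(raw)
    match = re.search(
        r'^content-type:.*?charset\s*=\s*"?([\w.:-]+)',
        headers,
        re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        # No charset in the saved headers: return the body unchanged.
        return body

    meta = b'<meta charset="' + match.group(1).encode("ascii") + b'">'

    # Insert right after the opening <head> tag, or prepend if there is none.
    head = re.search(rb"<head(\s[^>]*)?>", body, re.IGNORECASE)
    if head is None:
        return meta + body
    return body[:head.end()] + meta + body[head.end():]
```

You would call add_meta_charset on the bytes of the saved file and write the result to a new .html file; the browser then finds the declaration in the document itself.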

Best regards
