From: Ángel González
Subject: Re: [Bug-wget] Unexpected character on a downloaded page
Date: Mon, 16 Jun 2014 23:10:45 +0200
User-agent: Thunderbird

On 16/06/14 13:08, Angel Tsankov wrote:
> Indeed, the browser (Firefox 27.0.1) displays the original page in UTF-8 and the downloaded page in Windows-1252 (which turned out to be the fallback encoding for pages that do not declare their encoding). But if "wget is doing nothing here", why does the browser think that only the original page declares its encoding?
>
> Regards,
> Angel Tsankov

That's because it does so in the server headers:

    HTTP/1.1 200 OK
    Server: cloudflare-nginx
    Date: Mon, 16 Jun 2014 20:57:44 GMT
    Content-Type: text/html; charset=utf-8
    Transfer-Encoding: chunked
    Connection: keep-alive
    Set-Cookie: __cfduid=d013cee3c290f7e90e20da6d064d43b7b1402952264442; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.helloquizzy.com; HttpOnly
    Cache-control: private
    X-OKWS-Version: OKWS/3.1.27.0
    P3P: CP="NOI CURa ADMa DEVa TAIa OUR BUS IND UNI COM NAV INT", policyref="http://www.helloquizzy.com/w3c/p3p.xml"
    X-XSS-Protection: 1; mode=block
    Set-Cookie: guest=13646369309274507826; Expires=Tue, 16 Jun 2015 20:57:44 GMT; Path=/; Domain=helloquizzy.com; HttpOnly
    CF-RAY: 13b9ebe4ce25024c-CDG

When you save the page contents, the headers are not available*. The page could additionally have declared its encoding in a meta tag in the <head>, in which case Firefox would have correctly detected the encoding of the local copy. Since it does not, Firefox, in the absence of other hints, uses a fallback that typically depends on the user's locale. See http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding
* Actually, you could store them in the file with --save-headers, but Firefox wouldn't know that they should be treated as HTTP headers, so that wouldn't help.
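
If you want the local copy to display correctly anyway, one workaround is to copy the charset from the Content-Type header into the saved file as a meta tag. The following is only a rough Python sketch of that idea, not something wget does for you: the function names are made up for illustration, the regular expressions are a crude approximation of what a browser's encoding detection does, and it assumes you have the Content-Type value at hand (for instance copied from the headers above).

```python
import re


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    match = re.search(r'charset\s*=\s*"?([^";\s]+)', content_type, re.IGNORECASE)
    return match.group(1) if match else None


def has_meta_charset(html):
    """Crude check for a <meta ... charset=...> near the start of the page."""
    # Only the beginning of the document is inspected (1024 bytes here).
    return re.search(rb'<meta[^>]+charset\s*=', html[:1024], re.IGNORECASE) is not None


def add_meta_charset(html, charset):
    """Insert <meta charset="..."> right after <head> if none is declared."""
    if has_meta_charset(html):
        return html
    meta = b'<meta charset="' + charset.encode("ascii") + b'">'
    # Match <head> or <head ...>, but not <header>.
    new_html, count = re.subn(
        rb'(<head(?:\s[^>]*)?>)',
        lambda m: m.group(1) + meta,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    # No <head> tag at all: fall back to prepending the declaration.
    return new_html if count else meta + html


if __name__ == "__main__":
    header = "text/html; charset=utf-8"  # value taken from the response above
    page = b"<html><head><title>x</title></head><body></body></html>"
    print(add_meta_charset(page, charset_from_content_type(header)))
```

With the declaration inside the file itself, the browser no longer needs the HTTP headers to pick the right encoding for the local copy.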
Best regards