Re: [Bug-wget] Unexpected character on a downloaded page

From:	Ángel González
Subject:	Re: [Bug-wget] Unexpected character on a downloaded page
Date:	Sun, 15 Jun 2014 22:28:14 +0200
User-agent:	Thunderbird

On 14/06/14 20:31, Angel Tsankov wrote:

Why does wget 1.15 (and 1.12) insert Â in several places in the copy it makes of the following page:
http://www.helloquizzy.com/results/helen-fisher-personality-type-test/?var_Explorer=1&var_Negotiator=1&var_Director=1&var_Builder=1&fromCGI=1

Short answer: because that's what is at that page.

Long answer: That page contains several non-breaking spaces (code point 160, U+00A0), which when encoded as UTF-8 result in the bytes C2 A0. If you read the page as if it were ISO-8859-1, you will instead see the byte C2 as the glyph Â.
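
To see this for yourself, here is a minimal Python sketch (the variable names are only illustrative) that encodes U+00A0 as UTF-8 and then decodes the same bytes both ways:

```python
# A non-breaking space, U+00A0 (decimal 160).
nbsp = "\u00a0"

# Encoded as UTF-8 it becomes the two bytes C2 A0.
raw = nbsp.encode("utf-8")

# Misreading those bytes as ISO-8859-1 turns C2 into "Â" (U+00C2)
# and A0 into a non-breaking space again.
misread = raw.decode("iso-8859-1")

# Reading them as UTF-8, as the page declares, gives back the single character.
correct = raw.decode("utf-8")

print(raw.hex(" "), repr(misread), repr(correct))
```

The bytes are identical in both cases; only the charset the viewer assumes is different.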


The page correctly states it's in utf-8:

Content-Type: text/html; charset=utf-8

so it should be read in utf-8 mode.

(wget is doing nothing here; it's just receiving bytes and storing them in the file as-is)

Also, why does it download the page to 'index.html?var_Explorer=1&var_Negotiator=1&var_Director=1&var_Builder=1&fromCGI=1' (i.e. with index.html prepended)?

That's because if you had just downloaded http://www.helloquizzy.com/results/helen-fisher-personality-type-test/ (i.e. no file name), wget would save it as index.html. The query string is then appended to that name.
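
As a rough illustration of that naming rule (this is not wget's actual code, and `local_name` is a hypothetical helper), a Python sketch might look like this. It assumes the local name is the last path component, falling back to index.html when that component is empty, followed by "?" and the query string if there is one:

```python
from urllib.parse import urlsplit

def local_name(url: str) -> str:
    """Hypothetical approximation of the file name observed in this thread."""
    parts = urlsplit(url)
    # Last path component; empty when the path ends in "/".
    last = parts.path.rsplit("/", 1)[-1]
    # Assumption: an empty component falls back to "index.html".
    name = last or "index.html"
    # Assumption: a non-empty query string is appended after "?".
    if parts.query:
        name += "?" + parts.query
    return name

base = "http://www.helloquizzy.com/results/helen-fisher-personality-type-test/"
print(local_name(base))
print(local_name(base + "?var_Explorer=1&fromCGI=1"))
```

With the full URL from the report, this sketch yields the same name that was observed.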

I think you could argue both ways for .

Regards,

Angel Tsankov

Best regards

Ángel González
