bug-wget

[Bug-wget] wget / character set


From: Alex Davies
Subject: [Bug-wget] wget / character set
Date: Sun, 6 Sep 2009 23:44:43 +0100

Hi All,

I am attempting to wget a page, but find that wget is mangling some of
the characters inside the page and I'm not quite sure why.

For example, the command

# wget -d http://sites.google.com/a/dutymanagement.org/members/who-s-who/operations-team

shows that the character set is picked up correctly:

Content-Type: text/html; charset=utf-8

However, the downloaded file shows lines such as this in vi:

... </A><200e> &gt; <200e> ...

The page is garbled in a web browser too. I'm not quite sure what the
<200e> means or where it comes from.

Is there an easy way to prevent this from occurring? (I've tried setting
the header with --header='Accept-Charset: UTF-8'.)

Many thanks,

Alex
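
A note on the <200e> marker quoted above: vi typically prints a character
it cannot display as its code point in hex, so <200e> most likely stands
for U+200E (LEFT-TO-RIGHT MARK), an invisible formatting character that is
valid UTF-8. If that assumption holds, wget saved the bytes unchanged. The
sketch below is one way to check this; the helper names and the sample
bytes are hypothetical and only mirror the line quoted in the message.

```python
import unicodedata

LRM = "\u200e"  # assumed to be the code point that vi renders as <200e>


def describe_invisibles(data: bytes) -> dict:
    """Decode bytes as UTF-8 and count invisible format (Cf) characters by name."""
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8,
    # which would indicate genuine corruption rather than a display issue.
    text = data.decode("utf-8")
    counts = {}
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            name = unicodedata.name(ch, f"U+{ord(ch):04X}")
            counts[name] = counts.get(name, 0) + 1
    return counts


def strip_lrm(data: bytes) -> bytes:
    """Return the same UTF-8 bytes with every U+200E removed."""
    return data.decode("utf-8").replace(LRM, "").encode("utf-8")


# Hypothetical sample modelled on the fragment shown in vi.
sample = "</A>\u200e &gt; \u200e".encode("utf-8")

print(LRM.encode("utf-8").hex())    # byte sequence of U+200E in UTF-8
print(describe_invisibles(sample))  # which format characters are present
print(strip_lrm(sample))            # the fragment without the marks
```

To inspect a real download, the same helpers can be applied to the bytes
read from the saved file opened in binary mode. This does not address the
rendering in a browser, which may depend on how the local file's encoding
is detected.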



