Re: [Bug-wget] Potential bug or something else?


From: Giuseppe Scrivano
Subject: Re: [Bug-wget] Potential bug or something else?
Date: Thu, 20 May 2010 19:23:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)

What web sites are you trying to access, and what wget version are you
using?

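It smells like chunked transfer encoding data that the server sends
regardless of the HTTP version specified by wget.  The hex numbers
would then be the chunk sizes, which a client that does not speak
HTTP/1.1 leaves in the saved file.  You can try to build wget from the
source repository, or use a recent alpha tarball, where HTTP/1.1 is
supported.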
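
To illustrate what chunked transfer encoding looks like on the wire,
here is a minimal Python sketch of a decoder.  It is not wget's code;
the function name decode_chunked and the sample bytes are made up for
illustration, and it ignores trailers.  Each chunk is a hex size line,
CRLF, that many bytes of data, CRLF; a size of 0 ends the body.  If the
"209b" in your page is such a size line, it would announce a chunk of
0x209b = 8347 bytes.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body into the raw payload.

    Hypothetical helper for illustration only; trailers are ignored.
    """
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hexadecimal size line ending in CRLF.
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing chunk-size line")
        # Optional chunk extensions follow a ';' and are dropped here.
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk reached
        chunk = body[pos:pos + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        out += chunk
        pos += size
        # The chunk data itself is followed by another CRLF.
        if body[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2
    return bytes(out)


# Made-up sample: two chunks of 5 and 6 bytes, then the terminating chunk.
raw = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
decoded = decode_chunked(raw)
```

A client that does not understand the encoding would write raw to disk
as-is, so the size lines ("5", "6", "0" in this sample) end up inside
the saved page.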

Cheers,
Giuseppe



Mike <address@hidden> writes:

> Hi,
>
> I have been downloading some pages off one of my sites, however I
> sometimes see two 4-digit hex codes appear in the HTML source:
>
> Here's the start of one page:
>
> "209b
>          <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
>          <html>
>
>          <head>"
>
> The other 4-digit code appears later on in the page.
>
> Has anyone ever seen this before... it definitely doesn't appear on
> the original page.  It appears on all html files in particular
> directories, but some directories are clean.
>
> I'm running with this wget call:
> wget -A html,php,htm -b --default-page=__SLASH__.html --random-wait
> http://www.whateverurl.co.uk -w 10 -r -k -l 100 -U "Botlet"
>
> Any help much appreciated.  I can add some post-processing to remove
> the codes but that feels like a hack.  Any ideas what it might be?
>
> Thanks,
> Mike.


