


From: Paul Wratt
Subject: Re: [Bug-wget] wget returns HTTP 302 found and does not download all content of webpage
Date: Sat, 14 Jan 2012 19:52:37 +1300

Note that this was originally the default behaviour in 1.12 (302s were
processed correctly).

If I recall correctly, there is a comment about "trusted hosts" being OFF by
default in 1.13, meaning 302s don't get processed correctly by default.

2012/1/6 Ángel González <address@hidden>:
> On 04/01/12 16:28, Umair wrote:
>>
>> Hi,
>>
>>
>> Now if I use the URL http://www.google.com as an argument to the wget
>> command, I get the following output:
>
> (...)
>
>> FINISHED --2012-01-04 16:03:57--
>> Downloaded: 1 files, 8.7K in 0.02s (440 KB/s)
>>
>> *******************************************************************************************************************
>>
>> If I use http://www.google.de as the URL, then it successfully downloads
>> the web page with the following results:
>>
>> Downloaded: 6 files, 55K in 0.06s (849 KB/s)
>>
>> Please note the difference between downloaded content in case of redirect
>> and no redirect. Same happens with any other url when it involves a
>> redirect with HTTP status code 302. i.e. only 1 html file is downloaded in
>> case of redirect.
>
> Confirmed. In summary, -p (--page-requisites) is apparently "lost" when the
> original url is a redirect to a different location.
>
>
>
>> Kindly suggest a possible solution to this error. Is it really an
>> error, or am I missing something?
>
> Add --span-hosts to the command line.
>
> It's arguable what is the correct behavior, although the current one seems
> consistent with what the user could expect without further knowledge of the
> host setup.
>
> When you ask for www.google.com, the images are downloaded from
> www.google.de, so wget treats them as foreign (they are not in
> www.google.com), and you need the --span-hosts switch. Going to
> www.google.de, they match the url provided in the command line.
>
>
>
>
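
The host-matching explanation quoted above can be illustrated with a small
Python sketch. This is only a simplified model of the behaviour described in
this thread, not wget's actual code: the function names (host_of,
would_fetch) and the image path are hypothetical, and the only rule modelled
is "a page requisite is fetched when its host equals the host of the URL
given on the command line, unless --span-hosts is set".

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host part of a URL (empty string if none)."""
    return (urlsplit(url).hostname or "").lower()


def would_fetch(requisite_url: str, start_url: str, span_hosts: bool = False) -> bool:
    """Simplified model: fetch a requisite only if it is on the start URL's
    host, or if host spanning is enabled (the role of --span-hosts)."""
    if span_hosts:
        return True
    return host_of(requisite_url) == host_of(start_url)


# Hypothetical requisite served from the redirect target's host.
requisite = "http://www.google.de/images/logo.png"

# Started from www.google.com: the requisite is on a "foreign" host.
print(would_fetch(requisite, "http://www.google.com"))

# Same start URL, with host spanning enabled.
print(would_fetch(requisite, "http://www.google.com", span_hosts=True))

# Started from www.google.de directly: the hosts match.
print(would_fetch(requisite, "http://www.google.de"))
```

Under this model the first call reports that the requisite is skipped, and
the other two report that it is fetched, which matches the "1 files" versus
"6 files" difference reported in the original message.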


