bug-wget

Re: [Bug-wget] wget 1.12 generates duplicated contents


From: Anh Ta
Subject: Re: [Bug-wget] wget 1.12 generates duplicated contents
Date: Fri, 22 Jul 2011 10:35:02 +0100

Hi Giuseppe,

You are right that the new version has fixed that bug. It also no longer downloads pages from other domains that the site redirects to, right?

In the previous versions, pages from the sites careers.beds.ac.uk and lrweb.beds.ac.uk were downloaded because www.beds.ac.uk redirected to them. With this version, only www.beds.ac.uk pages were downloaded.
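
A quick way to see which hosts a recursive run saved pages from is to list the top-level directories it created, since wget -r by default stores each host's files under a directory named after the host. The following is only a minimal sketch under that assumption (no -nH or similar option in use); list_hosts is a hypothetical helper name, not part of wget.

```shell
#!/bin/sh
# list_hosts DIR: print the name of each top-level directory under DIR,
# one per line. With wget's default layout these are the host names.
# (list_hosts is a hypothetical helper, not a wget feature.)
list_hosts() {
    for d in "$1"/*/; do
        # Skip the unexpanded pattern when DIR has no subdirectories.
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}
```

Running list_hosts . in the directory where wget was started should then print www.beds.ac.uk and, if pages from the redirected hosts were saved, careers.beds.ac.uk and lrweb.beds.ac.uk as well.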

I will test with a greater recursion depth and will let you know if there are any other problems.

Thank you very much for your help.

Anh

On 21 Jul 2011, at 16:06, Giuseppe Scrivano wrote:

That is a weird bug, but I have the feeling it might be fixed by a patch
I committed some time ago related to the link rewriting stuff.

Can you try this source tarball?

ftp://alpha.gnu.org/gnu/wget/wget-1.12-2504.tar.bz2

If you have any questions about how to build wget from source, just ask.

Thanks,
Giuseppe



Anh Ta <address@hidden> writes:

Hi Giuseppe,

The footer was only duplicated when I used the recursive download
option (-r). It was fine when I downloaded the single page. And yes, it
happened every time I ran that command on version 1.12 (version 1.11.4
was good).

I am sorry I forgot to attach the index.html file to the previous
email. Here is the gzip file containing the log file and the index.html
file produced by this command:

wget -r -l 1 -E -k -nv --wait=0.5 --random-wait --debug -o download.log http://www.beds.ac.uk





Thanks for your help and quick reply.

Anh


On 20 Jul 2011, at 09:50, Giuseppe Scrivano wrote:

Hello,

I couldn't reproduce the problem here; I get the same content I get
with the browser.

Does it behave differently if you use a recursive download or if you
request a single page?  Does it happen every time?

If you are able to reproduce it, can you please post the output you
get running wget with --debug? Otherwise, please attach the content of
index.html.

Thanks,
Giuseppe



Anh Ta <address@hidden> writes:

Hi,

I ran the following command with wget 1.12:

wget -r -l 1 -E -k -nv --wait=0.5 --random-wait http://www.beds.ac.uk

The downloaded file www.beds.ac.uk/index.html (zip file attached)
contained a duplicated footer. When I ran with a greater depth level,
e.g. -l 15, and the -p option, there were more pages with duplicated
footers.
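
A duplicated footer like this can be spotted without opening each page by counting how often a footer marker occurs in a downloaded file. This is only a rough sketch: count_marker is a hypothetical helper, and the marker string in the usage note is an assumption, since the actual footer markup of the site is not shown in this thread.

```shell
#!/bin/sh
# count_marker FILE STRING: print how many times the fixed string STRING
# occurs in FILE. (count_marker is a hypothetical helper name.)
count_marker() {
    # -o prints each match on its own line, -F treats STRING literally;
    # tr strips the padding some wc implementations add.
    grep -o -F -- "$2" "$1" | wc -l | tr -d ' '
}
```

For example, count_marker www.beds.ac.uk/index.html 'id="footer"' printing 2 where the page in the browser has a single footer would point to the duplication (the 'id="footer"' marker is illustrative only).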

The problem disappeared when I ran the same command with wget 1.11.4.
However, I need version 1.12 to have links in CSS downloaded and
replaced.

Could someone please help or give me some advice?

Many Thanks,
Anh



