bug-wget

Re: I found bug in wget


From: Darshit Shah
Subject: Re: I found bug in wget
Date: Sun, 7 Mar 2021 23:20:06 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0

Hi,


On 07.03.21 01:14, Kmb697@Yandex.Ru wrote:
> Hello, Bug-wget.
> 
> I have found an unpleasant quirk in Wget.
> Sometimes it cannot make a complete recursive copy of a site,
> because the site's pages and directories are generated dynamically
> from a database (MySQL) and do not actually exist on disk.
> For example, these pages of a shop:
> https://modastori.prom.ua/g39944845-zhenskaya-obuv
> https://modastori.prom.ua/g39944845-zhenskaya-obuv/page_2
> When the second page is downloaded, Wget deletes the first one.
> wget.exe -x -c --no-check-certificate -i getprom.txt -P ".\shop"
> 
> The file getprom.txt contains the links:
> https://modastori.prom.ua/g39944845-zhenskaya-obuv
> https://modastori.prom.ua/g39944845-zhenskaya-obuv/page_2
> 

This is simple. You first download a page called
"g39944845-zhenskaya-obuv", and then try to download a page in a
subdirectory with the same name. A file and a directory cannot share
the same name (at least on Unix filesystems), so Wget cannot keep both.
Instead, it deletes the old file and creates a directory in its place.
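A minimal Python sketch of the collision described above, assuming a throwaway temporary directory: a plain mkdir on a name already taken by a regular file fails, and the hypothetical save_page() helper models the delete-and-replace behaviour. It is a simplified illustration, not Wget's actual code.

```python
import os
import tempfile


def save_page(root: str, rel_path: str, body: bytes) -> None:
    """Write `body` to root/rel_path, creating directories as needed.

    Hypothetical helper modelling the behaviour described above, NOT
    Wget's real code: if a directory component is already taken by a
    regular file, that file is deleted so the directory can be created.
    """
    parts = rel_path.split("/")
    current = root
    for name in parts[:-1]:
        current = os.path.join(current, name)
        if os.path.isfile(current):
            os.remove(current)  # the previously downloaded page is lost here
        os.makedirs(current, exist_ok=True)
    with open(os.path.join(current, parts[-1]), "wb") as f:
        f.write(body)


with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "g39944845-zhenskaya-obuv")
    with open(first, "wb") as f:
        f.write(b"page 1")

    # A plain mkdir on a name already used by a regular file fails.
    try:
        os.mkdir(first)
    except FileExistsError:
        print("mkdir failed: a file with that name already exists")

    # Saving page_2 replaces the first page's file with a directory.
    save_page(root, "g39944845-zhenskaya-obuv/page_2", b"page 2")
    print(os.path.isdir(first))  # True: the content of page 1 is gone
```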

Wget is smart enough to recognize this when performing a recursive
download, and if you also have --convert-links enabled, it will save the
file as g39944845-zhenskaya-obuv.html to prevent a name collision.

In your particular case, where you use the -i switch, you could use the
--adjust-extension switch to force Wget to add the .html extension.
However, this will break links. The only way to fix that is to download
the site recursively.
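To illustrate why adding the .html extension avoids the collision, here is a simplified Python model of how -x maps URLs to local file names. local_path() and collisions() are hypothetical helpers, not part of Wget; the model assumes every page is HTML and ignores query strings and Content-Type headers.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url: str, adjust_extension: bool = False) -> str:
    """Map a URL to a host/path file name, roughly like `wget -x`.

    Simplified model (an assumption, not Wget's real naming code): with
    adjust_extension, every page is treated as HTML and gets ".html"
    appended unless it already ends in .html or .htm.
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    if path == "" or path.endswith("/"):
        path += "index.html"
    if adjust_extension and not path.lower().endswith((".html", ".htm")):
        path += ".html"
    return posixpath.join(parts.hostname or "", path)


def collisions(paths: list[str]) -> list[tuple[str, str]]:
    """Return (a, b) pairs where `a` is saved as a file but is also a directory of `b`."""
    return [(a, b) for a in paths for b in paths if b.startswith(a + "/")]


urls = [
    "https://modastori.prom.ua/g39944845-zhenskaya-obuv",
    "https://modastori.prom.ua/g39944845-zhenskaya-obuv/page_2",
]
for adjust in (False, True):
    paths = [local_path(u, adjust_extension=adjust) for u in urls]
    print(adjust, paths, collisions(paths))
```

Without the extension, the first page's file name is also the directory needed for page_2; with it, the two names no longer overlap. The pages' HTML still links to /g39944845-zhenskaya-obuv without the suffix, though, which is why those links break locally unless --convert-links rewrites them during a recursive download.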

> This also happens with a recursive download:
> wget.exe -x -r --no-check-certificate https://modastori.prom.ua/ -P ".\shop"
> wget.exe -m --no-check-certificate https://modastori.prom.ua/ -P ".\shop"
> In this case, Wget cannot create a full copy of the site.
> 

I'm sorry, I cannot understand this section. What exactly is the problem
again?

> When I download them in the reverse order:
> https://modastori.prom.ua/g39944845-zhenskaya-obuv/page_2
> https://modastori.prom.ua/g39944845-zhenskaya-obuv
> everything is OK.
> All files are created.
> I think the check for whether a file already exists is not implemented correctly.
> OS: WinXP SP3 NTFS
> 

P.S.: I see that you're running Wget 1.11. That is an extremely ancient
version of Wget and absolutely not supported anymore. Please try to
update to a newer version of Wget.


