
[Bug-wget] Wget results VERY different from browser save


From: Marshall Burns
Subject: [Bug-wget] Wget results VERY different from browser save
Date: Wed, 2 Jan 2019 12:30:06 -0600

On closer inspection, I've found that the results from Wget and Firefox are
very different. Neither is perfect, but the Wget results are definitely
wrong. Here are the results from both:

 

=================================================

wget --append-output=Wget_Google.log --show-progress --no-directories --adjust-extension --directory-prefix=download/Google2 --convert-links --backup-converted --page-requisites --span-hosts http://www.Google.com

 

Contents of folder "download\Google2":

2016 12 07  19:00             5,482      googlelogo_white_background_color_272x92dp.png
2019 01 02  11:10            11,587      index.html
2019 01 02  11:10            11,437      index.html.orig
2016 12 16  06:30            12,263      nav_logo229.png
2018 11 16  04:00             6,913      robots.txt
               5 File(s)         47,682 bytes

 

Wget log: See attached "Wget_Google.log".

Result of save as viewed in Firefox: See attached "Google from Wget.png".

=================================================

Firefox at https://www.google.com/

File > Save Page As > Save as type: Web Page, complete

 

Contents of folder:

2019 01 02  11:15           222,403      Google2.htm
2019 01 02  11:15    <DIR>               Google2_files

Contents of subfolder "Google2_files":

2019 01 02  11:15           140,084      cbgapi.loaded_0
2019 01 02  11:15            13,504      googlelogo_color_272x92dp.png
2019 01 02  11:15            85,565      msb_wizaaabdasyncdvlfootiflipv6lummusfxz7cCd
2019 01 02  11:15           140,913      rsAA2YrTv-X7m9A6GmnfpSsKdPIfvIYg06ZQ
2019 01 02  11:15           403,380      rsACT90oGMg6Rr6Oa277nSkJoiMyEfVXOeOQ
               5 File(s)        783,446 bytes

 

Result of save as viewed in Firefox: See attached "Google from Firefox.png".

=================================================

Actual appearance of the webpage: See attached "Google original.png".

=================================================

 

Observations:

    * The main file saved by Firefox is 218 KB; the one saved by Wget is
      only 12 KB.

    * Firefox saves five additional files, Wget only three, and no
      filename is common to both sets.

    * Firefox gets the page layout right, including headers and footers,
      but for some reason doesn't show the logo. Wget looks like it
      downloaded a different page: the whole layout is different, but it
      got the logo right.
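
The size comparison can be checked against the two listings. This is only a small sketch: the byte counts are copied from the directory listings in this message, and the helper name kib() is my own.

```python
# Byte counts copied from the two directory listings in this message.
wget_files = {
    "googlelogo_white_background_color_272x92dp.png": 5482,
    "index.html": 11587,
    "index.html.orig": 11437,
    "nav_logo229.png": 12263,
    "robots.txt": 6913,
}
firefox_main = 222403  # Google2.htm
firefox_files = {
    "cbgapi.loaded_0": 140084,
    "googlelogo_color_272x92dp.png": 13504,
    "msb_wizaaabdasyncdvlfootiflipv6lummusfxz7cCd": 85565,
    "rsAA2YrTv-X7m9A6GmnfpSsKdPIfvIYg06ZQ": 140913,
    "rsACT90oGMg6Rr6Oa277nSkJoiMyEfVXOeOQ": 403380,
}


def kib(n_bytes):
    """Convert a byte count to KiB (1 KiB = 1024 bytes)."""
    return n_bytes / 1024


wget_main = wget_files["index.html"]
print(f"Firefox main page: {kib(firefox_main):.1f} KiB")
print(f"Wget main page:    {kib(wget_main):.1f} KiB")
print(f"Size ratio:        {firefox_main / wget_main:.1f}x")
print(f"Wget folder total:       {sum(wget_files.values()):,} bytes")
print(f"Firefox subfolder total: {sum(firefox_files.values()):,} bytes")
```

The main page saved by Firefox is roughly 19 times larger than the one saved by Wget, which suggests the two tools received different HTML rather than the same page with different attachments.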

 

What do I need to do for Wget to get the page correctly?
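
One experiment worth trying, offered as a guess rather than a confirmed fix: the two saves differ in more than the tool. Wget requested http://www.Google.com with its default User-Agent, while Firefox loaded https://www.google.com/ as a browser, and a server may return different HTML to each. The sketch below builds the same Wget command with the HTTPS URL and a browser-like User-Agent. The --user-agent option is a real Wget option; the function name build_wget_command and the particular User-Agent string are assumptions chosen for illustration.

```python
# Assumption: a Firefox-on-Windows-7 style User-Agent string, for illustration only.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:64.0) "
    "Gecko/20100101 Firefox/64.0"
)


def build_wget_command(url, dest, user_agent=BROWSER_UA, log="Wget_Google.log"):
    """Return the argument list for a page-requisites download.

    The options are the ones used in this message, plus --user-agent.
    """
    return [
        "wget",
        f"--append-output={log}",
        "--show-progress",
        "--no-directories",
        "--adjust-extension",
        f"--directory-prefix={dest}",
        "--convert-links",
        "--backup-converted",
        "--page-requisites",
        "--span-hosts",
        f"--user-agent={user_agent}",
        url,
    ]


cmd = build_wget_command("https://www.google.com/", "download/Google2")
for arg in cmd:
    print(arg)

# To actually run it (requires wget on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Even with a matching User-Agent, an exact match with the browser save should not be expected. Wget does not execute JavaScript, so anything the page loads from scripts after the initial HTML will not be fetched.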

 

Thank you.

 

=================================================

 

 

 

 

From: address@hidden [mailto:address@hidden]
Sent: Wednesday, January 2, 2019 04:50
To: 'address@hidden'
Subject: How to simulate "Save as webpage, complete"?

 

Hi, not a bug, but a question:

 

The command:

wget --no-directories --adjust-extension --directory-prefix _files --convert-links --page-requisites --span-hosts http://www.Google.com

 

saves the Google homepage as "index.html" along with associated files, all
together in the folder "_files". The result works nicely, but what I want is
for "index.html" to be in one folder and the associated files to be in a
subfolder of that called "_files". This is what a browser does when one asks
it to "save as webpage, complete." How do I simulate that behavior with
Wget?
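
One possible workaround is to rearrange the files after Wget finishes. I am not aware of a single Wget option that produces this layout. The sketch below assumes the download was made with --no-directories and --convert-links, so that "index.html" refers to its requisites by bare filename. The function name split_into_files_subfolder is hypothetical, and the link rewriting is a simple pattern match, not an HTML parser.

```python
import re
import shutil
from pathlib import Path


def split_into_files_subfolder(folder, page="index.html", sub="_files"):
    """Move everything except the page into folder/sub and fix up the links.

    Only quoted attribute values that are exactly a moved filename
    (e.g. src="nav_logo229.png") are rewritten; percent-encoded names or
    references inside CSS or JavaScript are left untouched.
    """
    folder = Path(folder)
    target = folder / sub
    target.mkdir(exist_ok=True)

    # Keep the page itself (and Wget's .orig backup, if any) where they are.
    keep = {page, page + ".orig"}
    moved = []
    for item in sorted(folder.iterdir()):
        if item.is_file() and item.name not in keep:
            shutil.move(str(item), str(target / item.name))
            moved.append(item.name)

    # Work on raw bytes so the page's original character encoding is preserved.
    page_path = folder / page
    data = page_path.read_bytes()
    prefix = (sub + "/").encode("utf-8")
    for name in moved:
        raw = name.encode("utf-8")
        pattern = re.compile(rb'''(=\s*["'])''' + re.escape(raw) + rb'''(["'])''')
        data = pattern.sub(
            lambda m, raw=raw: m.group(1) + prefix + raw + m.group(2), data
        )
    page_path.write_bytes(data)
    return moved


# Example (after running Wget into download/Google2):
#   split_into_files_subfolder("download/Google2")
```

It is worth running this on a copy of the download folder first, since it moves files and overwrites "index.html" in place.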

 

The manual entry for -P / --directory-prefix says "the directory prefix is
the directory where all other files and subdirectories will be saved."
Because of the word "other," I thought this would do what I want, but it
didn't. It put all the files in the same directory, including "index.html".

 

I am using Wget v. 1.20, the Windows binary provided by Jernej Simončič
at www.eternallybored.org/misc/wget/, running in a Command Prompt window
on Windows 7.

 

Thanks for your help.

 

Attachment: Wget_Google.log
Description: Binary data

Attachment: Google from Wget.png
Description: PNG image

Attachment: Google from Firefox.png
Description: PNG image

Attachment: Google original.png
Description: PNG image

