[Bug-wget] Some possible Inconsistencies in WGET 1.15


From: Halliday, Andrew
Subject: [Bug-wget] Some possible Inconsistencies in WGET 1.15
Date: Wed, 23 Jul 2014 03:10:27 +0000

Hi,

I've recently mapped the command-line switches of Wget 1.15 against the 
commands available in the config file I specify with --config=FILE.  See the 
attached file for details.
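A comparison like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used to build the attached file: the `normalize` and `compare` helpers are hypothetical, the two sample lists are short excerpts, and the matching relies on my understanding of the Wget manual that wgetrc command names are case-, underscore- and dash-insensitive. Negated switches such as --no-warc-compression are not handled.

```python
# Sketch: compare long command-line switches against wgetrc command names.
# The helper names and the sample lists are illustrative assumptions.

def normalize(name: str) -> str:
    """Reduce a switch or wgetrc command name to a comparable key."""
    # Drop leading dashes and any "=VALUE" suffix, then ignore case,
    # dashes and underscores.
    base = name.lstrip("-").split("=")[0]
    return base.replace("-", "").replace("_", "").lower()

def compare(switches, commands):
    """Return (switches without a command, commands without a switch)."""
    switch_keys = {normalize(s): s for s in switches}
    command_keys = {normalize(c): c for c in commands}
    only_switches = sorted(
        switch_keys[k] for k in switch_keys.keys() - command_keys.keys()
    )
    only_commands = sorted(
        command_keys[k] for k in command_keys.keys() - switch_keys.keys()
    )
    return only_switches, only_commands

# Short excerpts only; a real run would use the full lists.
switches = ["--tries=NUMBER", "--unlink", "--https-only"]
commands = ["tries", "robots", "dot_bytes"]

only_switches, only_commands = compare(switches, commands)
print(only_switches)
print(only_commands)
```

A match found this way only means the names line up; it does not confirm that the switch and the command behave identically.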

In doing this, I've noticed that some command-line switches don't have 
matching commands for the config file:
-a,  --append-output=FILE  append messages to FILE.
--report-speed=TYPE   Output bandwidth as TYPE.  TYPE can be bits.
--unlink                  remove file before clobber.
--method=HTTPMethod     use method "HTTPMethod" in the header.
--body-data=STRING      Send STRING as data. --method MUST be set.
--body-file=FILE        Send contents of FILE. --method MUST be set.
--content-on-error      output the received content on server errors.
--https-only             only follow secure HTTPS links
--preserve-permissions  preserve remote file permissions.
--accept-regex=REGEX        regex matching accepted URLs.
--reject-regex=REGEX        regex matching rejected URLs.
--regex-type=TYPE           regex type (posix).
--warc-file=FILENAME      save request/response data to a .warc.gz file.
--warc-header=STRING      insert STRING into the warcinfo record.
--warc-max-size=NUMBER    set maximum size of WARC files to NUMBER.
--warc-cdx                write CDX index files.
--warc-dedup=FILENAME     do not store records listed in this CDX file.
--no-warc-compression     do not compress WARC files with GZIP.
--no-warc-digests         do not calculate SHA1 digests.
--no-warc-keep-log        do not store the log file in a WARC record.
--warc-tempdir=DIRECTORY  location for temporary files created by the WARC writer.


I've also noticed that some config-file commands are not available as 
command-line switches:
#dot_bytes = n
#dot_spacing = n
#dots_in_line = n
#netrc = on/off
#robots = on/off
#show_all_dns_entries = on/off

I thought it might help to report these apparent inconsistencies.

In particular, it would be nice to be able to:

*  Ignore robots via a command-line switch

*  Apply regexes to URLs from the config file
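
For the first item, a possible interim route is Wget's -e (--execute) switch, which is documented as executing a command as if it were part of .wgetrc. The sketch below only builds and prints such a command line; it does not run Wget. The `wget_argv` helper and the example URL are hypothetical, and I have not verified the behaviour against 1.15 specifically.

```python
# Sketch: build a wget command line that passes wgetrc commands via -e.
# wget_argv and the example URL are illustrative assumptions.
import shlex

def wget_argv(url, wgetrc_commands):
    """Return an argv list with one '-e name=value' pair per wgetrc command."""
    argv = ["wget"]
    for name, value in wgetrc_commands.items():
        argv += ["-e", f"{name}={value}"]
    argv.append(url)
    return argv

argv = wget_argv("https://example.com/", {"robots": "off"})
# Quote each argument so the printed line can be pasted into a shell.
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)  # wget -e robots=off https://example.com/
```

The same pattern would apply to the other commands listed above that have no switch, such as dot_bytes or netrc, assuming -e accepts them.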

Hope this helps!
Andrew


Attachment: WGetSampleSettings
Description: WGetSampleSettings

