bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Bug or at least strange behaviour


From: Fahiem Bacchus
Subject: Bug or at least strange behaviour
Date: Thu, 16 Apr 2020 14:07:13 -0400

Hi, I am creating an scientific archive containing problem sets and want to
post wget instructions for downloading the problem sets.

1. wget -r -nd -erobots=off
http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/unweighted
-A 'zip'
    Works, it descends to the subdirectories under unweighted, and
retrieves the zip files in contained in each subdirectory.
2. wget -r -nd -erobots=off
http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/ -A 'zip'
    Does not work it stops after rejecting the index.html file in
master-set.
3. wget -r -nd -erobots=off
http://www.cs.toronto.edu/maxsat-lib/maxsat-instances/master-set/
    Kind of works, it gets all of the files, but does not restrict itself
to the zip files.

Maybe I don't understand the options? But it looks like a bug in the
interaction of the -A flag and descending into
subdirectories?

   thanks
     Fahiem Bacchus

Here is the site
http://www.cs.toronto.edu/maxsat-lib/

With directory structure:
  master-instances
       master-set
            unweighted
                CircuitDebuggingProblems
                        CircuitDebuggingProblems.zip
                .... many other subdirs each containing a zip
            weighted
                many subdirs each containing a zip
       ms-evals
       original

I also tried a -l 10 flag...did not help.

Version info:
============
GNU Wget 1.20.3 built on darwin18.6.0.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie -psl +ssl/openssl

Wgetrc:
    /usr/local/etc/wgetrc (system)
Locale:
    /usr/local/Cellar/wget/1.20.3_1/share/locale
Compile:
    clang -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/usr/local/etc/wgetrc"
    -DLOCALEDIR="/usr/local/Cellar/wget/1.20.3_1/share/locale" -I.
    -I../lib -I../lib -I/usr/local/opt/openssl@1.1/include -DNDEBUG -g
    -O2
Link:
    clang -DNDEBUG -g -O2 -lidn2 -L/usr/local/opt/openssl@1.1/lib -lssl
    -lcrypto -ldl -lz ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
    -liconv -lintl -Wl,-framework -Wl,CoreFoundation -lunistring

Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <address@hidden>.
Please send bug reports and questions to <address@hidden>.
=========
/usr/local/etc/wgetrc
--------------------------
###
### Sample Wget initialization file .wgetrc
###

## You can use this file to change the default behaviour of wget or to
## avoid having to type many many command-line options. This file does
## not contain a comprehensive list of commands -- look at the manual
## to find out what you can put into this file. You can find this here:
##   $ info wget.info 'Startup File'
## Or online here:
##   https://www.gnu.org/software/wget/manual/wget.html#Startup-File
##
## Wget initialization file can reside in /usr/local/etc/wgetrc
## (global, for all users) or $HOME/.wgetrc (for a single user).
##
## To use the settings in this file, you will have to uncomment them,
## as well as change them, in most cases, as the values on the
## commented-out lines are the default values (e.g. "off").
##
## Command are case-, underscore- and minus-insensitive.
## For example ftp_proxy, ftp-proxy and ftpproxy are the same.


##
## Global settings (useful for setting up in /usr/local/etc/wgetrc).
## Think well before you change them, since they may reduce wget's
## functionality, and make it behave contrary to the documentation:
##

# You can set retrieve quota for beginners by specifying a value
# optionally followed by 'K' (kilobytes) or 'M' (megabytes).  The
# default quota is unlimited.
#quota = inf

# You can lower (or raise) the default number of retries when
# downloading a file (default is 20).
#tries = 20

# Lowering the maximum depth of the recursive retrieval is handy to
# prevent newbies from going too "deep" when they unwittingly start
# the recursive retrieval.  The default is 5.
#reclevel = 5

# By default Wget uses "passive FTP" transfer where the client
# initiates the data connection to the server rather than the other
# way around.  That is required on systems behind NAT where the client
# computer cannot be easily reached from the Internet.  However, some
# firewalls software explicitly supports active FTP and in fact has
# problems supporting passive transfer.  If you are in such
# environment, use "passive_ftp = off" to revert to active FTP.
#passive_ftp = off

# The "wait" command below makes Wget wait between every connection.
# If, instead, you want Wget to wait only between retries of failed
# downloads, set waitretry to maximum number of seconds to wait (Wget
# will use "linear backoff", waiting 1 second after the first failure
# on a file, 2 seconds after the second failure, etc. up to this max).
#waitretry = 10


##
## Local settings (for a user to set in his $HOME/.wgetrc).  It is
## *highly* undesirable to put these settings in the global file, since
## they are potentially dangerous to "normal" users.
##
## Even when setting up your own ~/.wgetrc, you should know what you
## are doing before doing so.
##

# Set this to on to use timestamping by default:
#timestamping = off

# It is a good idea to make Wget send your email address in a `From:'
# header with your request (so that server administrators can contact
# you in case of errors).  Wget does *not* send `From:' by default.
#header = From: Your Name <username@site.domain>

# You can set up other headers, like Accept-Language.  Accept-Language
# is *not* sent by default.
#header = Accept-Language: en

# You can set the default proxies for Wget to use for http, https, and ftp.
# They will override the value in the environment.
#https_proxy = http://proxy.yoyodyne.com:18023/
#http_proxy = http://proxy.yoyodyne.com:18023/
#ftp_proxy = http://proxy.yoyodyne.com:18023/

# If you do not want to use proxy at all, set this to off.
#use_proxy = on

# You can customize the retrieval outlook.  Valid options are default,
# binary, mega and micro.
#dot_style = default

# Setting this to off makes Wget not download /robots.txt.  Be sure to
# know *exactly* what /robots.txt is and how it is used before changing
# the default!
#robots = on

# It can be useful to make Wget wait between connections.  Set this to
# the number of seconds you want Wget to wait.
#wait = 0

# You can force creating directory structure, even if a single is being
# retrieved, by setting this to on.
#dirstruct = off

# You can turn on recursive retrieving by default (don't do this if
# you are not sure you know what it means) by setting this to on.
#recursive = off

# To always back up file X as X.orig before converting its links (due
# to -k / --convert-links / convert_links = on having been specified),
# set this variable to on:
#backup_converted = off

# To have Wget follow FTP links from HTML files by default, set this
# to on:
#follow_ftp = off

# To try ipv6 addresses first:
#prefer-family = IPv6

# Set default IRI support state
#iri = off

# Force the default system encoding
#localencoding = UTF-8

# Force the default remote server encoding
#remoteencoding = UTF-8

# Turn on to prevent following non-HTTPS links when in recursive mode
#httpsonly = off

# Tune HTTPS security (auto, SSLv2, SSLv3, TLSv1, PFS)
#secureprotocol = auto
-- 
Fahiem Bacchus
Professor of Computer Science
University of Toronto


reply via email to

[Prev in Thread] Current Thread [Next in Thread]