[Bug-wget] --input-file flag causes --domains to be ignored


From: Jason Barry
Subject: [Bug-wget] --input-file flag causes --domains to be ignored
Date: Sat, 12 Oct 2019 12:24:25 -0700

Hello, 

It appears that when the `--input-file` flag is specified, the `--domains` flag 
is ignored, and all hosts are spanned, even though I haven't explicitly set 
`-H`. 

I would expect hosts to be restricted to those listed in `--domains` or, if 
that flag is absent, to the host given in `--base`, but that doesn't seem to 
be the case. 

# $1 is the first positional parameter: the host name to crawl.
wget \
  --base="https://$1" \
  --content-disposition \
  --continue \
  --convert-links \
  --domains="$1" \
  --execute robots=off \
  --force-directories \
  --force-html \
  --input-file="$1.html" \
  --level=1 \
  --max-redirect=2 \
  --output-file="$1.log" \
  --random-wait \
  --recursive \
  --reject mp3 \
  --reject-regex "(.*)\?(.*)" \
  --timestamping \
  --wait=0.1

This command crawls and downloads files from domains other than $1. 
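
For anyone trying to reproduce this, a rough way to see which unexpected hosts were contacted is to scan the log written by `--output-file`. The sketch below assumes the default verbose log format, in which each request is announced by a line of the form `--YYYY-MM-DD HH:MM:SS--  URL`. The helper name `list_foreign_hosts` is hypothetical, and treating subdomains of the given domain as in scope is an assumption of this sketch, not a statement about how wget matches `--domains`.

```shell
#!/bin/sh
# list_foreign_hosts DOMAIN LOGFILE   (hypothetical helper, not part of wget)
# Prints every host requested in LOGFILE that is neither DOMAIN itself
# nor a subdomain of DOMAIN.
list_foreign_hosts() {
  domain=$1
  logfile=$2
  # Keep only request lines, assumed to look like:
  #   --2019-10-12 12:24:25--  https://host/path
  grep -E '^--[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:]{8}-- +' "$logfile" |
    awk '{ print $3 }' |                          # third field is the URL
    sed -E 's#^[a-z]+://([^/:]+).*#\1#' |         # reduce the URL to its host
    sort -u |
    awk -v d="$domain" '
      $0 == d { next }                            # exact match: in scope
      length($0) > length(d) &&
        substr($0, length($0) - length(d)) == ("." d) { next }  # subdomain
      { print }                                   # everything else is foreign
    '
}
```

With the command above, it could be called as `list_foreign_hosts "$1" "$1.log"`; any output lists hosts that were fetched despite `--domains`.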

Using GNU Wget 1.20.1 built on darwin18.2.0. 

Jason

