bug-wget

Re: wget and --reject-regex


From: Tim Rühsen
Subject: Re: wget and --reject-regex
Date: Fri, 25 Dec 2020 18:42:14 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1

Hello Frans,

I tried with wget 1.20.3 and both of these commands work:

#1 Do not download smc/artworks/ directory:
wget -d -4 --mirror -nH -np --retr-symlinks=no --passive-ftp --no-verbose --cut-dirs=1 ftp://mirror.netcologne.de/savannah/smc/ --reject-regex=".*(/artworks/.*)"

#2 Do not download .bz2 and .rpm files
wget -d -4 --mirror -nH -np --retr-symlinks=no --passive-ftp --no-verbose --cut-dirs=1 ftp://mirror.netcologne.de/savannah/smc/ --reject-regex=".*(\.bz2|\.rpm)$"

(--regex-type=posix is default)
(the order of URL and options doesn't matter)
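A quick way to try a reject pattern before starting a long mirror run is to feed it a few sample URLs through `grep -E`. This is a sketch under the assumption that wget's default `posix` regex type behaves like a POSIX extended regular expression searched anywhere in the full URL, which is how both patterns above are written. The `would_reject` helper and the sample URLs are hypothetical, for illustration only.

```shell
#!/bin/sh
# Reject patterns copied from commands #1 and #2 above.
reject_artworks='.*(/artworks/.*)'
reject_archives='.*(\.bz2|\.rpm)$'

# would_reject PATTERN URL
# Prints "reject" if the ERE matches somewhere in URL, otherwise "keep".
# Assumption: grep -E approximates wget's --regex-type=posix matching.
would_reject() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo reject
    else
        echo keep
    fi
}

# Hypothetical sample URLs; columns are: artworks pattern, archives pattern, URL.
base='ftp://mirror.netcologne.de/savannah/smc'
for url in "$base/artworks/logo.png" "$base/Screensaver/README" \
           "$base/pkg/foo-1.0.tar.bz2" "$base/pkg/foo-1.0.rpm"; do
    printf '%-8s %-8s %s\n' \
        "$(would_reject "$reject_artworks" "$url")" \
        "$(would_reject "$reject_archives" "$url")" \
        "$url"
done
```

If a URL you expect to be skipped shows "keep" here, the pattern is worth fixing before blaming wget.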

Regards, Tim

On 23.12.20 13:48, Frans de Boer wrote:
LS,

I found that wget 1.20 and later do support some basic regular expressions. I had good results with --accept-regex, but the reject part is more troublesome. I can't use EREs, since only BREs are supported, with the caveat that the whole URL has to be included in the pattern.

I use wget to mirror some sites, but I do not want certain subdirectories included in the download, for example subdirectories named rpm, debug, temp, etc.

Example:

wget -4 --mirror -nH -np --retr-symlinks=no --passive-ftp --no-verbose --cut-dirs=1 --regex-type posix --reject-regex "ftp\:\/\/mirror\.netcologne\.de\/savannah\/smc\/Screensaver\/" -P ./debugdir/nongnu ftp://mirror.netcologne.de/savannah/smc/

I tried this example with and without some of the backslashes, but none of the variants work. I also tried it with a single file, to no avail. I understand that one can add multiple reject statements, but that is rather cumbersome when I have to specify them all by hand; I would rather use the ERE .*(dir1|dir2|dir3|...|dirx|(..ERE..)). I already have an ERE string ready and would like to use that instead. Breaking this string down again into multiple reject statements might not work either if I can't even reject a single file or subdirectory.

Is there a way to accomplish the above without resorting to loops and sed as the filtering tool?

Regards, Frans.
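The single-ERE approach asked about above can be assembled without loops or sed by joining the directory names with `|` and passing the result to one `--reject-regex`, in the same style as the `(\.bz2|\.rpm)` alternation in Tim's command #2. A minimal sketch, assuming bash (for the array and the `${dirs[*]}` join); the directory list just reuses the examples from the question, and the `mirror_smc` wrapper name is hypothetical.

```shell
#!/bin/bash
# Subdirectories to skip (examples from the question; adjust as needed).
dirs=(rpm debug temp Screensaver)

# Join the names with "|" into one ERE: /(rpm|debug|temp|Screensaver)/
# IFS is changed only inside the command-substitution subshell.
alternation=$(IFS='|'; printf '%s' "${dirs[*]}")
reject="/(${alternation})/"

# Hypothetical wrapper around the command from the question, with the
# hand-escaped URL replaced by the generated pattern.
mirror_smc() {
    wget -4 --mirror -nH -np --retr-symlinks=no --passive-ftp --no-verbose \
         --cut-dirs=1 --regex-type=posix --reject-regex="$reject" \
         -P ./debugdir/nongnu ftp://mirror.netcologne.de/savannah/smc/
}

printf 'reject pattern: %s\n' "$reject"
```

Call `mirror_smc` once the printed pattern looks right. Putting each name between slashes keeps `rpm` from also matching file names such as `foo.rpm`; like Tim's pattern #1, it relies on a slash following the directory name in the URL.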



