Re: [Bug-wget] feature request : wget --delete-missing

From: aneeskA
Subject: Re: [Bug-wget] feature request : wget --delete-missing
Date: Sat, 6 Mar 2010 01:23:59 -0500


My original purpose was to keep a local copy of all the remote server files
using ftp mirroring.

While googling for a patch, I came across this discussion,
http://www.mail-archive.com/address@hidden/msg06759.html, where such an
option was already being requested back in 2004.
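
To make the request concrete, here is a rough sketch in Python of the
behaviour I have in mind. It is not wget code and it is not the linked
patch. The names stale_files and delete_missing are made up for
illustration, and it assumes the list of remote paths has already been
obtained somehow (for example from the FTP directory listings).

```python
from pathlib import Path


def stale_files(local_root, remote_paths):
    """Return local files (as relative POSIX paths) absent from the remote list.

    remote_paths is assumed to hold paths relative to the mirror root;
    how that list is obtained is outside the scope of this sketch.
    """
    root = Path(local_root)
    remote = {p.strip("/") for p in remote_paths}
    local = {
        f.relative_to(root).as_posix()
        for f in root.rglob("*")
        if f.is_file()
    }
    return sorted(local - remote)


def delete_missing(local_root, remote_paths, dry_run=True):
    """Remove stale local files; with dry_run=True only report them."""
    stale = stale_files(local_root, remote_paths)
    if not dry_run:
        for rel in stale:
            (Path(local_root) / rel).unlink()
    return stale
```

Calling delete_missing(mirror_dir, remote_paths) first with the default
dry_run=True shows what would be removed before anything is deleted.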

-- aneeskA

On Fri, Mar 5, 2010 at 6:22 PM, Keisial <address@hidden> wrote:

> Micah Cowan wrote:
>> The page you linked to said (in Japanese) that it's foolish to have to
>> rm -r the directory before each mirror attempt with wget. I don't really
>> agree: since wget's going to have to download each and every file
>> _anyway_, just in order to ensure that it finds all the available links,
>> it hardly seems useful to leave the previous files around, just so they
>> get overwritten.
> I have to disagree with you, Micah.
> Consider the case where you are mirroring a site exposed via a web
> server generated directory index.
> wget won't download all files, only all html pages (which is good, just
> clarifying your answer).
> The user knows (believes) that all existing files are reachable there,
> so the deleting could make sense.
> Moreover, the "download all pages to follow its links" fails here, since
> wget will download the same index many times, sorted by every field, even
> when the user knows they are redundant and --reject-ed them.
>> If wget were made to parse the local files when it
>> realizes it doesn't have to re-download them, then that would help a
>> lot, but it doesn't currently, and trying to make it do so has some
>> potential problems (though it still might be worth it).
> Could be a nice addition; not sure what problems you are thinking of.
> Pages with odd timestamps?
>> Such a feature might be better for FTP, which does impart sufficient
>> knowledge to Wget, but the patch you linked doesn't provide that.
> That patch only deletes files which wget tries to download (reachable/on
> a URL list). It can be useful in some cases, but the functionality is too
> specific. I'd go for find-like options --exec <command> ; /
> --exec-if-status <n> <command> ; so that the user could delete the missing
> files, but also move them to an archive, AV-check files on download, etc.
>> Anyway, Wget currently lacks a maintainer (as of January), so I'm afraid
>> no one's going to add new features until that changes. I'm the former
>> maintainer, and am occasionally willing to apply easy bugfix patches,
>> but nothing beyond that.
> I think it will take some time to get a new maintainer.

