Re: [Bug-wget] feature request : wget --delete-missing

From: Keisial
Subject: Re: [Bug-wget] feature request : wget --delete-missing
Date: Sat, 06 Mar 2010 00:22:12 +0100

Micah Cowan wrote:
> The page you linked to said (in Japanese) that it's foolish to have to
> rm -r the directory before each mirror attempt with wget. I don't really
> agree: since wget's going to have to download each and every file
> _anyway_, just in order to ensure that it finds all the available links,
> it hardly seems useful to leave the previous files around, just so they
> get overwritten.

I have to disagree with you, Micah.
Consider the case where you are mirroring a site exposed via a
web-server-generated directory index.
wget won't download all files, only all HTML pages (which is good, I'm just
clarifying your answer).
The user knows (or believes) that all existing files are reachable there,
so deleting the ones that are gone could make sense.
Moreover, the "download all pages to follow their links" approach fails here,
since wget will download the same index many times, once per sort order, even
when the user knows those copies are redundant and has --reject-ed them.
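
To make the redundant-index case concrete, here is a minimal sketch in Python (not wget code) of how the re-sorted views of a listing could be recognised and skipped. It assumes Apache mod_autoindex style sort links such as "?C=N;O=D"; other servers use different query strings, and the function names are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: Apache mod_autoindex style sort links, "?C=<column>;O=<order>",
# with columns N/M/S/D (name, modified, size, description) and order A/D.
SORT_QUERY = re.compile(r"^C=[NMSD][;&]O=[AD]$")

def is_sorted_index_variant(url):
    """Return True if the URL looks like a re-sorted view of a directory index."""
    return bool(SORT_QUERY.match(urlsplit(url).query))

def unique_listing_urls(urls):
    """Drop re-sorted index variants, keeping the order of the remaining URLs."""
    return [u for u in urls if not is_sorted_index_variant(u)]
```

Every URL dropped this way points at a page whose links are already present in the unsorted listing, so skipping it loses nothing.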

> If wget were made to parse the local files when it
> realizes it doesn't have to re-download them, then that would help a
> lot, but it doesn't currently, and trying to make it do so has some
> potential problems (though it still might be worth it).

That could be a nice addition; I'm not sure what problems you are thinking of.
Pages with odd timestamps?

> Such a feature might be better for FTP, which does impart sufficient
> knowledge to Wget, but the patch you linked doesn't provide that.

That patch only deletes files which wget tries to download (those that are
reachable or on a URL list). It can be useful in some cases, but the
functionality is too specific. I'd go for find-like options,
--exec <command> ; / --exec-if-status <n> <command> ; so that the user could
delete the missing files, but also move them to an archive, AV-check files on
download, etc.
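
Until something like that exists, the same effect can be approximated outside wget. The following is only a sketch in Python under stated assumptions: the caller already has the list of paths fetched in the current run (relative to the mirror root), and the helper names find_stale_files and run_on_each are hypothetical. The --exec options above are a proposal, not existing wget options.

```python
import os
import subprocess

def find_stale_files(mirror_root, fetched_paths):
    """Return local files under mirror_root that were not fetched in this run.

    fetched_paths holds paths relative to mirror_root (an assumption of this sketch).
    """
    fetched = {os.path.normpath(os.path.join(mirror_root, p)) for p in fetched_paths}
    stale = []
    for dirpath, _dirnames, filenames in os.walk(mirror_root):
        for name in filenames:
            path = os.path.normpath(os.path.join(dirpath, name))
            if path not in fetched:
                stale.append(path)
    return sorted(stale)

def run_on_each(paths, command=None):
    """Run command (an argv list) once per path, find -exec style.

    With command=None the file is simply deleted.
    """
    for path in paths:
        if command is None:
            os.remove(path)
        else:
            # The path is appended as the last argument, e.g. ["mv", "-t", "archive/"].
            subprocess.run(command + [path], check=True)
```

Passing a command instead of deleting covers the other uses mentioned above, such as moving stale files to an archive directory.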

> Anyway, Wget currently lacks a maintainer (as of January), so I'm afraid
> no one's going to add new features until that changes. I'm the former
> maintainer, and am occasionally willing to apply easy bugfix patches,
> but nothing beyond that.

I think it will take some time to get a new maintainer.

