[Bug-wget] mirroring from webpage: looking for --delete

From: Mojca Miklavec
Subject: [Bug-wget] mirroring from webpage: looking for --delete
Date: Thu, 16 Dec 2010 14:01:50 +0100

Dear list,

I'm a frequent user of rsync for mirroring content. The program
provides a switch "--delete" which removes all the files that are not
present on the server (or source location) any more.

But the webpage I want to sync from only provides access to its data
via HTTP (not even FTP). It is not a proper website, just a directory
with an index.html (and possibly some subfolders that also contain an
index.html). I started using
    wget -np --mirror --progress=bar -nH --cut-dirs=1 -erobots=off \
        --reject="index.html*" $SERVER
however there is one problem ... whenever the server removes some
files, they will be left in my folder. My main question is: how can I
remove those files automatically (apart from rewriting wget in perl or
ruby to suit my needs)?

One option could be to simply remove everything from my local
directory and fetch all the files over and over again, but I would
like to avoid unnecessary traffic, esp. because the files originate
from Japan and the bandwidth is very low.

Another option would be to loop over all the files on my local drive
after wget finishes the job and check if they still exist on the
remote server, and delete them if they don't. But that is all extra
work that wget could probably handle easily, if only I could find the
right option to make it delete those files for me.
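
To make that second option concrete, here is a rough sketch of what I
have in mind. It is only an illustration under some assumptions: the
function names prune_stale and remote_is_gone are made up, the local
directory root is assumed to correspond exactly to the base URL, and
file names are assumed not to need URL escaping. It relies on wget's
documented exit status 8 ("server issued an error response"), which
also covers errors other than 404, so it is a heuristic at best.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the server answers the probe
# with an error response (wget exit status 8). Network failures yield
# other exit codes and are treated as "file may still be there".
remote_is_gone() {
    wget -q --spider "$1"
    [ "$?" -eq 8 ]
}

# prune_stale BASE_URL LOCAL_DIR [GONE_CMD]
# Removes every regular file under LOCAL_DIR whose URL is reported gone.
# GONE_CMD defaults to remote_is_gone; it can be swapped for a dry run.
prune_stale() {
    base_url=${1%/}     # drop a trailing slash, if any
    local_dir=${2%/}
    gone_cmd=${3:-remote_is_gone}
    find "$local_dir" -type f | while IFS= read -r file; do
        rel=${file#"$local_dir"/}   # path relative to the mirror root
        if "$gone_cmd" "$base_url/$rel"; then
            echo "removing $file"
            rm -- "$file"
        fi
    done
}
```

After the wget run I would then call something like
prune_stale "$SERVER" . from inside the mirror directory. That still
costs one request per local file, which is exactly the extra work I
was hoping to avoid.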

This list might not be the most appropriate place to ask this
question, but I didn't know where else I could ask.

Thank you very much,
