
Re: [Bug-wget] How to exclude link?


From: Peng Yu
Subject: Re: [Bug-wget] How to exclude link?
Date: Sun, 11 Jul 2010 14:42:56 -0500

On Sun, Jul 11, 2010 at 8:56 AM, Tom Mizutani <address@hidden> wrote:
> Hi,
>
> For your first argument, wget will download xyz*.html or xyz*.htm
> files, analyze them, find links from them, and delete them, regardless
> of what you specify with the --reject option.  That is the way wget was
> designed, and I observed that wget works as such.
>
> Wget manual online
> <http://www.gnu.org/software/wget/manual/wget.html#Types-of-Files>
> clearly says:
>
>>Note that these two options (--accept and --reject) do not affect the
>>downloading of html files (as determined by a `.htm' or `.html' filename
>>prefix). This behavior may not be desirable for all users, and may be changed
>>for future versions of Wget.
>
> For your second argument, I have no suggestion, since the deletion of
> "xyz_want_to_download" will be the _right_ behavior.

I forgot to mention that some.web.com/xyz_want_to_download doesn't
have a suffix at all, and neither do any of the links that match the
pattern 'xyz*'. I don't think the quoted passage applies to my question.

It seems that wget is not flexible enough to allow complex filtering
rules for deciding what to download and what not to download. At this
point, I might be better off relying on some other tool, such as Perl's
LWP, to customize my download. Or are there any better suggestions?
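
To make the idea of custom filtering rules concrete, here is a minimal
sketch in Python (standard library only) rather than Perl LWP. It assumes
the goal is to skip every link whose last path component matches 'xyz*'
while still fetching xyz_want_to_download; that reading of the rules, the
function names, and the sample page are all hypothetical. It only decides
which links to keep and performs no downloads.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def should_download(url, reject=(), keep=()):
    """Decide by the last path component of the URL (hypothetical rule).

    Names listed in keep are always downloaded; otherwise a name is
    skipped if it matches any shell-style pattern in reject.
    """
    name = urlparse(url).path.rsplit("/", 1)[-1]
    if name in keep:
        return True
    return not any(fnmatchcase(name, pattern) for pattern in reject)


# Made-up page content, used only to exercise the filter.
page = (
    '<a href="xyz_want_to_download">a</a>'
    '<a href="xyz_other">b</a>'
    '<a href="abc">c</a>'
)
links = extract_links(page, "http://some.web.com/index")
wanted = [
    url for url in links
    if should_download(url, reject=("xyz*",), keep=("xyz_want_to_download",))
]
```

Because the decision lives in an ordinary function, the rule can be as
complex as needed (regular expressions, directory depth, and so on), and
the URLs left in wanted could then be fetched with urllib.request or
handed to wget through its --input-file option.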

-- 
Regards,
Peng


