[Bug-wget] Re: Thoughts on regex support


From: Matthew Woehlke
Subject: [Bug-wget] Re: Thoughts on regex support
Date: Fri, 25 Sep 2009 12:43:25 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.23) Gecko/20090825 Fedora/2.0.0.23-1.fc10 Thunderbird/2.0.0.23 Mnenhy/0.7.5.0

Tony Lewis wrote:
Micah Cowan wrote:
Tony Lewis wrote:
Given that the most common use case is to match against suffixes in the
path, perhaps ':path/i:^.*\.' and '$' should be implied so that --traverse
'(html?|php)' is interpreted as ':path/i:^.*\.(html?|php)$'.
Again, I really want consistency with the regex rules.

OK. So how about adding :suffix: to the mix? Then one can say --traverse
':suffix/i:(html?|php)'.

I don't think this will work very well. What is the suffix of 'vacation_plans.odt.bak'?
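To make the ambiguity concrete, here is a small sketch in Python (used purely for illustration; it says nothing about how wget would implement this). The standard library itself offers two different readings of "suffix" for that file name:

```python
import os.path
from pathlib import PurePosixPath

name = "vacation_plans.odt.bak"

# Reading 1: the suffix is only the last dot-separated component.
last_suffix = os.path.splitext(name)[1]

# Reading 2: every dot-separated component after the stem is a suffix.
all_suffixes = PurePosixPath(name).suffixes

print(last_suffix)   # .bak
print(all_suffixes)  # ['.odt', '.bak']
```

A :suffix: component would have to pick one of these readings (or something else again), and whichever it picks will surprise somebody.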

In all the places that I work with regular expressions, anchors are
explicitly specified so *I* would be most surprised by having implicit
anchors.

find(1) :-). But that's the /only/ example of implicit anchoring I can think of (and actually, Micah pointed it out; I don't think I have ever used regex with find).
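For anyone who wants to see the difference, here is a rough Python sketch. GNU find's -regex matches against the whole path, which Python's re.fullmatch approximates; re.search is the usual unanchored behaviour. The sample paths are made up.

```python
import re

short_pattern = r"(html?|php)"
# The expansion suggested earlier in the thread, with explicit anchors.
explicit_pattern = r"^.*\.(html?|php)$"

backup = "/docs/index.html.orig"
page = "/docs/index.html"

# Unanchored search: "html" occurs somewhere in the path, so this matches.
unanchored_hit = re.search(short_pattern, backup) is not None

# Whole-string matching (find-style implicit anchoring): the short
# pattern does not cover the entire path, so this does not match.
implicit_hit = re.fullmatch(short_pattern, page) is not None

# Explicit anchors: only paths that really end in the suffix match.
explicit_page = re.search(explicit_pattern, page) is not None
explicit_backup = re.search(explicit_pattern, backup) is not None

print(unanchored_hit, implicit_hit, explicit_page, explicit_backup)
# True False True False
```

The same short pattern gives opposite answers depending on whether anchoring is implied, so the rules need to say which it is.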

What about the possibility of including multiple components in the same
argument to match?

If we do that, it would be better to just implement full Boolean logic, IMO. Of course, I think PCRE allows toggling case sensitivity for parts of the regex, which would solve this.

Um... if we require PCRE, we might not need flags at all. And we can drop them safely, because the syntax was such that they could still be re-added later.
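As a sketch of what I mean by toggling case sensitivity inside the pattern: Python's re module is not PCRE, but (since Python 3.6) it accepts the same scoped inline flag syntax, (?i:...), so it works for a quick illustration. The paths are invented.

```python
import re

# Only the suffix group is case-insensitive; the directory prefix
# stays case-sensitive because the flag is scoped to (?i:...).
pattern = re.compile(r"^/Docs/.*\.(?i:html?|php)$")

upper_suffix = pattern.match("/Docs/INDEX.HTML") is not None
lower_prefix = pattern.match("/docs/index.html") is not None

print(upper_suffix)  # True
print(lower_prefix)  # False
```

With that available, a separate /i flag on the component adds nothing the pattern cannot already express.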

In your proposal am I allowed to supply two --match parameters that are
OR'ed together?

URLs are accepted iff:
[ANY match evaluates true] AND NOT [ANY no-match evaluates true]
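Spelled out as a minimal Python sketch (the function name and argument names are hypothetical, and unanchored re.search is an assumption, since the anchoring question above is still open):

```python
import re

def accepted(url, match_patterns, no_match_patterns):
    """Return True iff ANY match pattern hits AND NO no-match pattern hits."""
    matched = any(re.search(p, url) for p in match_patterns)
    rejected = any(re.search(p, url) for p in no_match_patterns)
    return matched and not rejected

matches = [r"\.html?$", r"\.php$"]   # two match patterns, OR'ed together
no_matches = [r"/private/"]

print(accepted("http://example.com/a.php", matches, no_matches))          # True
print(accepted("http://example.com/private/a.php", matches, no_matches))  # False
print(accepted("http://example.com/a.png", matches, no_matches))          # False
```

So yes, multiple match patterns are OR'ed. Note that with an empty list of match patterns this sketch rejects everything; the rule as written does not say what should happen in that case.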

--
Matthew
Please do not quote my e-mail address unobfuscated in message bodies.
--
I want to vote for a Conservative Democrat. Too bad they're about as rare as an Honest Politician. Maybe I'll get lucky and someone will come along that's both.




