[Bug-wget] Re: Thoughts on regex support


From: Matthew Woehlke
Subject: [Bug-wget] Re: Thoughts on regex support
Date: Wed, 23 Sep 2009 18:07:45 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.23) Gecko/20090825 Fedora/2.0.0.23-1.fc10 Thunderbird/2.0.0.23 Mnenhy/0.7.5.0

Matthew Woehlke wrote:
Actually, it might even make sense to implement \b as only matching start/end and '[&?/.]'. That way matching path components (well, unless the paths contain '.') is also "safe".

For those watching at home, we decided on IRC that this is probably a bad idea. We would probably get a regex engine that already has \b (maybe ERE from grep), and it would be confusing to have \b mean something different in the context of wget.

['d..q' versus 'd+p+q']

Micah voted for 'd-q'. I'm okay with this (still slightly partial to 'd..q' ;-), but 'd-q' solves the complaints I have with '+').

How, then, did you plan for 'fields' to be matched?
[also, do we allow 'd,q'?]

I voted for parsing fields as a match against a list of strings (as opposed to modifying the regex, probably by 's/.*/(.*\&)?&(\&.*)?/'), as the former seems safer, but this is an implementation detail. As such, the former would allow 'd,q', but this seems sufficiently esoteric that we don't feel a need to implement it unless someone has a plausible use-case and convinces Micah that an equivalent regex is too hard to write.
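
To make the two approaches concrete, here is a minimal sketch in Python (not wget code; wget is C, and Python's re is only a stand-in for whatever engine gets picked). It assumes "fields" means the '&'-separated parts of the query string and that patterns are implicitly anchored. The function names are hypothetical, and the two problem cases are my own illustration of why the list approach might be "safer", not something settled on IRC.

```python
import re

def match_fields_by_list(pattern, query):
    """Split the query on '&' and match the pattern against each field."""
    regex = re.compile(pattern)
    return any(regex.fullmatch(field) for field in query.split("&"))

def match_fields_by_rewrite(pattern, query):
    """Wrap the pattern as the sed expression would, then match once."""
    wrapped = "(.*&)?" + pattern + "(&.*)?"
    return re.fullmatch(wrapped, query) is not None

# Both agree on a simple pattern.
print(match_fields_by_list("id=2", "x=0&id=2"))
print(match_fields_by_rewrite("id=2", "x=0&id=2"))

# Top-level alternation is split apart by the textual rewrite.
print(match_fields_by_list("id=1|id=2", "x=0&id=2"))
print(match_fields_by_rewrite("id=1|id=2", "x=0&id=2"))

# A wildcard can leak across field boundaries in the rewritten regex.
print(match_fields_by_list("a=.*z", "a=1&b=z"))
print(match_fields_by_rewrite("a=.*z", "a=1&b=z"))
```

The rewrite could be hardened by wrapping the user's pattern in a group, but it would still let '.*' run across '&', which the list approach rules out by construction.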

[implicit anchors?]

Left to Micah's discretion as far as I am concerned; the leaning is toward 'yes'. Other opinions?

Actually, this is interesting w.r.t. the first point... I don't think I would consider '--match foo' and '--no-match (?!foo)' the same. Rather, one is an accept rule (which happens to accept anything that doesn't match 'foo'), and one is a reject rule. This is actually useful since it lets you accept anything that «matches [list] AND matches [expr]».

Micah says:

<quote>
"--match foo" accepts everything that has foo in it, and isn't in the reject lists, but if it doesn't have foo, it can still be accepted by some other --match rule. --no-match '(?!foo)' instantly rejects anything that doesn't contain foo, and can't be overruled.
</quote>

(Which is to say we agree on this point.)
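
The accept/reject semantics Micah describes can be sketched as follows. This is a Python illustration only, not wget code. The function name is hypothetical, and the behaviour when no --match rule is given (accept anything not rejected) is my assumption. The thread's '(?!foo)' is shorthand; with an unanchored search in Python, "does not contain foo" has to be spelled '^(?!.*foo)'.

```python
import re

def is_accepted(url, match_rules, no_match_rules):
    """Apply reject rules first, then accept rules."""
    # A reject rule is final: no accept rule can overrule it.
    if any(re.search(p, url) for p in no_match_rules):
        return False
    # Assumption: with no accept rules, anything not rejected passes.
    if not match_rules:
        return True
    # Otherwise any single accept rule is enough.
    return any(re.search(p, url) for p in match_rules)

# 'bar.html' lacks foo but is still accepted by another --match rule.
print(is_accepted("http://h/bar.html", ["foo", "bar"], []))

# With a reject rule for "does not contain foo", it cannot be rescued.
print(is_accepted("http://h/bar.html", ["foo", "bar"], [r"^(?!.*foo)"]))
```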

/Probably/ we will have a flag to invert a match (equivalent of '(?!expr)') but Micah is "not totally commited" and "[m]ight add it after the first iteration". But this also depends partly on whether optional PCRE support is available.

As for '(?!)', this is not valid in ERE, so it would require libpcre. We could maybe use PCRE if available and only have ERE 'built in' (which might need a flag to specify that an expression is PCRE). Consensus on this was not reached, so this is still an open question.
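
For what it's worth, an invert flag would not need lookahead support from the engine, since it only has to negate the match result. A minimal sketch, again in Python as a stand-in (the function name and the 'invert' parameter are hypothetical, not proposed wget options):

```python
import re

def rule_matches(pattern, url, invert=False):
    """Report whether the rule fires, optionally inverting the result."""
    found = re.search(pattern, url) is not None
    # Inversion happens outside the regex, so plain ERE-style patterns suffice.
    return found != invert

urls = ["http://h/foo.html", "http://h/bar.html"]
for url in urls:
    inverted = rule_matches("foo", url, invert=True)
    # Same answer as the lookahead spelling, which needs a PCRE-like engine.
    lookahead = rule_matches(r"^(?!.*foo)", url)
    print(url, inverted, lookahead)
```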

--
Matthew
Please do not quote my e-mail address unobfuscated in message bodies.
--
I picked up a Magic 8-Ball the other day and it said 'Outlook not so good.' I said 'Sure, but Microsoft still ships it.'
  -- Anonymous (from cluefire.net)




