bug-wget

Re: [Bug-wget] Regular expression matching


From: Micah Cowan
Subject: Re: [Bug-wget] Regular expression matching
Date: Thu, 05 Apr 2012 11:18:02 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120302 Thunderbird/11.0

On 04/04/2012 12:02 PM, Ángel González wrote:
> On 04/04/12 20:16, Gijs van Tulder wrote:
>> 1. You can match complete urls, instead of just the directory prefix
>> or the file name suffix (which you can do with --accept and
>> --include-directories).
>> 2. You can use regular expressions to do the matching, which is
>> sometimes easier than using a list of wildcard patterns.
>>
>> Now this isn't a new idea (there are long discussions in the archive,
>> see [1]). But somehow the previous attempts didn't make it, so I
>> thought I'd send my own version. It's a small patch, I've been using
>> it for a while and found it really useful.
>>
>> I've made two versions of the patch: one uses PCRE, the other uses the
>> gnulib regex library, which is probably easier to integrate.
>>
>> Regards,
>>
>> Gijs
> I really like PCRE, but I think the default should be POSIX regex (what
> you called "gnulib regex library"), just as in every other command-line
> tool, such as sed or grep. There could be a --perl-regexp switch to
> change it (which could take advantage of the POSIX interface of PCRE).

sed and grep's default regexes (BREs) are next-to-useless, and are only
used by default for historical compatibility. They manage to be useful
much of the time for grep, but sed is greatly hampered - it's somewhat
improved by GNU extensions to the POSIX BRE syntax, which become hard to
live without when you try to use sed on a system that lacks them. EREs
(what you get with "grep -E" or "egrep") are a big step up, and are
still POSIX, so that's what I'd recommend as a default.

OTOH, PCREs are completely compatible with well-formed EREs, in which
case there seems little harm in letting those be default. But it would
be nice to fall back onto POSIX regexes when PCRE is not found.
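
To make the fallback concrete, here is a rough sketch of the selection
logic. It is Python purely for illustration (wget itself is C), and the
backend names and the function are hypothetical, not anything in the
patch:

```python
# Hypothetical backend names, most preferred first.
PREFERENCE = ("pcre", "posix")


def choose_regex_backend(available, requested=None):
    """Pick a regex backend from the set compiled into this build.

    `available` is the set of backend names the build supports.
    `requested` is an explicit user choice, or None to use the default.
    """
    if requested is not None:
        # An explicit request is never silently replaced by another type.
        if requested not in available:
            raise ValueError(f"regex type {requested!r} is not supported")
        return requested
    # No explicit request: prefer PCRE, fall back to POSIX.
    for name in PREFERENCE:
        if name in available:
            return name
    raise ValueError("no regex support compiled in")
```

The one design choice worth noting: only the default falls back. If the
user asks for a type by name and it is missing, that is an error.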

There should be information added to the --version output, to declare
the presence of regex support, and what types are supported.
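
Something along these lines would do. Again this is only a Python
sketch; the function name and the output format are made up here and
are not wget's actual --version layout:

```python
def regex_version_line(supported):
    """Summarize compiled-in regex types on one line.

    `supported` maps a type name (e.g. "posix") to True or False.
    """
    names = sorted(name for name, ok in supported.items() if ok)
    if not names:
        return "Regex support: none"
    return "Regex support: " + ", ".join(names)
```

A build with only POSIX regexes would then report
"Regex support: posix".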

Also, I think regex type selection should be with something like
--regex-type=pcre, rather than something like --perl-regexp, to allow
for easy expansion if need be.
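
As an illustration of why a valued option extends more easily, here is
a minimal Python sketch using argparse. The program name, the list of
types, and the default are assumptions for the example; only the
--regex-type spelling comes from the suggestion above:

```python
import argparse

# Hypothetical list of types; supporting a new one means adding a name
# here rather than inventing another command-line switch.
REGEX_TYPES = ("posix", "pcre")


def build_parser():
    parser = argparse.ArgumentParser(prog="regex-type-sketch")
    parser.add_argument(
        "--regex-type",
        choices=REGEX_TYPES,
        default="posix",  # assumed default, for the example only
        help="regular expression syntax to use",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.regex_type)
```

With one switch per syntax (--perl-regexp and so on), every new type
costs a new option and a rule for what happens when two are given. With
--regex-type=NAME, unknown names are rejected in one place.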

-mjc


