From: Andrew Cady
Subject: Re: [Bug-wget] [PATCH] New option: --rename-output: modify output filename with perl
Date: Wed, 31 Jul 2013 18:32:18 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

On Wed, Jul 31, 2013 at 02:26:52PM +0200, Giuseppe Scrivano wrote:
> Hi Andrew,
> Andrew Cady <address@hidden> writes:
> >   * It includes "/usr/include/unistd.h" instead of <unistd.h>.
> >   Otherwise I get compiler warnings about implicitly defined functions.
> >   I guess the -I flags need to be changed when make calls gcc
> >   perlfilter.c?  Please advise.
> does it happen if you use "wget.h" as first include in the .c file?

Nope!  That fixed it.  Thanks.

> >   * It should be possible to override the program used to filter the
> >   names (arguably, the default should be sed rather than perl, although
> >   I don't think so).
> this seems to be the most important part to define now, it should be
> generic enough to not depend on Perl.
> What do you think about the "git filter-branch" syntax?  The difference
> is that we will use the process stdout to retrieve the destination
> filename.

By that, I assume you mean to execute the option in the shell.  So the
existing usage:


would (almost) become:

  --rename-output='perl -lpe "BEGIN{\$|++}" -e s/x/y/'

Notice the shell-escaping of $ and the fact that single quotes are
unavailable to use within the script for the second level of escaping.
It would be the same with awk, or anything but pure shell script.  I
find this kind of double-escaping extremely unpleasant.
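
To make the escaping problem concrete, here is a minimal sketch. The helper name run_filter is hypothetical and only stands in for wget handing the option value to "sh -c"; awk is used purely as an example of a non-shell filter. Because the outer single quotes are already taken by the invoking shell, the inner script has to live in double quotes, so both the inner quotes and the $ need backslashes.

```shell
# Hypothetical stand-in for wget passing the option value to "sh -c".
# $1 is the option value, $2 is one file name fed to the filter on stdin.
run_filter() {
  printf '%s\n' "$2" | sh -c "$1"
}

# The awk program must sit inside double quotes, so \" and \$ are
# needed to survive the second round of shell parsing.
run_filter 'awk "{ sub(/x/, \"y\"); print \$0 }"' 'xfile.txt'
```

Dropping either backslash changes what awk receives, which is the kind of mistake this style of option invites.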

So the next alternative to consider is to specify the program to launch
like xargs does: separate arguments on the command line correspond to
separate arguments to the final exec:

Then the current perl filter could (almost) be specified like this:

  --rename-output perl -lpe 'BEGIN{$|++}' -e s/x/y/ --

Note the '--' ending the option.
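
A rough sketch of the xargs-style behaviour, assuming the words up to the first '--' become the filter's argument vector and are run without an intermediate shell. The function name rename_argv and the way the file name is passed in are made up for illustration; they are not part of the patch.

```shell
# Hypothetical helper: $1 is a file name, the remaining words up to
# the first "--" are the filter command and its arguments.
# (POSIX sh has no local variables, so input/total/arg are global.)
rename_argv() {
  input=$1
  shift
  total=$#
  for arg in "$@"; do
    [ "$arg" = "--" ] && break
    set -- "$@" "$arg"        # append a copy of each word before "--"
  done
  shift "$total"              # drop the originals, keep only the copies
  printf '%s\n' "$input" | "$@"
}

# Only one level of quoting is needed here: no \" and no \$.
rename_argv 'xfile.txt' awk '{ sub(/x/, "y"); print $0 }' -- http://example.com/
```

The trailing URL is ignored by the helper; it is only there to show that words after '--' do not reach the filter.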

I say almost, though, because the existing code actually uses
null-termination instead of newline-termination.  (You can safely output
filenames with newlines using it!)
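
A small illustration of why the terminator matters, using two made-up file names, the first of which contains an embedded newline. It is only a sketch of the framing issue, not of the patch's code.

```shell
# Two names, each terminated by NUL; the first contains a newline.
names() {
  printf 'part one\npart two.txt\0other.txt\0'
}

# Counting NUL terminators recovers the real number of names.
nul_records=$(( $(names | tr -cd '\0' | wc -c) ))

# Treating newline as the terminator splits the first name in two.
nl_records=$(( $(names | tr '\0' '\n' | wc -l) ))

printf 'NUL-terminated records: %d\n' "$nul_records"
printf 'newline-terminated records: %d\n' "$nl_records"
```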

Both of these alternatives share two disadvantages: the user has to
specify the perl options that make it behave like sed, and
null-termination is lost.  I would much rather embed this
information into wget so that users do not have to understand that much
perl to use it.  The same thing applies to other programs that might be
used, too: wget should know some things about them so that users don't
have to.

Thus I think Tim's suggestion is the best one so far: there can be a
list of supported transformers, one of which can be perl.  One of them
could be shell, which could allow arbitrary other programs.

Then the current perl filter would be:


But this would also work:


And this:

  --rename-output=sh:'sed s/x/y/'


  --rename-output=sh:'while read name; do printf "%s\n" ${name/x/y}; done'

And I guess something like this would make sense for completeness:

  --rename-output=sh0:'while read -rd $'"'\0'"' name; do printf "%s\0" ${name/x/y}; done'

That would pave the way for another option, useful on platforms without
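
The prefix scheme could be dispatched roughly as follows. This is a sketch under assumptions: the function name apply_transformer is hypothetical, the "sed:" and "perl:" prefixes are guesses at what the transformer names would be (only "sh:" and "sh0:" appear above), and only the newline-terminated case is handled.

```shell
# Hypothetical dispatcher for a "TRANSFORMER:CODE" option value.
# File names are read from stdin, one per line.
apply_transformer() {
  spec=$1
  kind=${spec%%:*}    # text before the first colon
  code=${spec#*:}     # text after the first colon
  case $kind in
    sed)  sed -e "$code" ;;
    # The tool supplies the sed-like perl flags so the user need not.
    perl) perl -lpe 'BEGIN{$|++}' -e "$code" ;;
    sh)   sh -c "$code" ;;
    *)    printf 'unknown transformer: %s\n' "$kind" >&2
          return 1 ;;
  esac
}

printf '%s\n' xfile.txt | apply_transformer 'sed:s/x/y/'
printf '%s\n' xfile.txt | apply_transformer 'sh:sed s/x/y/'
```

Splitting on the first colon only means the code itself may contain colons.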


On the other hand, maybe separate options should be used, instead of
parsing out the colon:

  --name-filter=sed   --rename='sed code here'
  --name-filter=perl  --rename='perl code here'
  --name-filter=shell --rename='shell code here'
  --name-filter=pcre  --rename='regex here'

Then a third option:

  --name-filter-delimiter='\0'  (or 'null')
  --name-filter-delimiter='\n'  (or 'newline')

Same idea really; more verbose, but more consistent.  Probably better
option names could be devised.
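
For comparison, parsing the separate-option form might look something like this. The function name parse_rename_options and the defaults (pcre filter, newline delimiter) are assumptions made for the sketch; the option names are the provisional ones listed above.

```shell
# Hypothetical parser for the three separate options.
# Results are left in the global variables filter, code and delimiter.
parse_rename_options() {
  filter=pcre          # assumed default
  delimiter=newline    # assumed default
  code=
  for opt in "$@"; do
    case $opt in
      --name-filter=*) filter=${opt#*=} ;;
      --rename=*)      code=${opt#*=} ;;
      --name-filter-delimiter=*)
        case ${opt#*=} in
          '\0'|null)    delimiter=null ;;
          '\n'|newline) delimiter=newline ;;
          *) printf 'unknown delimiter: %s\n' "${opt#*=}" >&2
             return 1 ;;
        esac ;;
    esac
  done
}

parse_rename_options --name-filter=sed --rename='s/x/y/' \
  --name-filter-delimiter='\0'
printf '%s %s %s\n' "$filter" "$code" "$delimiter"
```

Here no colon has to be parsed out of the value, at the cost of three options instead of one.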

Either way, I could implement the first three filters, then pcre could
be done later.  And probably pcre should actually be default, since it
will work on every platform.  What do you think?
