


From: Andrew Cady
Subject: Re: [Bug-wget] [PATCH] New option: --rename-output: modify output filename with perl
Date: Fri, 2 Aug 2013 03:13:27 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

On Thu, Aug 01, 2013 at 09:57:40PM +0200, Dagobert Michelsen wrote:
> Hi Andrew,

Hi Dagobert.

I have to say, I find your comments perplexing.  But first let me state
for the record that PCRE is already a dependency of wget.  I didn't add
it.

> Am 01.08.2013 um 21:22 schrieb Andrew Cady <address@hidden>:
>
> > With sed, you still need -u, or else there is a deadlock.  This
> > knowledge should be embedded into wget because most people don't
> > have it.
>
> You are talking about GNU sed, please keep in mind that wget is
> portable to systems with no GNU userland or just a subset of it.

Yes, I know.  But those other sed implementations will probably not
work.  They will just deadlock.

However, you can always compile GNU sed and adjust PATH to ensure that
the GNU sed you compiled will be called, even without root access.
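
To make the deadlock concrete, here is a minimal sketch in Python (not
the patch itself).  It assumes the caller writes one file name to the
filter and then waits for exactly one line in reply.  The child script
FILTER_SOURCE and the helpers start_filter and rename are hypothetical
names used only for illustration; the child stands in for a sed run with
-u.  If the child did not flush after each line, its reply would sit in
its output buffer and the readline() call in the parent would block
forever.

```python
import subprocess
import sys

# Hypothetical stand-in for an unbuffered sed: rewrites one name per line
# and flushes after every line so the parent can read the reply at once.
FILTER_SOURCE = r"""
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    name = line.rstrip("\n")
    if name.endswith(".php"):
        name = name[:-4] + ".html"
    sys.stdout.write(name + "\n")
    sys.stdout.flush()  # remove this and the parent below deadlocks
"""


def start_filter(argv):
    """Start a long-lived filter process connected by two pipes."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )


def rename(proc, name):
    """Send one name to the filter and wait for exactly one line back."""
    proc.stdin.write(name + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


proc = start_filter([sys.executable, "-c", FILTER_SOURCE])
print(rename(proc, "index.php"))  # index.html
print(rename(proc, "logo.png"))   # logo.png
proc.stdin.close()
proc.wait()
```

The parent cannot fix this from its side: the buffering happens inside
the child, which is why the right switch has to be passed to the filter.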

> From a packager's perspective this is a nightmare because this feature
> introduces weak dependencies on programs which need to be in PATH.  If
> PATH includes only tools shipped by the system it may be necessary to
> explicitly set the path for each of the tools to have the switches you
> use (for example /usr/bin/sed on Solaris does not have -u) or disable
> them during configure time when the tools are not available. The exact
> locations need to be adjustable during configure for each of the tools
> you use.

You have actually made a generic argument against the entire idea of
calling external programs.  I don't find your argument convincing.  Lots
of programs make external calls to programs like 'ssh' or 'less' (e.g.:
git, rsync).  GNU Emacs comes prepared to call dozens, if not hundreds,
of external programs, at the user's prompting.  So does Vim.  GNU Bash
will also call many external programs in order to implement command
completion.  No "nightmare" results.  "Weak dependencies" are just not
a problem.  The calling program only has to handle the absence of the
called program gracefully.

It seems you are thinking that --name-filter=sed should be disabled
if there is no sed on the system, but there is no need for that.  You
would save about 20 bytes of memory, but then you would have to
recompile wget after installing sed.  There's no point.  It doesn't
benefit the user.  It's not like a shared library that needs to be
linked.  If sed is missing, that's OK.  You just can't use it.
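
As a rough illustration of "handle the absence gracefully", here is a
short Python sketch.  The function name start_name_filter and the
program name used in the demo are hypothetical; the only real behaviour
relied on is that subprocess.Popen raises FileNotFoundError when the
program cannot be found.  The check happens at run time, so installing
the program later needs no rebuild of the caller.

```python
import subprocess


def start_name_filter(argv):
    """Return a running filter process, or None if the program is missing."""
    try:
        return subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )
    except FileNotFoundError:
        # The program is not in PATH: report that and carry on without it.
        return None


# Hypothetical program name, chosen so that it does not exist.
proc = start_name_filter(["no-such-name-filter-hypothetical", "-u"])
if proc is None:
    print("name filter unavailable; keeping original file names")
```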

> All this added complexity seems highly overengineered for a feature
> that is not in the core functionality of the tool and that only a
> fraction of the users use. Keep in mind: a good tool is one that does
> a single job right.

This is where I get really confused.  Wget and only wget can decide
where wget will write files.  There is no way that this can be done in
another program -- especially considering the -k switch, which will
rewrite the internal links in the HTML.  It is definitely in the core
functionality of the tool.  I just cannot fathom how you would think it
is not.  (I don't mean to sound rude; I am genuinely confused.)

The only alternative is to completely rewrite the entire downloaded tree
in a second pass over the data on disk.  Is that what you have in mind
as the better approach?

I could give several reasons why that approach is inferior.  But for
brevity I will just mention one that is not the most obvious:

Wget does not only write its output and then forget about it.  Instead,
with certain options in place, it will look at its own output files
from previous runs to decide what needs to be downloaded.  So, even
aside from not-insignificant resource utilization issues, any "two-pass"
solution will interfere with the ability to incrementally mirror sites
using wget.
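
A much simplified Python sketch of that problem follows.  It models only
a timestamp check in the spirit of -N, not wget's actual logic, and the
names local_name, needs_download and to_html are hypothetical.  A file
renamed behind the downloader's back is no longer found on the next run,
so it is fetched again; if the downloader itself applies the rename, the
existing file is recognised.

```python
import os
import tempfile


def to_html(name):
    """Hypothetical rename rule: store .php pages under a .html name."""
    return name[:-4] + ".html" if name.endswith(".php") else name


def local_name(url_path, rename=None):
    """Map a URL path to the on-disk name, applying the rename if given."""
    name = url_path.lstrip("/")
    return rename(name) if rename else name


def needs_download(root, url_path, remote_mtime, rename=None):
    """Fetch only if the local copy is missing or older than the remote."""
    path = os.path.join(root, local_name(url_path, rename))
    if not os.path.exists(path):
        return True
    return os.path.getmtime(path) < remote_mtime


with tempfile.TemporaryDirectory() as root:
    # State after a previous run whose output was renamed to page.html.
    saved = os.path.join(root, "page.html")
    open(saved, "w").close()
    remote_mtime = os.path.getmtime(saved) - 10  # remote copy is older

    # Rename done in a second pass: the downloader looks for page.php.
    print(needs_download(root, "/page.php", remote_mtime))           # True
    # Rename known to the downloader: the existing file is found.
    print(needs_download(root, "/page.php", remote_mtime, to_html))  # False
```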


