
Re: [Bug-wget] wget with the -i option.

From: Ray Sterner
Subject: Re: [Bug-wget] wget with the -i option.
Date: Wed, 28 Apr 2010 12:52:51 -0400 (EDT)

  Hello Micah,

  When I use wget to grab all the files from the FTP site, they download
  relatively quickly, so a fast transfer is clearly possible.
  I can see why making a new connection for each file in a list is a
  reasonable default, since the files might be scattered all over the web.
  I assume the recursive get-everything mode uses a single connection for
  everything on the target site.

  Maybe a useful new option would be one that tries to reuse the same
  connection for as many files in the list as possible.  In my case all
  the files in the list are at the same location, so only one connection
  would be needed.  In general, the files in a list could be grouped by
  site, and if the current connection had a problem with a file it could
  be closed and a new one opened.  That might greatly speed up list-type
  downloads with the -i option.  Such an option might be called something
  like --reuse-connection, or it might even be the default for -i lists.
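
  The grouping step described above could be sketched in shell.  This is
  only an illustration of the idea, not an existing wget feature, and it
  does not by itself make wget reuse a connection (wget still reconnects
  per URL within a -i list); it just splits a list into one file per host:

```shell
# Illustrative sketch only: split download.txt (one URL per line, as in
# the -i example) into per-host lists.  Field 3 of "proto://host/path"
# split on "/" is the host name.
if [ -f download.txt ]; then
    sort -t/ -k3,3 download.txt | awk -F/ '{ print > ($3 ".urls") }'
    # One wget run per host list; this only demonstrates the grouping
    # that a hypothetical --reuse-connection option would need.
    for list in *.urls; do
        [ -f "$list" ] || continue
        wget -nd -i "$list"
    done
fi
```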

  Maybe that would not be too hard to add, and it could make a big
  difference in cases like mine.  wget seemed like the best way to grab
  the files I need.  I tried curl, but it had the same issue: after two
  files it would hang.  The new option might be a useful addition to wget.

  Thanks for any help.

  Ray Sterner                     address@hidden 
  The Johns Hopkins University    North latitude 39.16 degrees.
  Applied Physics Laboratory      West longitude 76.90 degrees.
  Laurel, MD 20723-6099

On Wed, 28 Apr 2010, Micah Cowan wrote:

> Comments below.
> Ray Sterner wrote:
> >   Problem using wget with the -i option
> >   -------------------------------------
> > 
> >   I don't think this is a bug, but I have been trying to find a solution
> >   for some time without success.  I'm hoping there is an option that I am
> >   overlooking or misunderstanding.
> > 
> >   The problem is that I am trying to download a set of files from an ftp
> >   site.  The files are Ocean Color related data from Goddard.  I have a
> >   small test area set up to generate test files, but the real case will
> >   have a lot more files.  New files appear on the ftp server as satellites
> >   collect the data.  When the processing system is working, most of the
> >   files will already have been downloaded and only new ones will be needed.
> > 
> >   I can get all the files on the site with a command like
> >         wget -rnd ftp://xyz 
> >   in about 1.5 minutes.
> > 
> >   If I put the URLs of all the files in the text file download.txt and try
> >         wget -rnd -i download.txt
> >   it gets the first two files and hangs on the third.
> >   This site only allows two connections at a time from a host, so that must
> >   be why two files are no problem.
> > 
> >   I can get all the files to download using a command like
> >         wget -rnd --timeout 2 -w 5 -i download.txt
> >   but that takes about 1.5 hours.
> > 
> >   Since getting all the files is fast I know it is possible to do.
> > 
> >   What am I missing?
> >   Can I tell wget that only two connections at a time are allowed
> >   for the ftp site?
> Actually, wget only ever opens one connection to a given host at a
> time; it doesn't support "accelerated downloads" or that sort of thing.
> However, I think it may currently suffer from closing and reopening the
> connection for each individual line in the file.  It may be that the
> server disallows this sort of behavior as well, or that it hasn't
> finished shutting down the first couple of connections before wget
> starts the third.
> I'm afraid I don't know of an obvious workaround for your problem.
> -- 
> Micah J. Cowan
> http://micah.cowan.name/
