bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a spe


From: Darshit Shah
Subject: Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
Date: Sat, 21 Dec 2013 13:51:04 +0530

I have a few comments on the patch. Commenting inline.

On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou <address@hidden>wrote:

> This patch adds an option `--start-pos' for specifying starting position
> of a download, both for HTTP and FTP.  When specified, the newly added
> option would override `--continue'.  Apart from that, no existing code
> should be affected.
>
> Signed-off-by: Yousong Zhou <address@hidden>
> ---
> Hi,
>
> I found myself needed this feature when I was trying to tunnel the
> download of
> big file (several gigabytes) from a remote machine back to local through a
> somewhat flaky connection.  It's a pain both for the server and local
> network
> users if we have to repeat the previously already downloaded part in case
> that
> the connection hangs or breaks.  Specifying 'Range: ' header is not an
> option
> for wget (integrity check in the code would fail), and curl is not fast
> enough.
> So I decided to make this patch in hope that this can also be useful to
> someone
> else.
>
> What integrity check would fail on using the Range Header? And if you
already have a partially downloaded file why is using the --continue switch
on an option?

        yousong
>
>  doc/ChangeLog |    4 ++++
>  doc/wget.texi |   14 ++++++++++++++
>  src/ChangeLog |    9 +++++++++
>  src/ftp.c     |    2 ++
>  src/http.c    |    2 ++
>  src/init.c    |    1 +
>  src/main.c    |    1 +
>  src/options.h |    1 +
>  8 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/doc/ChangeLog b/doc/ChangeLog
> index 3b05756..df103c8 100644
> --- a/doc/ChangeLog
> +++ b/doc/ChangeLog
> @@ -1,3 +1,7 @@
> +2013-12-21  Yousong Zhou  <address@hidden>
> +
> +       * wget.texi: Add documentation for --start-pos.
> +
>  2013-10-06  Tim Ruehsen  <address@hidden>
>
>         * wget.texi: add/explain quoting of wildcard patterns
> diff --git a/doc/wget.texi b/doc/wget.texi
> index 4a1f7f1..166ea08 100644
> --- a/doc/wget.texi
> +++ b/doc/wget.texi
> @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if
> you try to use
>  Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
>  servers that support the @code{Range} header.
>
> address@hidden offset
> address@hidden continue retrieval
> address@hidden incomplete downloads
> address@hidden resume download
> address@hidden start position
> address@hidden address@hidden
> +Start the download at position @var{OFFSET}.  Offset may be expressed in
> bytes,
> +kilobytes with the `k' suffix, or megabytes with the `m' suffix.
> +
> +When specified, it would override the behavior of @samp{--continue}.  When
> +using this option, you may also want to explicitly specify an output
> filename
> +with @samp{-O FILE} in order to not overwrite an existing partially
> downloaded
> +file.
> +
>  @cindex progress indicator
>  @cindex dot style
>  @item address@hidden
> diff --git a/src/ChangeLog b/src/ChangeLog
> index 42ce3e4..ab8a496 100644
> --- a/src/ChangeLog
> +++ b/src/ChangeLog
> @@ -1,3 +1,12 @@
> +2013-12-21  Yousong Zhou  <address@hidden>
> +
> +       * options.h: Add option --start-pos to specify start position of
> +         a download.
> +       * main.c: Same purpose as above.
> +       * init.c: Same purpose as above.
> +       * http.c: Utilize opt.start_pos for HTTP download.
> +       * ftp.c: Utilize opt.start_pos for FTP retrieval.
> +
>  2013-11-02  Giuseppe Scrivano  <address@hidden>
>
>         * http.c (gethttp): Increase max header value length to 512.
> diff --git a/src/ftp.c b/src/ftp.c
> index c2522ca..c7ab6ef 100644
> --- a/src/ftp.c
> +++ b/src/ftp.c
> @@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo
> *f, ccon *con, char **local_fi
>        /* Decide whether or not to restart.  */
>        if (con->cmd & DO_LIST)
>          restval = 0;
> +      else if (opt.start_pos)
> +        restval = opt.start_pos;
>        else if (opt.always_rest
>            && stat (locf, &st) == 0
>            && S_ISREG (st.st_mode))
> diff --git a/src/http.c b/src/http.c
> index 754b7ec..a354c6b 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file
> exists.\n"));
>        /* Decide whether or not to restart.  */
>        if (force_full_retrieve)
>          hstat.restval = hstat.len;
> +      else if (opt.start_pos)
> +        hstat.restval = opt.start_pos;
>        else if (opt.always_rest
>            && got_name
>            && stat (hstat.local_file, &st) == 0
> diff --git a/src/init.c b/src/init.c
> index 84ae654..7f7a34e 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -271,6 +271,7 @@ static const struct {
>    { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
>    { "spanhosts",        &opt.spanhost,          cmd_boolean },
>    { "spider",           &opt.spider,            cmd_boolean },
> +  { "startpos",         &opt.start_pos,         cmd_bytes },
>    { "strictcomments",   &opt.strict_comments,   cmd_boolean },
>    { "timeout",          NULL,                   cmd_spec_timeout },
>    { "timestamping",     &opt.timestamping,      cmd_boolean },
> diff --git a/src/main.c b/src/main.c
> index 19d7253..4fbfaee 100644
> --- a/src/main.c
> +++ b/src/main.c
> @@ -281,6 +281,7 @@ static struct cmdline_option option_data[] =
>      { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
>      { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
>      { "spider", 0, OPT_BOOLEAN, "spider", -1 },
> +    { "start-pos", 0, OPT_VALUE, "startpos", -1 },
>      { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
>      { "timeout", 'T', OPT_VALUE, "timeout", -1 },
>      { "timestamping", 'N', OPT_BOOLEAN, "timestamping", -1 },
> diff --git a/src/options.h b/src/options.h
> index ad89627..9ba1760 100644
> --- a/src/options.h
> +++ b/src/options.h
> @@ -115,6 +115,7 @@ struct options
>    bool ask_passwd;              /* Ask for password? */
>
>    bool always_rest;            /* Always use REST. */
> +  wgint start_pos;             /* Start position of a download. */
>    char *ftp_user;              /* FTP username */
>    char *ftp_passwd;            /* FTP password */
>    bool netrc;                  /* Whether to read .netrc. */


Simply setting the startpos and restval values in Wget will not help you if
the server will keep sending the file from scratch on every request. The
only way to partially download a file is by using the Range header. Because
then, the server will send only the requested parts of the file. If a
server does not support the range header, you really have no other option
but need to resume downloading the file from scratch. This is a limitation
of the HTTP protocol and no client can help you get around it.


Also, please send patches as git format-patch attachments. Email clients
often mangle with spaces and newlines and cause problems when applying them.

-- 
Thanking You,
Darshit Shah


reply via email to

[Prev in Thread] Current Thread [Next in Thread]