


From: Kun Zhou
Subject: Re: [Bug-wget] Bug-wget Digest, Vol 99, Issue 10: regarding wget not converting links correctly
Date: Tue, 31 Jan 2017 02:28:46 +0000
User-agent: NylasMailer/17.1.6

Hello,

  

I am replying to the list about the second issue: wget not converting links
correctly. I installed the alpha release of wget, 1.18.109-4734, on Arch
Linux. When I run
`wget -H -r -k -l 1 econ.ucsb.edu/~tedb/Courses/GraduateTheoryUCSB/TheoryF16.html`,
an excerpt of wget's terminal output is:

  

    --2017-01-30 21:19:03--  http://econ.ucsb.edu/~tedb/Courses/GraduateTheoryUCSB/Bernoulli.pdf
    Reusing existing connection to www.econ.ucsb.edu:80.
    HTTP request sent, awaiting response... 200 OK
    Cookie coming from econ.ucsb.edu attempted to set domain to faculty.econ.ucsb.edu
    Length: 479295 (468K) [application/pdf]
    www.econ.ucsb.edu/~tedb/Courses/GraduateTheoryUCSB: Not a directory
    www.econ.ucsb.edu/~tedb/Courses/GraduateTheoryUCSB/Bernoulli.pdf: Not a directory

    Cannot write to ‘www.econ.ucsb.edu/~tedb/Courses/GraduateTheoryUCSB/Bernoulli.pdf’ (Not a directory).

I can confirm that `Bernoulli.pdf` is still not downloaded and that a number
of links are not converted.
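
For context, "Not a directory" is the message for the POSIX error `ENOTDIR`, which is returned when a path component that has to be a directory is something else, such as a regular file. The C sketch below reproduces that error in isolation. It is only an illustration: the function name `enotdir_demo`, the temporary directory, and the file name `site` are all made up for this example, and it does not show that this is what actually happens inside wget.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo: save "site" as a regular file, then try to create
   "site/Bernoulli.pdf" beneath it.
   Returns the errno of the second open (expected: ENOTDIR),
   0 if that open unexpectedly succeeded, or -1 if the setup failed. */
int enotdir_demo(void)
{
    char dir[] = "/tmp/enotdir-demo-XXXXXX";
    char parent[256];
    char child[512];
    int fd;
    int err = 0;

    /* Work inside a fresh temporary directory. */
    if (mkdtemp(dir) == NULL)
        return -1;

    snprintf(parent, sizeof parent, "%s/site", dir);
    snprintf(child, sizeof child, "%s/Bernoulli.pdf", parent);

    /* Step 1: the name "site" is taken by a regular file. */
    fd = open(parent, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    /* Step 2: a later write needs "site" to be a directory. */
    fd = open(child, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        err = errno;
    } else {
        close(fd);
        unlink(child);
    }

    /* Clean up the temporary files. */
    unlink(parent);
    rmdir(dir);
    return err;
}
```

Calling `enotdir_demo()` and passing the result to `strerror()` gives the same "Not a directory" text that appears in the log excerpt.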

Another relevant issue is that the host names `econ.ucsb.edu` and
`www.econ.ucsb.edu` resolve to the same IP address, which I verified with the
`dig` command on Linux. However, wget fails to detect this and lists the two
host names separately. Maybe this is a bug, or maybe it is just a feature. I
have attached the complete wget output as a text file in case it is useful.
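
The `dig` check can also be done programmatically. The following C sketch uses the standard `getaddrinfo` call to test whether two names share at least one IPv4 address. The function name `hosts_share_ipv4` is invented for this example and is not part of wget. Note that two names sharing an address can still serve different content (name-based virtual hosting), so a match here does not by itself prove the hosts are interchangeable.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if the two names share at least one
   IPv4 address, 0 if they share none, and -1 if either lookup fails. */
int hosts_share_ipv4(const char *a, const char *b)
{
    struct addrinfo hints;
    struct addrinfo *ra = NULL, *rb = NULL, *pa, *pb;
    int shared = 0;

    /* Restrict the lookup to IPv4 stream sockets to keep results small. */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(a, NULL, &hints, &ra) != 0)
        return -1;
    if (getaddrinfo(b, NULL, &hints, &rb) != 0) {
        freeaddrinfo(ra);
        return -1;
    }

    /* Compare every address of the first name with every address of the second. */
    for (pa = ra; pa != NULL && !shared; pa = pa->ai_next) {
        const struct sockaddr_in *sa = (const struct sockaddr_in *) pa->ai_addr;
        for (pb = rb; pb != NULL; pb = pb->ai_next) {
            const struct sockaddr_in *sb = (const struct sockaddr_in *) pb->ai_addr;
            if (sa->sin_addr.s_addr == sb->sin_addr.s_addr) {
                shared = 1;
                break;
            }
        }
    }

    freeaddrinfo(ra);
    freeaddrinfo(rb);
    return shared;
}
```

With network access, `hosts_share_ipv4("econ.ucsb.edu", "www.econ.ucsb.edu")` would be the equivalent of the `dig` comparison described above; the result depends on the DNS records at the time of the call.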

  

Kun

  
On Jan 30 2017, at 12:00 pm, address@hidden wrote:  

>  
Today's Topics:

>

>    1. [PATCH] Add support for --retry-http503 (Tom Szilagyi)  
   2. [bug #50173] wget not converting links and downloading  
      correctly (Tim Ruehsen)  
   3. Please help by replying it (Budi Kusasi)
>

>  
----------------------------------------------------------------------

>

> Message: 1  
Date: Sun, 29 Jan 2017 22:20:04 +0100  
From: Tom Szilagyi <address@hidden>  
To: address@hidden  
Subject: [Bug-wget] [PATCH] Add support for --retry-http503  
Message-ID: <address@hidden>  
Content-Type: text/plain; charset=us-ascii

>

> Consider HTTP 503 (Service unavailable) as a non-fatal, transient  
error. Normally Wget gives up immediately on receiving this HTTP  
response. Certain special use cases might require Wget to retry even  
in the face of this error. With this option, such retries are  
performed subject to the normal retry timing and retry count  
limitations of Wget. Using this option is generally not recommended.

>

> Tested manually by pointing wget to <http://httpbin.org/status/503>  
with and without supplying the --retry-http503 option.

>

> This takes care of <http://savannah.gnu.org/bugs/?20417>
---
 doc/wget.texi | 8 ++++++++
 src/http.c    | 5 +++++
 src/init.c    | 1 +
 src/main.c    | 1 +
 src/options.h | 1 +
 5 files changed, 16 insertions(+)

>

> diff --git a/doc/wget.texi b/doc/wget.texi
index f42773e..6700dc2 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1716,6 +1716,14 @@ some few obscure servers, which never send HTTP authentication
 challenges, but accept unsolicited auth info, say, in addition to
 form-based authentication.
 
+@item --retry-http503
+Consider HTTP 503 (Service unavailable) as a non-fatal, transient
+error. Normally Wget gives up immediately on receiving this HTTP
+response. Certain special use cases might require Wget to retry even
+in the face of this error. With this option, such retries are
+performed subject to the normal retry timing and retry count
+limitations of Wget. Using this option is generally not recommended.
+
 @end table
 
 @node HTTPS (SSL/TLS) Options, FTP Options, HTTP Options, Invoking
diff --git a/src/http.c b/src/http.c
index 9f03d86..0a32126 100644
--- a/src/http.c
+++ b/src/http.c
@@ -4319,6 +4319,11 @@ http_loop (const struct url *u, struct url *original_url, char **newloc,
               logprintf (LOG_NOTQUIET, _("\
 Remote file does not exist -- broken link!!!\n"));
             }
+          else if (opt.retry_http503 &&
+                   hstat.statcode == HTTP_STATUS_UNAVAILABLE)
+            {
+              continue;
+            }
           else
             {
               logprintf (LOG_NOTQUIET, _("%s ERROR %d: %s.\n"),
diff --git a/src/init.c b/src/init.c
index 271bc77..bbb6473 100644
--- a/src/init.c
+++ b/src/init.c
@@ -304,6 +304,7 @@ static const struct {
   { "restrictfilenames", NULL, cmd_spec_restrict_file_names },
   { "retrsymlinks", &opt.retr_symlinks, cmd_boolean },
   { "retryconnrefused", &opt.retry_connrefused, cmd_boolean },
+  { "retryhttp503", &opt.retry_http503, cmd_boolean },
   { "robots", &opt.use_robots, cmd_boolean },
   { "savecookies", &opt.cookies_output, cmd_file },
   { "saveheaders", &opt.save_headers, cmd_boolean },
diff --git a/src/main.c b/src/main.c
index e393597..b160a9f 100644
--- a/src/main.c
+++ b/src/main.c
@@ -404,6 +404,7 @@ static struct cmdline_option option_data[] =
     { "restrict-file-names", 0, OPT_BOOLEAN, "restrictfilenames", -1 },
     { "retr-symlinks", 0, OPT_BOOLEAN, "retrsymlinks", -1 },
     { "retry-connrefused", 0, OPT_BOOLEAN, "retryconnrefused", -1 },
+    { "retry-http503", 0, OPT_BOOLEAN, "retryhttp503", -1 },
     { "save-cookies", 0, OPT_VALUE, "savecookies", -1 },
     { "save-headers", 0, OPT_BOOLEAN, "saveheaders", -1 },
     { IF_SSL ("secure-protocol"), 0, OPT_VALUE, "secureprotocol", -1 },
diff --git a/src/options.h b/src/options.h
index d713acc..d42dc30 100644
--- a/src/options.h
+++ b/src/options.h
@@ -43,6 +43,7 @@ struct options
   bool quiet; /* Are we quiet? */
   int ntry; /* Number of tries per URL */
   bool retry_connrefused; /* Treat CONNREFUSED as non-fatal. */
+  bool retry_http503; /* Treat HTTP 503 errors as non-fatal. */
   bool background; /* Whether we should work in background. */
   bool ignore_length; /* Do we heed content-length at all? */
   bool recursive; /* Are we recursive? */
-- 
2.1.4
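
As a rough illustration of the behaviour the patch description asks for, here is a self-contained C sketch of a retry loop that treats HTTP 503 as transient only when a flag is set. It is not wget's `http_loop`: the names `fetch_fn`, `fetch_with_retry`, and `flaky_server` are hypothetical, the "server" is a stub, and the wait between retries that real retry logic would apply is left out. Only the constant name `HTTP_STATUS_UNAVAILABLE` is taken from the patch.

```c
#include <stdbool.h>

#define HTTP_STATUS_OK          200
#define HTTP_STATUS_UNAVAILABLE 503

/* Hypothetical stand-in for one HTTP attempt: returns the status code
   for the attempt with the given 0-based index. */
typedef int (*fetch_fn)(int attempt);

/* Try up to max_tries times. A 503 is retried only when retry_http503
   is set; any other status ends the loop at once. Returns the last
   status seen and stores the number of attempts made in *tries_used. */
int fetch_with_retry(fetch_fn fetch, int max_tries, bool retry_http503,
                     int *tries_used)
{
    int status = 0;
    int attempt;

    for (attempt = 0; attempt < max_tries; attempt++) {
        status = fetch(attempt);
        if (status == HTTP_STATUS_UNAVAILABLE && retry_http503)
            continue;   /* transient: go round again, as in the patch */
        attempt++;      /* count the attempt that ended the loop */
        break;
    }
    *tries_used = attempt;
    return status;
}

/* Stub server for the example: unavailable twice, then OK. */
int flaky_server(int attempt)
{
    return attempt < 2 ? HTTP_STATUS_UNAVAILABLE : HTTP_STATUS_OK;
}
```

With `flaky_server` and a limit of five tries, the flag decides the outcome: with it set, the third attempt returns 200; without it, the loop stops after the first 503. The retry count still bounds the loop, matching the "retry count limitations" mentioned in the description.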

>

>  
------------------------------

>

> Message: 2  
Date: Mon, 30 Jan 2017 11:39:23 +0000 (UTC)  
From: Tim Ruehsen <address@hidden>  
To: Tim Ruehsen <address@hidden>, address@hidden,  
address@hidden, address@hidden, address@hidden  
Subject: [Bug-wget] [bug #50173] wget not converting links and  
downloading correctly  
Message-ID: <address@hidden>  
Content-Type: text/plain;charset=UTF-8

>

> Update of bug #50173 (project wget):

>

>            Fixed Release: None => trunk

>

>     _______________________________________________________

>

> Follow-up Comment #1:

>

> I can reproduce the described behavior with wget 1.18.

>

> But it seems to be fixed in trunk (= alpha release 1.19).

>

> BlumeSimonCh21.pdf appears two times, one as relative URL (being loaded) and  
as absolute URL to a different host (not being loaded).

>

> Bernoulli.pdf appears only once as absolute URL to a different host (not
being loaded).

>

>  
    _______________________________________________________

>

> Reply to this item at:

>

>   <http://savannah.gnu.org/bugs/?50173>

>

> _______________________________________________  
  Message sent via/by Savannah  
  http://savannah.gnu.org/

>

>  
------------------------------

>

> Message: 3  
Date: Mon, 30 Jan 2017 19:55:01 +0700  
From: Budi Kusasi <address@hidden>  
To: address@hidden  
Subject: [Bug-wget] Please help by replying it  
Message-ID:  
<address@hidden>  
Content-Type: text/plain; charset=UTF-8

>

> What are error codes returned by wget in windows contexts in clear  
explanation

>


Attachment: log_downloadIssues
Description: Binary data

