
From: anonymous
Subject: [Bug-wget] [bug #48424] wget fails to convert some URLs when the same file path is retrieved via more than one protocol
Date: Wed, 6 Jul 2016 19:27:25 +0000 (UTC)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/601.6.17 (KHTML, like Gecko) Version/9.1.1 Safari/601.6.17

URL:
  <http://savannah.gnu.org/bugs/?48424>

                 Summary: wget fails to convert some URLs when the same file
path is retrieved via more than one protocol
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Wed 06 Jul 2016 07:27:23 PM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: Paul Merchant
        Originator Email: address@hidden
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.18
        Operating System: Mac OS
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

If the same file path is retrieved via more than one protocol in a recursive
crawl that converts URLs, some of the referring URLs will not be rewritten.
For example, if the file a.html, in the directory wget-test, contains these
links:

<a href="http://myhost/wget-test/b.html">b - http</a>
<a href="https://myhost/wget-test/b.html">b - https</a>

Then a.html retrieved by this command:
wget -m --convert-links http://myhost/wget-test/a.html

will contain the following, where only the https link has been converted:

<a href="http://myhost/wget-test/b.html">b - http</a>
<a href="b.html">b - https</a>

Since the different protocols actually refer to different servers (or ports on
the same server that may not share directory aliases), there is no guarantee
that matching URL paths actually represent the same file.  Ideally wget
should separate paths by protocol, and offer an option to ignore the protocol
when making paths, so that if http and https (or ftp, or...) are known to
correspond to the same directory, this can be reflected in the URL conversion.
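
The behaviour requested above can be sketched as follows. This is only an
illustration of the idea, not wget's implementation: the function names
local_path and build_download_map, the ignore_protocol parameter, and the
scheme/host/path directory layout are all hypothetical and chosen for this
example.

```python
from urllib.parse import urlsplit
import posixpath


def local_path(url, ignore_protocol=False):
    """Map a URL to a local file path (hypothetical layout, not wget's code).

    By default the scheme becomes the first path component, so the same
    host and path fetched over http and https land in different files.
    With ignore_protocol=True the scheme is dropped and both collapse
    onto one local file.
    """
    parts = urlsplit(url)
    # Assumption for this sketch: an empty path is saved as index.html.
    path = parts.path.lstrip("/") or "index.html"
    components = [parts.hostname or "", path]
    if not ignore_protocol:
        components.insert(0, parts.scheme)
    return posixpath.join(*components)


def build_download_map(urls, ignore_protocol=False):
    """Return a dict from each retrieved URL to its local file path."""
    return {url: local_path(url, ignore_protocol) for url in urls}


urls = [
    "http://myhost/wget-test/b.html",
    "https://myhost/wget-test/b.html",
]

# Paths separated by protocol: two distinct local files.
separate = build_download_map(urls)

# Protocol ignored: both URLs share one local file, so both links in
# a.html could be rewritten to point at it.
merged = build_download_map(urls, ignore_protocol=True)

for url in urls:
    print(url, "->", separate[url], "|", merged[url])
```

In either mode every retrieved URL has its own entry in the map, so a link
converter working from such a map would have a local target for both the
http and the https link.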

As wget works now, the no-clobber flag (-nc) cannot be used as a workaround,
as it is incompatible with the recursive crawl.
    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?48424>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/