
[Bug-wget] [bug #55771] content-disposition not respected with long URLs


From: anonymous
Subject: [Bug-wget] [bug #55771] content-disposition not respected with long URLs (such as AWS)
Date: Sat, 23 Feb 2019 16:57:10 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36

URL:
  <https://savannah.gnu.org/bugs/?55771>

                 Summary: content-disposition not respected with long URLs
(such as AWS)
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Sat 23 Feb 2019 09:57:08 PM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: shawnkhall
        Originator Email: address@hidden
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.20
        Operating System: Microsoft Windows
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: No

    _______________________________________________________

Details:

Using Wget 1.20.

The --content-disposition option is not respected with long URLs. 

Expected: the file is saved under the name given in the Content-Disposition header.

Result: the file is saved under a 240-character name of garbage that contains
neither the filename from the Content-Disposition header nor the end of the
URL, which could otherwise have been used to correct the filename afterwards.


Issue: GitHub increasingly uses AWS for backend storage, and AWS uses very
long URLs full of unwieldy query parameters. Wget truncates that URL to
compose the filename even when a Content-Disposition header is present and
the --content-disposition option is enabled.

This commonly produces URL-encoded filenames of 500+ characters. They are
either cut down to 400-something characters, in which case the slashes and
semicolons they contain make the filename invalid, or cut to 240 characters,
in which case the portion needed to correct the filename after download has
been stripped from the saved name.
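
The wanted filename is often still recoverable from the long URL itself. The
following is a minimal Python sketch, not Wget code, assuming (this report
does not confirm it) that the redirect target is an S3-style presigned URL
carrying a `response-content-disposition` query parameter. `recover_name` is
a hypothetical helper; when no such parameter exists it falls back to the
last path segment and ignores the query string entirely.

```python
from email.message import Message
from urllib.parse import parse_qs, unquote, urlsplit


def recover_name(url: str) -> str:
    """Hypothetical helper: derive a short filename from a long URL."""
    parts = urlsplit(url)
    # parse_qs percent-decodes the query values.
    params = parse_qs(parts.query)
    # Assumption: an S3-style override parameter may embed the filename.
    for value in params.get("response-content-disposition", []):
        msg = Message()
        msg["Content-Disposition"] = value
        name = msg.get_filename()
        if name:
            # Keep only the final component; never trust directory parts.
            name = name.replace("\\", "/").rsplit("/", 1)[-1]
            if name:
                return name
    # Fallback: last non-empty path segment, without the query string.
    segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return unquote(segment)
```

Note that the path-only fallback may yield an opaque object key rather than a
meaningful name, which is why the query parameter is checked first.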

Ideally, Wget should use the filename given in the Content-Disposition
header, which would avoid this problem entirely.
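
As an illustration of how little is needed once the header is available, here
is a minimal Python sketch (not how Wget is implemented) using the standard
library's `email.message.Message.get_filename()`. That method parses the MIME
form of Content-Disposition; HTTP's usage of the header is close but not
identical, so treat this as an approximation. `filename_from_header` is a
hypothetical name.

```python
from email.message import Message
from typing import Optional


def filename_from_header(value: str) -> Optional[str]:
    """Hypothetical helper: return the filename from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() reads the "filename" parameter and unquotes it.
    name = msg.get_filename()
    if not name:
        return None
    # Drop any directory components so a hostile header cannot
    # direct the download outside the current directory.
    name = name.replace("\\", "/").rsplit("/", 1)[-1]
    return name or None
```

For example, `filename_from_header('attachment; filename=rufus-3.4p.exe')`
yields the short name regardless of how long the request URL was.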

If that's not an option, Wget should remove characters from the beginning of
the filename rather than the end when shortening it, to preserve the
extension (or at least enough data to identify it). As a temporary
workaround, an option to choose which side of the URL is trimmed would be
acceptable.
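
The proposed trim option amounts to choosing which end of the name to keep.
A minimal Python sketch follows; `trim_name` and its `side` parameter are
hypothetical, and the 240-character default simply mirrors the length
observed in this report. Real filesystem limits are often counted in bytes
rather than characters, which a production version would need to handle.

```python
def trim_name(name: str, max_len: int = 240, side: str = "start") -> str:
    """Hypothetical helper: shorten name to at most max_len characters.

    side="start" drops characters from the beginning, keeping the tail
    (and therefore the extension); side="end" drops them from the end,
    which loses the extension when the name is too long.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(name) <= max_len:
        return name
    if side == "start":
        return name[-max_len:]
    if side == "end":
        return name[:max_len]
    raise ValueError(f"unknown side: {side!r}")
```

With `side="start"`, a 500-character name ending in `rufus-3.4p.exe` keeps
its last 240 characters, so the real filename survives and can be recovered.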


Example requests:

wget -N --debug -e content_disposition=on https://github.com/pbatard/rufus/releases/download/v3.4/rufus-3.4p.exe

Output attached as "response1.txt".


wget -N --debug --content-disposition https://github.com/pbatard/rufus/releases/download/v3.4/rufus-3.4p.exe

Output attached as "response2.txt".


When using --content-disposition, these commands also send a preliminary HEAD
request, which can trigger duplicate downloads on servers that do not respond
to HEAD requests properly (though that is not the case here).


Omitting the --content-disposition option downloads the file, but not under
the correct name.

wget -N --debug https://github.com/pbatard/rufus/releases/download/v3.4/rufus-3.4p.exe

Output attached as "response3.txt".




    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Sat 23 Feb 2019 09:57:08 PM UTC  Name: response1.txt  Size: 11KiB   By:
None
wget output for the three requests in the summary section
<http://savannah.gnu.org/bugs/download.php?file_id=46344>
-------------------------------------------------------
Date: Sat 23 Feb 2019 09:57:08 PM UTC  Name: response2.txt  Size: 11KiB   By:
None
wget output for the three requests in the summary section
<http://savannah.gnu.org/bugs/download.php?file_id=46345>
-------------------------------------------------------
Date: Sat 23 Feb 2019 09:57:08 PM UTC  Name: response3.txt  Size: 18KiB   By:
None
wget output for the three requests in the summary section
<http://savannah.gnu.org/bugs/download.php?file_id=46346>

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?55771>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



