[Bug-wget] [bug #55771] content-disposition not respected with long URLs
From: anonymous
Subject: [Bug-wget] [bug #55771] content-disposition not respected with long URLs (such as AWS)
Date: Sat, 23 Feb 2019 16:57:10 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36
URL:
<https://savannah.gnu.org/bugs/?55771>
Summary: content-disposition not respected with long URLs (such as AWS)
Project: GNU Wget
Submitted by: None
Submitted on: Sat 23 Feb 2019 09:57:08 PM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: shawnkhall
Originator Email: address@hidden
Open/Closed: Open
Discussion Lock: Any
Release: 1.20
Operating System: Microsoft Windows
Reproducibility: Every Time
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: No
_______________________________________________________
Details:
Using Wget 1.20.
The --content-disposition option is not respected with long URLs.
Expected: the file is saved under the name given in the Content-Disposition header.
Result: the file was saved under a 240-character garbage filename that contains
neither the name from Content-Disposition nor the end of the URL, which could
otherwise have been used to correct the filename afterwards.
Issue: GitHub increasingly uses AWS for backend storage, and AWS uses very
long URLs with ugly parameterized output. Unfortunately, Wget truncates the URL
to compose the filename even when a Content-Disposition header is present and
the --content-disposition option is enabled.
This commonly results in 500+ character URL-encoded filenames that are
stripped down to 400-something characters. Because these include slashes and
semicolons, the resulting filename is either invalid or cut to 240 characters,
in which case the portion of the name that could have been used to correct it
after download has been stripped from the saved filename.
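To illustrate why a name built from a percent-encoded URL can be invalid on Windows, here is a minimal Python sketch (not Wget's code). It assumes the URL tail is percent-decoded before being used as a local name; the function name `safe_local_name` and the `_` replacement character are hypothetical. It replaces the characters Windows reserves in file names (`<>:"/\|?*` and ASCII control characters).

```python
import re
from urllib.parse import unquote

# Characters Windows does not allow in file names, plus ASCII control characters.
_WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_local_name(url_tail, replacement="_"):
    """Percent-decode a URL fragment and make it usable as a Windows file name.

    Hypothetical helper: decoding can turn "%2F" into "/", which would
    otherwise be read as a directory separator.
    """
    decoded = unquote(url_tail)
    return _WINDOWS_RESERVED.sub(replacement, decoded)
```

For example, `safe_local_name("a%2Fb%3Bc.exe")` returns `"a_b;c.exe"`: the decoded slash is replaced, while characters Windows accepts are left untouched.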
Ideally, Wget should use the filename given in the Content-Disposition
header, which would prevent this problem.
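A minimal Python sketch of the requested behavior, using the standard library's `email.message.Message.get_filename()` to read the `filename` parameter of a Content-Disposition value. This is not how Wget parses the header; the function name and the example header value are assumptions, not taken from the attached logs. Directory components are dropped so a server cannot direct the download outside the current directory.

```python
import os.path
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename from a Content-Disposition value, or None.

    Hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # unquotes the value; None if no filename parameter
    if not name:
        return None
    # Drop any directory components (e.g. "../../x") the server might send.
    name = os.path.basename(name.replace("\\", "/"))
    return name or None
```

With an assumed header such as `attachment; filename=rufus-3.4p.exe`, the function returns `"rufus-3.4p.exe"` regardless of how long the request URL was.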
If that is not an option, it should remove characters from the beginning of the
filename rather than the end, to preserve the extension (or at least enough
data to identify the extension). An option to select which side of the URL is
trimmed would be an acceptable temporary workaround.
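The trimming preference described above can be sketched in Python as follows. `trim_filename` and its `side` parameter are hypothetical, modeling the suggested option rather than an existing Wget flag; the 240-character default simply mirrors the length reported above.

```python
def trim_filename(name, max_len=240, side="start"):
    """Shorten name to at most max_len characters.

    side="start" drops characters from the beginning, so the end of the
    name (and typically the extension) survives; side="end" drops them
    from the end, which loses the extension.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(name) <= max_len:
        return name
    if side == "start":
        return name[-max_len:]
    if side == "end":
        return name[:max_len]
    raise ValueError("side must be 'start' or 'end'")
```

Trimming from the start keeps `rufus-3.4p.exe` at the end of an over-long name, so the file can still be identified and renamed after download.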
Example requests:
wget -N --debug -e content_disposition=on https://github.com/pbatard/rufus/releases/download/v3.4/rufus-3.4p.exe
Results in "response1.txt"
wget -N --debug --content-disposition https://github.com/pbatard/rufus/releases/download/v3.4/rufus-3.4p.exe
Results in "response2.txt"
When --content-disposition is used, these requests also send a preliminary HEAD
request, which can trigger duplicate downloads from servers that do not respond
properly to HEAD requests (though that is not the case here).
Omitting the --content-disposition option results in the file being
downloaded, but not under the correct name.
wget -N --debug https://github.com/pbatard/rufus/releases/download/v3.4/rufus-3.4p.exe
Results in "response3.txt"
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Sat 23 Feb 2019 09:57:08 PM UTC Name: response1.txt Size: 11KiB By: None
wget output for the three requests in the summary section
<http://savannah.gnu.org/bugs/download.php?file_id=46344>
-------------------------------------------------------
Date: Sat 23 Feb 2019 09:57:08 PM UTC Name: response2.txt Size: 11KiB By: None
wget output for the three requests in the summary section
<http://savannah.gnu.org/bugs/download.php?file_id=46345>
-------------------------------------------------------
Date: Sat 23 Feb 2019 09:57:08 PM UTC Name: response3.txt Size: 18KiB By: None
wget output for the three requests in the summary section
<http://savannah.gnu.org/bugs/download.php?file_id=46346>
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?55771>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/