[Bug-wget] [bug #46611] log errors with --trust-server-names


From: Tim Ruehsen
Subject: [Bug-wget] [bug #46611] log errors with --trust-server-names
Date: Wed, 16 Mar 2016 11:20:13 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

Follow-up Comment #6, bug #46611 (project wget):

The wrong '.html' extension is sometimes mentioned and sometimes not because
we get redirected to different servers, and they answer with different
headers.

One of the servers answers with:
HTTP/1.1 304 Not Modified
Date: Wed, 16 Mar 2016 10:55:31 GMT
Accept-Ranges: bytes
ETag: "1444344092"
Content-Type: application/octet-stream
X-HW: 1458125731.dop012.fr7.t,1458125731.cds062.fr7.c
Content-Disposition: attachment; filename="mbam-setup-2.2.0.1024.exe"

The second server answers (here we see the .html later in the logs):
HTTP/1.1 304 Not Modified
Accept-Ranges: bytes
Cache-Control: max-age=86400, public
Date: Wed, 16 Mar 2016 10:37:45 GMT
Etag: "482eb1-15d8fd8-5219f871809fa"
Expires: Thu, 17 Mar 2016 10:37:45 GMT
Last-Modified: Thu, 08 Oct 2015 22:38:53 GMT
Server: ECAcc (fcn/9FA9)
X-Cache: HIT

The Content-Type: header field is missing, which leaves Wget with the default
for HTTP. And I guess Wget's default Content-Type is text/html.
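
To illustrate the difference between the two answers, here is a minimal
Python sketch (not Wget's code). The function name effective_content_type
and the constant ASSUMED_DEFAULT are hypothetical, and the text/html default
only mirrors the guess above; it is not taken from Wget's source. The header
blocks are shortened copies of the two responses quoted in this comment.

```python
from email.parser import Parser

# Hypothetical default, mirroring the guess in this comment that Wget
# falls back to text/html. Not taken from Wget's source.
ASSUMED_DEFAULT = "text/html"

FIRST_SERVER = """\
HTTP/1.1 304 Not Modified
Date: Wed, 16 Mar 2016 10:55:31 GMT
Accept-Ranges: bytes
ETag: "1444344092"
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="mbam-setup-2.2.0.1024.exe"
"""

SECOND_SERVER = """\
HTTP/1.1 304 Not Modified
Accept-Ranges: bytes
Cache-Control: max-age=86400, public
Date: Wed, 16 Mar 2016 10:37:45 GMT
Etag: "482eb1-15d8fd8-5219f871809fa"
Server: ECAcc (fcn/9FA9)
X-Cache: HIT
"""


def effective_content_type(raw_headers, default=ASSUMED_DEFAULT):
    """Return the media type named in the header block, or `default` if absent."""
    lines = raw_headers.strip().splitlines()
    # The status line ("HTTP/1.1 304 Not Modified") is not a header field.
    if lines and lines[0].startswith("HTTP/"):
        lines = lines[1:]
    headers = Parser().parsestr("\n".join(lines) + "\n\n", headersonly=True)
    value = headers.get("Content-Type")  # lookup is case-insensitive
    if value is None:
        return default
    # Drop parameters such as "; charset=utf-8" and normalise case.
    return value.split(";", 1)[0].strip().lower()


if __name__ == "__main__":
    print(effective_content_type(FIRST_SERVER))   # application/octet-stream
    print(effective_content_type(SECOND_SERVER))  # text/html
```

With a text/html fallback the second answer is treated as an HTML page,
which would be consistent with the '.html' showing up later in the logs.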

RFC 2616 says:
Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type
header field defining the media type of that body. If and only if the media
type is not given by a Content-Type field, the recipient MAY attempt to guess
the media type via inspection of its content and/or the name extension(s) of
the URI used to identify the resource. If the media type remains unknown, the
recipient SHOULD treat it as type "application/octet-stream".

The updated RFC 7231 3.1.1.5. says:
A sender that generates a message containing a payload body SHOULD
   generate a Content-Type header field in that message unless the
   intended media type of the enclosed representation is unknown to the
   sender.  If a Content-Type header field is not present, the recipient
   MAY either assume a media type of "application/octet-stream"
   ([RFC2046], Section 4.5.1) or examine the data to determine its type.

   In practice, resource owners do not always properly configure their
   origin server to provide the correct Content-Type for a given
   representation, with the result that some clients will examine a
   payload's content and override the specified type.  Clients that do
   so risk drawing incorrect conclusions, which might expose additional
   security risks (e.g., "privilege escalation").  Furthermore, it is
   impossible to determine the sender's intent by examining the data
   format: many data formats match multiple media types that differ only
   in processing semantics.  Implementers are encouraged to provide a
   means of disabling such "content sniffing" when it is used.
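
The two options RFC 7231 gives a recipient can be sketched as follows. This
is only an illustration in Python, not a proposal for Wget's implementation:
resolve_media_type, MAGIC_NUMBERS and the allow_sniffing switch are
hypothetical names, and the signature table is deliberately tiny.

```python
# A few well-known file signatures; a real sniffer would need far more.
MAGIC_NUMBERS = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF8", "image/gif"),
]


def resolve_media_type(content_type, body, allow_sniffing=False):
    """Pick a media type for a payload, following the RFC 7231 options.

    content_type is the value of the Content-Type header, or None if the
    header was missing. body is the payload as bytes.
    """
    # A declared type is never overridden; the RFC warns about the
    # security risks of doing that.
    if content_type:
        return content_type
    # Option 1: examine the data, but only if the user allows it.
    if allow_sniffing:
        for magic, media_type in MAGIC_NUMBERS:
            if body.startswith(magic):
                return media_type
    # Option 2 (and the fallback): assume application/octet-stream.
    return "application/octet-stream"


if __name__ == "__main__":
    pdf = b"%PDF-1.4 example"
    print(resolve_media_type(None, pdf))                       # application/octet-stream
    print(resolve_media_type(None, pdf, allow_sniffing=True))  # application/pdf
    print(resolve_media_type("text/plain", pdf, allow_sniffing=True))  # text/plain
```

Keeping sniffing off by default matches the RFC's advice to provide a means
of disabling it.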


Before we change Wget's default Content-Type to application/octet-stream, I
would like to hear some opinions. There might be a good reason for Wget's
current behavior.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?46611>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



