bug-wget

Re: [Bug-wget] New to this, large files constraints?


From: Jochen Roderburg
Subject: Re: [Bug-wget] New to this, large files constraints?
Date: Sat, 17 Sep 2011 14:09:54 +0200
User-agent: Internet Messaging Program (IMP) H3 (4.3.7)

Quoting Jochen Roderburg <address@hidden>:

Quoting Jochen Roderburg <address@hidden>:


This is really an "interesting" problem:

http://socds.huduser.org/permits/output_monthly_csv.odb?outpref=csv&geoval=state&datatype=monthlyF&varlist=1%232%233&yearlist=2000%232001%232002%232003%232004%232005%232006%232007%232008%232009%232010&statelist=13%2337%2345&msalist=+&cbsalist=+&bppllist=+&cntylist=13033%2313073%2313189%2313245%2337007%2337025%2337071%2337119%2337179%2345001%2345003%2345005%2345007%2345009%2345011%2345013%2345015%2345017%2345019%2345021%2345023%2345025%2345027%2345029%2345031%2345033%2345035%2345037%2345039%2345041%2345043%2345045%2345047%2345049%2345051%2345053%2345055%2345057%2345059%2345061%2345063%2345067%2345069%2345065%2345071%2345073%2345075%2345077%2345079%2345081%2345083%2345085%2345087%2345089%2345091&COUNTYSUM=YES&COUNTYALL=+&COUNTYGRP=+&STATESUM=+&STATEALL=+&METROSUM=+&METROALL=+&METRO=+&CBSA=+&PLACEGRP=+&CSUMNAME=&JSUMNAME=+&geo=state&chron=monthlyF

On Windows, older versions of wget may report the error "Result too large",
which actually means the file name is too long. On Linux the message is
"File name too long". Also, wget 1.13 with --trust-server-names doesn't work
with this site's response. Should it?
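
To illustrate why such a URL can trip the file name limit, here is a minimal Python sketch. It assumes the local name is built from the last path segment plus the query string, and that the per-component limit is 255 bytes (typical on Linux file systems, but not universal). The function names and the example.invalid URLs are hypothetical stand-ins, not wget's actual code.

```python
from urllib.parse import urlsplit

# Assumption: 255 bytes per path component, typical for Linux file systems.
NAME_MAX = 255


def url_to_local_name(url: str) -> str:
    """Approximate a URL-derived local name: last path segment plus query."""
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1] or "index.html"
    if parts.query:
        name += "?" + parts.query
    return name


def name_too_long(url: str, limit: int = NAME_MAX) -> bool:
    """True if the derived name would exceed the assumed component limit."""
    return len(url_to_local_name(url).encode("utf-8")) > limit


# Hypothetical URLs shaped like the one in this thread.
long_url = (
    "http://example.invalid/permits/output_monthly_csv.odb?cntylist="
    + "%23".join(["45001"] * 60)
)
short_url = "http://example.invalid/permits/BuildingPermits.csv"

print(name_too_long(long_url), name_too_long(short_url))
```

A name taken from the server's Content-Disposition header would sidestep the problem, which is why that option matters here.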

Well, in theory it should work with "--content-disposition=on", as the web application sends a Content-Disposition header with a filename:

---response begin---
HTTP/1.1 200 OK
Content-Type: application/vnd.ms-excel
Server: Microsoft-IIS/6.0
Content-Disposition: attachment; filename=BuildingPermits.csv;
X-Powered-By: ASP.NET
Date: Sat, 17 Sep 2011 05:58:06 GMT
Connection: close

---response end---
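
For reference, a minimal Python sketch of pulling the filename out of a header value like the one above, including its trailing semicolon. This is a deliberately simplified parser written for illustration (it ignores the RFC 6266 `filename*` form and semicolons inside quoted strings) and is not how wget itself parses the header; the function name is hypothetical.

```python
from typing import Optional


def filename_from_content_disposition(value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, or None."""
    # Skip the disposition type ("attachment") and scan the parameters.
    for part in value.split(";")[1:]:
        key, sep, val = part.strip().partition("=")
        if sep and key.strip().lower() == "filename":
            # Drop optional surrounding double quotes.
            return val.strip().strip('"') or None
    return None


header = "attachment; filename=BuildingPermits.csv;"
print(filename_from_content_disposition(header))  # prints BuildingPermits.csv
```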

... but wget seems to bail out with the overlong filename *before* it reads the response headers.


After further examination I must retract the "before" assumption.

The debug output shows the GET response headers, including Content-Disposition, and the error message appears after them. So it looks more as if the Content-Disposition header is simply ignored for some unknown reason.

Sorry for the noise. As so often, the whole truth is more complicated, and one has to test very carefully to avoid side effects.

New result: it works as expected with wget's default options plus --content-disposition=on.

It does not work, however, when --timestamping is added. (That option makes no sense for this kind of dynamically generated output, of course, but I have it set as my default, and it crept into my tests although I tried to avoid it ;-).

FWIW, in this case I see the following sequence in the debug output:

wget first sends a HEAD request and gets a "standard" response *without* Content-Disposition.
Then it sends a GET and receives the Content-Disposition header.
In this situation it seems to ignore that header.
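
The server-side half of this sequence can be reproduced without touching the real site. The following Python sketch starts a throwaway local HTTP server that, like the site described here, sends Content-Disposition only on GET, then issues a HEAD followed by a GET and reports what each returned. The handler, the helper names, and the CSV body are hypothetical; the sketch only shows the header mismatch between the two requests and says nothing about wget's internal handling.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: only GET carries Content-Disposition."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.end_headers()

    def do_GET(self):
        body = b"a,b\n1,2\n"  # placeholder payload
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header(
            "Content-Disposition", "attachment; filename=BuildingPermits.csv;"
        )
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_disposition(port, method):
    """Send one request and return its Content-Disposition header (or None)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/output_monthly_csv.odb")
        resp = conn.getresponse()
        resp.read()
        return resp.getheader("Content-Disposition")
    finally:
        conn.close()


def observe_head_then_get():
    """Run HEAD then GET against the local server, as in the debug output."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch_disposition(port, "HEAD"), fetch_disposition(port, "GET")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head_cd, get_cd = observe_head_then_get()
    print("HEAD:", head_cd)
    print("GET: ", get_cd)
```

A client that settles on the local file name after the HEAD would have no Content-Disposition to work with, which would be consistent with the behaviour observed above.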

Best regards,
Jochen Roderburg





