RE: [Bug-wget] Problem, no getting any response


From: Tony Lewis
Subject: RE: [Bug-wget] Problem, no getting any response
Date: Sun, 22 Nov 2009 22:06:52 -0800

You still are not setting the POST data correctly. You need something like
this:

--post-data='{"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}'
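
The single quotes matter because the body itself contains double quotes. A
small sketch of the difference, runnable in any POSIX shell (the variable
names are only for illustration):

```shell
# Single quotes keep the JSON intact, inner double quotes included.
good='{"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}'

# Double quotes around the same text: the shell pairs up the quotes and
# removes them, so what is left is no longer valid JSON.
bad="{"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}"

printf '%s\n' "$good" "$bad"
```

The second line printed has lost every double quote, which is what the
server would receive if the body were passed that way.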

 

You're taking the hard road for many of these options. Try the following
command line options instead of forcing the headers yourself:

 

--user-agent

--referer

 

You probably don't need to send the Accept-* headers. You definitely don't
need the Connection header.

 

You should let wget set the Content-Length header.
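
A quick check shows why a hand-written length is risky: neither of the two
bodies quoted in this thread is 84 bytes long. If the server waits for as
many bytes as the header promises, a shorter body could plausibly explain a
hang, though that is only a guess. The variable names below are made up for
illustration:

```shell
# Bodies copied from this thread (ASCII only, so characters equal bytes).
form='searchQueryString=p+2-n+12-c+287466-s+5-r+-t+-ri+-ni+1-x+&isSearchMode=false'
json='{"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}'

# ${#var} expands to the length of the variable's value.
form_len=${#form}
json_len=${#json}

printf 'form body: %s bytes\n' "$form_len"
printf 'json body: %s bytes\n' "$json_len"
```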

 

Including the --debug output in your report would be useful.
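
Putting the suggestions above together, the command might look something
like the sketch below. The URL, referer, user agent, and body are copied
from this thread; the RUN switch and the variable names are made up for
illustration, and whether the server accepts the request is untested.

```shell
#!/bin/sh
# Values copied from the thread; nothing here has been verified against the server.
url='http://www.tiffany.com/Shopping/CategoryBrowse.aspx/GetCategoriesXmlBySearchQS'
referer='http://www.tiffany.com/Shopping/CategoryBrowse.aspx?cid=287464&mcat=148204'
agent='Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'
body='{"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}'

# Build the argument list. There is deliberately no Content-Length,
# Connection, or Accept-* header; wget computes Content-Length itself.
set -- --debug \
  --user-agent="$agent" \
  --referer="$referer" \
  --header='Content-Type: application/json; charset=utf-8' \
  --post-data="$body" \
  -O test.html \
  "$url"

# Hypothetical safety switch: only send the request when RUN=1 is set,
# otherwise just show what would be run.
if [ "${RUN:-0}" = 1 ]; then
  wget "$@"
else
  echo "would run: wget $*"
fi
```

Run it once without RUN to review the arguments, then with RUN=1 to send
the request and capture the --debug output for the report.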

 

Tony

From: Dan Yamins [mailto:address@hidden 
Sent: Sunday, November 22, 2009 6:30 AM
To: Tony Lewis
Cc: address@hidden
Subject: Re: [Bug-wget] Problem, no getting any response

 


Hi Tony, thanks for your email.

 

{"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}


Sorry, this was the value of "searchQueryString" for a different attempt --
a somewhat different page.   However, the result in both cases is the same
-- no response.    

 

Other things that might matter to the server:
- the user agent (many servers reject web crawling software such as wget)
- the content type (Firefox is sending application/json)
- referer
- cookies


Yes, I tried putting in all the various headers using the '--header='
syntax, including user agent, content-type, referer, cookies, &c.   No
matter which headers I include, I get the same result (except if I don't
include "Content-length:84", in which case I get an error 500).

Here's the wget command with all the headers set (for a slightly different
page, but same type):

wget --debug \
  --header="User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5" \
  --header="Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
  --header="Accept-Language: en-us,en;q=0.5" \
  --header= "Accept-Encoding: gzip,deflate" \
  --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
  --header="Keep-Alive: 300" \
  --header="Connection: keep-alive" \
  --header="Content-Type: application/json; charset=utf-8" \
  --header="Referer: http://www.tiffany.com/Shopping/CategoryBrowse.aspx?cid=287466&mcat=148204" \
  --header="Cookie: assortmentid=101; hascookies1=1; __utma=124393999.990367556.1258838771.1258848628.1258899757.5; __utmz=124393999.1258838771.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); s_vi=[CS]v1|25842D7985010E69-4000010E8017E5DD[CE]; _UrlReferrer==http%3A//www.tiffany.com/Shopping/CategoryBrowse.aspx%3Fcid%3D287466%26mcat%3D148204%23p+1-n+12-cg+viewPaged-c+287466-s+5-r+101323339+101424823+101323353-t+-ri+-ni+1-x+-pu+-f+; s_sq=tiffanyrus%3D%2526pid%253DTiffany%252520%252526%252520Co.%252520%25257C%252520Browse%252520Rings%2526pidt%253D1%2526oid%253Djavascript%25253AgotoPage%2525281%252529%25253B%2526ot%253DA; s_cc=true; __utmc=124393999; previoussid=; samebrowsersession=; ShowFlash=; __utmb=124393999.7.7.1258899766577; siteid=1; SiteIDForModules=1" \
  --header="Pragma: no-cache" \
  --header="Cache-Control: no-cache" \
  --header="Content-length:84" \
  --post-data="searchQueryString=p+2-n+12-c+287466-s+5-r+-t+-ri+-ni+1-x+&isSearchMode=false" \
  http://www.tiffany.com/Shopping/CategoryBrowse.aspx/GetCategoriesXmlBySearchQS \
  -O test.html


... and I still get no response.  





Good luck.

-----Original Message-----
From: address@hidden
[mailto:bug-wget-bounces+wget address@hidden On Behalf Of Dan Yamins
Sent: Saturday, November 21, 2009 3:53 PM
To: address@hidden
Subject: [Bug-wget] Problem, no getting any response

Hi,

I'm trying to use wget to scrape some data from a page that requires a
posting of some data (the page itself does it via Javascript).   When I use
the command:

$ wget --header="Content-length:84" \
    --post-data="searchQueryString=p-8-n+12-cg+viewPaged-c+287464-s+0-r+-t+-ri+-ni+1-x+-pu+-f+" \
    http://www.tiffany.com/Shopping/CategoryBrowse.aspx/GetCategoriesXmlBySearchQS \
    -O test.html

.... I never get a response and wget hangs.

As far as I can tell, I'm sending the exact same POST that the browser sends
when I view the page in Firefox (I looked at it in Firebug), so I guess I
must not be sending something right.  I've tried mimicking everything in the
request header, but no matter what, I always get the hang.

Is there something else I can do?  Something obvious I'm doing wrong?  (Am I
not posting the xml properly?)

Thanks!
Dan



--- Here is the request, as reported by Firebug:

{"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}

--- Full request headers as reported by Firebug:
Host: www.tiffany.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Referer: http://www.tiffany.com/Shopping/CategoryBrowse.aspx?cid=287464&mcat=148204
Content-Length: 84
Cookie: assortmentid=101; hascookies1=1;
__utma=124393999.990367556.1258838771.1258838771.1258842033.2;
__utmc=124393999;
__utmz=124393999.1258838771.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
s_cc=true;
s_sq=tiffanyrus%3D%2526pid%253DTiffany%252520%252526%252520Co.%252520%25257C%252520Browse%252520Earrings%2526pidt%253D1%2526oid%253Djavascript%25253AhandlePageRight%252528%252529%25253B%2526ot%253DA;
s_vi=[CS]v1|25842D7985010E69-4000010E8017E5DD[CE]; samebrowsersession=;
previoussid=;
_UrlReferrer==http%3A//www.tiffany.com/Shopping/CategoryBrowse.aspx%3Fcid%3D288188%26mcat%3D148206%23p+1-n+12-cg+viewPaged-c+288188-s+5-r+101287458-t+-ri+-ni+1-x+-pu+-f+;
__utmb=124393999.54.8.1258844027232
Pragma: no-cache
Cache-Control: no-cache

 


