bug-wget

Re: [Bug-wget] Problem, no getting any response


From: Dan Yamins
Subject: Re: [Bug-wget] Problem, no getting any response
Date: Sun, 22 Nov 2009 09:29:52 -0500

Hi Tony, thanks for your email.


> {"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}
>
>
Sorry, this was the value of "searchQueryString" for a different attempt --
a somewhat different page.   However, the result in both cases is the same
-- no response.



> Other things that might matter to the server:
> - the user agent (many servers reject web crawling software such as wget)
> - the content type (Firefox is sending application/json)
> - referer
> - cookies
>
>
Yes, I tried setting all of these headers with the '--header=' syntax,
including user agent, content-type, referer, cookies, &c.  No matter which
headers I include, I get the same result (except that if I leave out
"Content-length:84", I get an "error: 500" instead).
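
Since Content-length is the one header that changes the outcome, one check
worth making is whether the declared 84 bytes matches the body actually sent.
wget normally computes Content-Length from --post-data on its own, and a
declared value larger than the real body could leave a server waiting for bytes
that never arrive. That is a guess about this hang, not a confirmed diagnosis.
The sketch below uses the --post-data string from the full command in this
message; body_length and length_matches are hypothetical helper names.

```shell
# Byte length of a string, as a server would count the request body.
body_length() {
    # wc -c counts bytes; tr strips the padding some wc builds add.
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d '[:space:]'
}

# Succeeds only when the declared Content-Length ($1) equals the
# byte count of the body ($2).
length_matches() {
    [ "$(body_length "$2")" -eq "$1" ]
}

# Body copied from the --post-data option of the full command.
post_data='searchQueryString=p+2-n+12-c+287466-s+5-r+-t+-ri+-ni+1-x+&isSearchMode=false'
declared=84

if length_matches "$declared" "$post_data"; then
    echo "declared length matches the body"
else
    echo "declared $declared bytes, body is $(body_length "$post_data") bytes"
fi
```

If the two numbers differ, dropping the explicit Content-length header and
letting wget supply its own value is a cheap thing to try.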

Here's the wget command with all the headers set (for a slightly different
page, but of the same type):

wget --debug --header="User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X
10.5; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5"
--header="Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
--header="Accept-Language: en-us,en;q=0.5" --header= "Accept-Encoding:
gzip,deflate" --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"
--header="Keep-Alive: 300" --header="Connection: keep-alive"
--header="Content-Type: application/json; charset=utf-8" --header="Referer:
http://www.tiffany.com/Shopping/CategoryBrowse.aspx?cid=287466&mcat=148204";
--header="Cookie: assortmentid=101; hascookies1=1;
__utma=124393999.990367556.1258838771.1258848628.1258899757.5;
__utmz=124393999.1258838771.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
s_vi=[CS]v1|25842D7985010E69-4000010E8017E5DD[CE]; _UrlReferrer==http%3A//
www.tiffany.com/Shopping/CategoryBrowse.aspx%3Fcid%3D287466%26mcat%3D148204%23p+1-n+12-cg+viewPaged-c+287466-s+5-r+101323339+101424823+101323353-t+-ri+-ni+1-x+-pu+-f+;
s_sq=tiffanyrus%3D%2526pid%253DTiffany%252520%252526%252520Co.%252520%25257C%252520Browse%252520Rings%2526pidt%253D1%2526oid%253Djavascript%25253AgotoPage%2525281%252529%25253B%2526ot%253DA;
s_cc=true; __utmc=124393999; previoussid=; samebrowsersession=; ShowFlash=;
__utmb=124393999.7.7.1258899766577; siteid=1; SiteIDForModules=1"
--header="Pragma: no-cache" --header="Cache-Control: no-cache"
--header="Content-length:84"
--post-data="searchQueryString=p+2-n+12-c+287466-s+5-r+-t+-ri+-ni+1-x+&isSearchMode=false"
http://www.tiffany.com/Shopping/CategoryBrowse.aspx/GetCategoriesXmlBySearchQS-O
test.html


... and I still get no response.
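
Firebug reports the browser's body as a JSON object sent with Content-Type
application/json, while the --post-data value above is a form-style key=value
string. A minimal sketch of building a JSON-shaped body follows. json_body is a
hypothetical helper that does no JSON escaping, and the commented-out wget call
is untested against the site. It also assumes that the URL ends in
GetCategoriesXmlBySearchQS and that the trailing -O is wget's output option.

```shell
# Hypothetical helper: wrap a query string in the JSON shape Firebug showed.
# No escaping is done, so the input must not contain quotes or backslashes.
json_body() {
    printf '{"searchQueryString":"%s"}' "$1"
}

body=$(json_body 'p+2-n+12-c+287466-s+5-r+-t+-ri+-ni+1-x+')
printf '%s\n' "$body"

# Assumed endpoint (see the note above about the trailing -O).
url='http://www.tiffany.com/Shopping/CategoryBrowse.aspx/GetCategoriesXmlBySearchQS'

# Commented out so the sketch runs offline. No Content-length header is
# passed here, on the assumption that wget fills it in from the body.
# wget --debug \
#      --header='Content-Type: application/json; charset=utf-8' \
#      --post-data="$body" \
#      -O test.html "$url"
```

Whether the server wants exactly this JSON shape is not established by
anything in this thread; the body reported by Firebug is the only evidence.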




> Good luck.
> -----Original Message-----
> From: address@hidden
> [mailto:bug-wget-bounces+wget=exelana.com@gnu.org] On Behalf Of Dan Yamins
> Sent: Saturday, November 21, 2009 3:53 PM
> To: address@hidden
> Subject: [Bug-wget] Problem, no getting any response
>
> Hi,
>
> I'm trying to use wget to scrape some data from a page that requires a
> posting of some data (the page itself does it via Javascript).   When I use
> the command:
>
> $ wget --header="Content-length:84"
> --post-data="searchQueryString=p-8-n+12-cg+viewPaged-c+287464-s+0-r+-t+-ri+-ni+1-x+-pu+-f+"
> http://www.tiffany.com/Shopping/CategoryBrowse.aspx/GetCategoriesXmlBySearchQS-O
> test.html
>
> .... I never get a response and wget hangs.
>
> My question is, even though I'm sending the exact same post as the browser
> does when I view the page in Firefox (I looked at it in firebug), I guess I
> must not be sending something right.  I've tried mimicking everything in the
> request header, but no matter what, I always get the hang.
>
> Is there something else I can do?  Something obvious I'm doing wrong?  (Am I
> not posting the xml properly?)
>
> Thanks!
> Dan
>
>
>
> --- Here is the request, as reported by Firebug:
>
> {"searchQueryString":"p+9-n+12-c+287464-s+0-r+-t+-ri+-ni+1-x+"}
>
> --- Full request headers as reported by Firebug:
> Host: www.tiffany.com
> User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
> rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en-us,en;q=0.5
> Accept-Encoding: gzip,deflate
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> Keep-Alive: 300
> Connection: keep-alive
> Content-Type: application/json; charset=utf-8
> Referer: http://www.tiffany.com/Shopping/CategoryBrowse.aspx?cid=287464&mcat=148204
> Content-Length: 84
> Cookie: assortmentid=101; hascookies1=1;
> __utma=124393999.990367556.1258838771.1258838771.1258842033.2;
> __utmc=124393999;
> __utmz=124393999.1258838771.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);
> s_cc=true;
> s_sq=tiffanyrus%3D%2526pid%253DTiffany%252520%252526%252520Co.%252520%25257C%252520Browse%252520Earrings%2526pidt%253D1%2526oid%253Djavascript%25253AhandlePageRight%252528%252529%25253B%2526ot%253DA;
> s_vi=[CS]v1|25842D7985010E69-4000010E8017E5DD[CE]; samebrowsersession=;
> previoussid=;
> _UrlReferrer==http%3A//www.tiffany.com/Shopping/CategoryBrowse.aspx%3Fcid%3D288188%26mcat%3D148206%23p+1-n+12-cg+viewPaged-c+288188-s+5-r+101287458-t+-ri+-ni+1-x+-pu+-f+;
> __utmb=124393999.54.8.1258844027232
> Pragma: no-cache
> Cache-Control: no-cache
>
>

