From: Tim Rühsen
Subject: Re: [Bug-wget] retrieval failure:Forbidden? for UTF-8-URL in wget that works on FF and IE
Date: Wed, 08 Jun 2016 21:49:21 +0200
User-agent: KMail/4.14.10 (Linux/4.5.0-2-amd64; KDE/4.14.20; x86_64; ; )

On Wednesday 08 June 2016 11:47:46 L. A. Walsh wrote:
> I tried:
> 
> wget "http://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB"
> 
> But get an Error "403: Forbidden" (tried w/ and w/o proxy) -- same.
> 
> But cut/paste the same URL into IE11 or
> PaleMoon (a 64-bit FF derivative), and it works.
> 
> Any idea why or what I might do to get it to work?

Basically, everything from '#' on (the fragment part of the URL) is irrelevant 
to the HTTP request; the client does not send it. This is what Firefox 46 sends 
to localhost:8080 (I started a netcat 'nc -l -p 8080' to make sure).

GET / HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

As you can see, the UTF-8 part is not relevant.
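
A minimal sketch of the same point, assuming a POSIX shell: a client cuts the URL at the first '#' before building the request, which a parameter expansion can mimic. The helper name strip_fragment is made up for illustration and is not part of wget.

```shell
# strip_fragment is a hypothetical helper, not part of wget.
# It removes the fragment (everything from the first '#' on) from a URL,
# i.e. the part the client keeps to itself and never sends to the server.
strip_fragment() {
    printf '%s\n' "${1%%#*}"
}

# Prints http://translate.google.com/ for the URL from the report.
strip_fragment "http://translate.google.com/#ja/en/クイーンズブレイド・メインテーマB"
```

A URL without a '#' passes through unchanged, so the helper is safe to apply to any argument.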


If I do a 'telnet translate.google.com 80' and paste the above (just with 
'Host: translate.google.com' and an empty line at the end):
GET / HTTP/1.1
Host: translate.google.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

The answer is
HTTP/1.1 302 Found
Location: https://translate.google.com/
Date: Wed, 08 Jun 2016 19:41:14 GMT
Expires: Wed, 08 Jun 2016 19:41:14 GMT
Cache-Control: private, max-age=0
Content-Type: text/html; charset=UTF-8
Content-Language: en
P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
X-Content-Type-Options: nosniff
Server: HTTP server (unknown)
Content-Length: 226
X-XSS-Protection: 1; mode=block
Set-Cookie: NID=79=RWJmTifLbUTlUm1FaGgoWgqajLS--KpLfeevl5RaKlUp12ntFF3rfOBKvQiElhElP4CYe-5I2gZRYFJEytinX6ATW93FbhmdotpBNbWl8_aOg7AyUTnF57P8rDA0HgTL; expires=Thu, 08-Dec-2016 19:41:14 GMT; path=/; domain=.google.com; HttpOnly

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://translate.google.com/">here</A>.
</BODY></HTML>


Now, trying with 'wget -d <your URL from above>':
GET / HTTP/1.1
User-Agent: Wget/1.17.1.42-42cc8 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: translate.google.com
Connection: Keep-Alive

The answer is
HTTP/1.1 403 Forbidden
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniff
Date: Wed, 08 Jun 2016 19:34:21 GMT
Server: HTTP server (unknown)
Cache-Control: private
X-XSS-Protection: 1; mode=block
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

<body skipped>


My guess is that Google does not like the 'Wget' User-Agent. Now trying with 
Firefox's User-Agent:
$ wget -d -U "Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0" http://translate.google.com

And bam ... that works. Give it a try.
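
For repeated use, the workaround could be wrapped roughly like this. It is a sketch, assuming a POSIX shell and a wget that accepts --user-agent (the long form of -U). The names UA and wget_as_firefox are made up for illustration, the User-Agent string is the Firefox 46 one shown above, and there is no guarantee the server will keep accepting it.

```shell
# UA and wget_as_firefox are hypothetical names used only in this sketch.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0'

# Fetch a URL while presenting a browser-like User-Agent.
# The fragment is cut off first; it is never part of the HTTP request anyway.
wget_as_firefox() {
    wget --user-agent="$UA" "${1%%#*}"
}
```

It would be called as: wget_as_firefox "http://translate.google.com/#ja/en/..."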

Regards, Tim
