Re: [Bug-wget] CNET download links not working with WGET


From: Jeff Givens
Subject: Re: [Bug-wget] CNET download links not working with WGET
Date: Wed, 08 Jun 2011 12:38:31 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10

No, it's not working. It downloads part of the URL and creates a file named address@hidden, which is 68 KB. I cannot get wget to treat the string of characters as a whole URL. Please help: I really need to get this script working, and the only place to download this file is CNET.

So... looks like it works, then. Your command shell isn't complaining
about weird command names, wget is clearly requesting the full and
correct URL, it follows redirections, and saves using the final
redirection URL (the latest sources wouldn't follow that last step -
it'd save using the request URI by default).

If you dislike the filename, then provided you have a recent enough
version of wget you can add the --content-disposition option if the
server provides a rename header ("Content-Disposition"); or else use -E
to have wget force the file name to end in .html

-mjc
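
To see where the redirect leads before running wget, the destUrl parameter can be decoded by hand. The following is only a sketch in Python: the URL is copied from the quoted command in this thread, and redirect_target is a hypothetical helper name, not part of wget.

```python
from urllib.parse import urlsplit, parse_qs

# Redirect URL copied from the quoted wget command in this thread.
REDIRECT_URL = (
    "http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572"
    "&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link"
    "&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020"
    "&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm"
    "&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html"
    "%3Fspi%3D077d9109e846975d0db9532bd610588f"
)


def redirect_target(url):
    """Return the percent-decoded destUrl query parameter, or None if absent."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("destUrl")
    return values[0] if values else None


# Hypothetical helper; prints the decoded destination URL.
print(redirect_target(REDIRECT_URL))
```

The decoded value is the same address that appears in the Location header of the quoted log, and it ends in .html, which is consistent with wget reporting a text/html response.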

(05/26/2011 12:19 PM), Jeff Givens wrote:
Hi, I know this is an older topic but thanks for replying.  I forgot to
mention I had already tried what you listed below and this is the output I get:

C:\DOWNLOAD>wget "http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html%3Fspi%3D077d9109e846975d0db9532bd610588f"
--2011-05-02 12:34:20--  http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html%3Fspi%3D077d9109e846975d0db9532bd610588f
Resolving dw.com.com... 216.239.113.95
Connecting to dw.com.com|216.239.113.95|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://download.cnet.com/3001-8022_4-10804572.html?spi=077d9109e846975d0db9532bd610588f [following]
--2011-05-02 12:34:21--  http://download.cnet.com/3001-8022_4-10804572.html?spi=077d9109e846975d0db9532bd610588f
Resolving download.cnet.com... 64.30.224.58
Connecting to download.cnet.com|64.30.224.58|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to:
address@hidden'

     [<=>                                 ] 69,240      77.3K/s   in 0.9s

2011-05-02 12:34:22 (77.3 KB/s) -
address@hidden
d0db9532bd610588f.1' saved [69240]


C:\DOWNLOAD>

Thanks for your help.

-    Jeff



hello,

The "&" character in the URL is interpreted by your shell.

Try using something like:

wget "URL"

Cheers,
Giuseppe
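
The effect of the unquoted "&" can be approximated in a few lines of Python. This is a rough simulation under a simplifying assumption (each "&" ends one command and starts the next, and the text before "=" is taken as the command name); it is not how cmd.exe is actually implemented, and simulate_unquoted is a hypothetical name.

```python
# URL copied from the original, unquoted command in this thread.
URL = (
    "http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572"
    "&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link"
    "&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020"
    "&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm"
    "&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html"
    "%3Fspi%3D077d9109e846975d0db9532bd610588f"
)


def simulate_unquoted(command_line):
    """Split a command line at every '&', as an unquoted shell line would be."""
    return command_line.split("&")


pieces = simulate_unquoted("wget " + URL)

# Only the first piece reaches wget; the URL is cut off after edId=3.
wget_command = pieces[0]
print(wget_command)

# Every later piece is attempted as its own command; the part before '='
# is what the shell reports as an unrecognized command name.
stray_commands = [piece.split("=", 1)[0] for piece in pieces[1:]]
for name in stray_commands:
    print(name)
```

Under that assumption wget only ever sees the URL up to edId=3, and the remaining names line up with the "is not recognized as an internal or external command" messages in the original report. Wrapping the whole URL in double quotes keeps it as a single argument.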



"Jeff Givens" <address@hidden> writes:

Hello, I am having an issue downloading files via download links from
CNET.  It appears to locate some of the URL but stops at the first
&siteId part.  I have included the debug information as well.  Thanks
in advance for your help.

C:\>DOWNLOAD\wget http://dw.com.com/redir?edId=3&siteId=4&oId=3000-8022_4-10804572&ontId=8022_4&spi=077d9109e846975d0db9532bd610588f&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11665648&mfgId=6290020&merId=6290020&pguid=HFsQLwoOYJQAABuImQcAAAGm&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-8022_4-10804572.html%3Fspi%3D077d9109e846975d0db9532bd610588f
--2011-04-19 11:30:35-- http://dw.com.com/redir?edId=3
Resolving dw.com.com... 216.239.113.95
Connecting to dw.com.com|216.239.113.95|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://dw.com.com/redir/redx/?edId=3 [following]
--2011-04-19 11:30:36-- http://dw.com.com/redir/redx/?edId=3
Reusing existing connection to dw.com.com:80.
HTTP request sent, awaiting response... 404 Not Found
2011-04-19 11:30:36 ERROR 404: Not Found.

'siteId' is not recognized as an internal or external command,
operable program or batch file.
'oId' is not recognized as an internal or external command,
operable program or batch file.
'ontId' is not recognized as an internal or external command,
operable program or batch file.
'spi' is not recognized as an internal or external command,
operable program or batch file.
'lop' is not recognized as an internal or external command,
operable program or batch file.
'tag' is not recognized as an internal or external command,
operable program or batch file.
'ltype' is not recognized as an internal or external command,
operable program or batch file.
'pid' is not recognized as an internal or external command,
operable program or batch file.
'mfgId' is not recognized as an internal or external command,
operable program or batch file.
'merId' is not recognized as an internal or external command,
operable program or batch file.
'pguid' is not recognized as an internal or external command,
operable program or batch file.
'destUrl' is not recognized as an internal or external command,
operable program or batch file.

DEBUG output created by Wget 1.11.4 on Windows-MSVC.

--2011-04-19 11:27:09-- http://dw.com.com/redir?edId=3
Resolving dw.com.com... seconds 0.00, 64.30.224.42
Caching dw.com.com =>   64.30.224.42
Connecting to dw.com.com|64.30.224.42|:80... seconds 0.00, connected.
Created socket 340.
Releasing 0x01411158 (new refcount 1).

---request begin---
GET /redir?edId=3 HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: dw.com.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 Found
Date: Tue, 19 Apr 2011 15:27:26 GMT
Server: Apache/2.0
Pragma: no-cache
Cache-control: no-cache, must-revalidate, no-transform
Vary: *
Expires: Fri, 23 Jan 1970 12:12:12 GMT
Set-Cookie: XCLGFbrowser=Cg5iVk2tqd6JAAAA8Sg; expires=Sun, 18-Apr-2021 15:27:26 GMT; domain=.com.com; path=/
Location: http://dw.com.com/redir/redx/?edId=3
Content-Length: 0
P3P: CP="CAO DSP COR CURa ADMa DEVa PSAa PSDa IVAi IVDi CONi OUR OTRi IND PHY ONL UNI FIN COM NAV INT DEM STA"
Keep-Alive: timeout=363, max=760
Connection: Keep-Alive
Content-Type: text/plain

---response end---
302 Found
Registered socket 340 for persistent reuse.
cdm: 1 2 3 4 5 6 7 8
Stored cookie com.com -1 (ANY) /<permanent>   <insecure>   [expiry 2021-04-18 11:27:26] XCLGFbrowser Cg5iVk2tqd6JAAAA8Sg
Location: http://dw.com.com/redir/redx/?edId=3 [following]
Skipping 0 bytes of body: [] done.
--2011-04-19 11:27:09-- http://dw.com.com/redir/redx/?edId=3
Reusing existing connection to dw.com.com:80.
Reusing fd 340.

---request begin---
GET /redir/redx/?edId=3 HTTP/1.0
User-Agent: Wget/1.11.4
Accept: */*
Host: dw.com.com
Connection: Keep-Alive
Cookie: XCLGFbrowser=Cg5iVk2tqd6JAAAA8Sg

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 404 Not Found
Date: Tue, 19 Apr 2011 15:27:26 GMT
Server: Apache/2.0
Content-Length: 209
Keep-Alive: timeout=363, max=779
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

---response end---
404 Not Found
Skipping 209 bytes of body: [<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /redir/redx/ was not found on this server.</p>
</body></html>
] done.
2011-04-19 11:27:09 ERROR 404: Not Found.




