
Re: [Bug-wget] Wget can't download JPG images


From: Todd Pattist
Subject: Re: [Bug-wget] Wget can't download JPG images
Date: Tue, 02 Jun 2009 14:57:45 -0400
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

Read Micah's answer first.

My default is to use a wgetrc file that sets:
robots = off
user-agent = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609
(I do this because some of my permanent login cookies are tied to my user-agent string, so I keep that string identical in wget and Firefox for testing.)

Because of that default, I didn't notice that the server denies requests that use wget's own user-agent string. A non-wget user-agent string seems to be necessary. Micah says it's not necessary if you request the image file directly, but you can't get a directory listing from the comics directory.
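One way to check whether the user-agent string is what triggers the refusal is to compare the final HTTP status wget reports with and without -U. The following is only a rough sketch: last_status is a helper name made up for this example, and it assumes wget's English-locale "awaiting response..." log lines. The sample input is the pair of response lines from the log quoted later in this thread, so the script runs without network access.

```shell
#!/bin/sh
# last_status (hypothetical helper): read wget log output on stdin and
# print the last HTTP status code it reports.
# Assumes English-locale messages of the form "... awaiting response... NNN".
last_status() {
    sed -n 's/.*awaiting response\.\.\. \([0-9][0-9][0-9]\).*/\1/p' | tail -n 1
}

# Offline sample: the two response lines from the log quoted in this thread.
printf '%s\n' \
    'HTTP request sent, awaiting response... 301 Moved Permanently' \
    'HTTP request sent, awaiting response... 403 Forbidden' | last_status

# Live comparison (needs network access; uncomment to try):
# URL="http://blacktapestries.comicgenesis.com/"
# wget --spider "$URL" 2>&1 | last_status             # wget's default user agent
# wget --spider -U Mozilla "$URL" 2>&1 | last_status  # browser-like user agent
```

The offline sample prints 403, the status the original poster saw. If the two live probes report different codes, the user-agent string is the likely cause.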

Try this:
wget -U Mozilla -e robots=off --no-parent -r -A jpg "http://blacktapestries.comicgenesis.com"
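For anyone unsure what each short flag does, here is the same request spelled with wget's long option names, as a small sketch. The wrapper function fetch_jpgs and the RUN switch are made up for this example. By default the script only prints the command, so it can be read and checked before any download starts.

```shell
#!/bin/sh
# fetch_jpgs (hypothetical wrapper): build the recursive .jpg download
# command for a site using long option names.
#   --user-agent=Mozilla   same as -U Mozilla: don't identify as wget
#   --execute robots=off   same as -e robots=off: ignore robots.txt
#   --no-parent            never ascend above the starting directory
#   --recursive            same as -r: follow links from the start page
#   --accept jpg           same as -A jpg: keep only files ending in jpg
fetch_jpgs() {
    site=$1
    set -- wget --user-agent=Mozilla --execute robots=off \
        --no-parent --recursive --accept jpg "$site"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"        # actually run wget
    else
        echo "$@"   # dry run: just show the command
    fi
}

fetch_jpgs "http://blacktapestries.comicgenesis.com"
```

Run it with RUN=1 in the environment to perform the download instead of printing the command.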

address@hidden wrote:
What command did you use to get the files?  I used this command:

wget --no-parent -r -A.jpg http://blacktapestries.comicgenesis.com/

And I still got the 403 Forbidden Error.

-------- Original Message --------
Subject: Re: [Bug-wget] Wget can't download JPG images
From: Todd Pattist <address@hidden>
Date: Tue, June 02, 2009 8:47 am
To: address@hidden

I'm no expert, but I tried your link. Although you can download http://blacktapestries.comicgenesis.com/comics/20020711.jpg in Firefox, you can't actually see the directory listing at http://blacktapestries.comicgenesis.com/comics.

That means you can't just ask wget to start there, as you have with your command. Wget has no way of knowing what's in that directory. This is a typical setup for a web site: the image storage directories can't be listed directly, so you have to go through the web pages that link to the files in them.

Wget works fine if you simply start at http://blacktapestries.comicgenesis.com and accept only .jpg files, with recursion on. (I set the recursion level to 3 and picked up 39 .jpg files there, plus 4 in the images directory.)


address@hidden wrote:
Wget Tech Support,

I am using GNU Wget 1.11.4.3287 on Windows Vista Ultimate.

I am trying to download a web comic from a website, which is a set of JPG images.

Here is the URL of one of the images: http://blacktapestries.comicgenesis.com/comics/20020711.jpg
Its real size is 600px X 800px (85.66 KB).
Unfortunately, every time I use Wget to download it, all I get is a 1px X 1px (1.09 KB) image.
It seems to be able to download JPG images from other websites, so it isn't a format issue.

When I attempt to download all of the web comic's pages from the parent directory, using this command:

wget --no-parent -ckr http://blacktapestries.comicgenesis.com/comics

I get this response:

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
--2009-06-01 22:00:27-- http://blacktapestries.comicgenesis.com/comics
Resolving blacktapestries.comicgenesis.com... 66.220.2.20
Connecting to blacktapestries.comicgenesis.com|66.220.2.20|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://blacktapestries.comicgenesis.com/comics/ [following]
--2009-06-01 22:00:27-- http://blacktapestries.comicgenesis.com/comics
Reusing existing connection to blacktapestries.comicgenesis.com:80.
HTTP request sent, awaiting response... 403 Forbidden
2009-06-01 22:00:30 ERROR 403: Forbidden.


I don't understand why this is happening because I am able to download the images manually using the Firefox internet browser.

What am I doing wrong and how do I fix it?

Thank you for your assistance,

Steven
