Re: [Bug-wget] Why does -A not work?


From: Nils Gerlach
Subject: Re: [Bug-wget] Why does -A not work?
Date: Wed, 20 Jun 2018 17:25:45 +0200

Hi Tim,

I am sorry, but your command does not work: it only downloads the
thumbnails from the first page and follows none of the links. Open the
link in a browser and click on one of the pictures to get a larger
picture. On that page there is a link "high quality picture"; the
pictures behind those links are the ones I want to download. The regex
would be ".*little-nemo.*n\l.jpeg". And I want them not only from the
first page but from the other search result pages, too.

Can you work that one out? Does this work with wget at all? The best
result would be if wget deleted the visited HTML pages; if they stay, I
can delete them afterwards, but doing it automatically would be better.
That's why I am trying to use wget ;)
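
Something along these lines is what I have in mind. This is untested,
and the URL parts are only my guesses from clicking around the site: I
assume the search result pages contain "page", the picture pages contain
"display", and the high quality files end in "l.jpeg" (by your
conversion rule, my -A pattern "little-nemo*s.jpeg" would become the
regex "little-nemo.*s\.jpeg", so I apply the same rule here):

  wget -nd -rH -Dcomicstriplibrary.org -e robots=off \
       --regex-type=posix \
       --accept-regex ".*(little-nemo.*l\.jpeg|search.*page|display).*" \
       'http://comicstriplibrary.org/search?search=little+nemo'

The alternation is supposed to also accept the search result pages and
the display pages, so that wget follows them down to the pictures; the
HTML files it keeps I would then delete afterwards, as I wrote above. I
left out -p because the thumbnails are exactly what I do not want.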

Thanks for the information on the filename and path, though.
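Just to check that I understood it: for a hypothetical URL like
http://comicstriplibrary.org/display/12345, -A "*display*" would be
compared only against the last component "12345", never against the
"display" part of the path, so it could never match. That would explain
why none of those links were followed.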

Greetings

2018-06-20 16:13 GMT+02:00 Tim Rühsen <address@hidden>:

> Hi Nils,
>
> On 06/20/2018 06:16 AM, Nils Gerlach wrote:
> > Hi there,
> >
> > In #wget on freenode it was suggested that I write to you.
> > I tried using wget to get some images:
> >
> > wget -nd -rH -Dcomicstriplibrary.org -A
> > "little-nemo*s.jpeg","*html*","*.html.*","*.tmp","*page*","*display*"
> > -p -e robots=off 'http://comicstriplibrary.org/search?search=little+nemo'
> >
> > I wanted to download only the images, but wget was not following any
> > of the links; that is how that much more got into -A. But it still
> > does not follow the links. The page numbers of the search results
> > contain "page" in the link, and the links to the big pictures I want
> > wget to download contain "display". Both are given in -A and both
> > appear in the HTML document wget gets, yet neither is followed.
> >
> > Why does this not work at all? The website is public, so anybody is
> > free to test. But this is not my website!
>
> -A / -R work only on the filename, not on the path. The docs (man
> page) are not very explicit about this.
>
> Instead, try --accept-regex / --reject-regex, which act on the
> complete URL; but shell wildcards won't work there.
>
> For your example this means replacing '.' with '\.' and '*' with '.*'.
>
> To download those nemo jpegs:
> wget -d -rH -Dcomicstriplibrary.org --accept-regex
> ".*little-nemo.*n\.jpeg" -p -e robots=off
> 'http://comicstriplibrary.org/search?search=little+nemo'
> --regex-type=posix
>
> Regards, Tim

