[Bug-wget] request for help with wget (crawling search results of a webs

From:	Altug Tekin
Subject:	[Bug-wget] request for help with wget (crawling search results of a website)
Date:	Sun, 3 Nov 2013 09:13:59 +0100

Dear mailing list members,

According to the website http://www.gnu.org/software/wget/ it is ok to
write emails with help requests to this mailing list. I have the following
problem:

I am trying to crawl the search results of a news website using *wget*.

The name of the website is *www.voanews.com <http://www.voanews.com>*.

After typing in my *search keyword* and clicking search on the website, it
proceeds to the results. Then I can specify a *"from" and a "to" date* and
hit search again.

After this the URL becomes:

http://www.voanews.com/search/?st=article&k=mykeyword&df=10%2F01%2F2013&dt=09%2F20%2F2013&ob=dt#article

and the actual content of the results is what I want to download.

To achieve this I created the following wget-command:

wget --reject=js,txt,gif,jpeg,jpg \
     --accept=html \
     --user-agent=My-Browser \
     --recursive --level=2 \
     www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article

Unfortunately, the crawler doesn't download the search results. It only
gets into the upper link bar, which contains the "Home,USA,Africa,Asia,..."
links and saves the articles they link to.

*It seems like the crawler doesn't check the search result links at all*.

*What am I doing wrong, and how can I modify the wget command to download
only the links in the search results list (and of course the pages they link to)?*
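
One possible cause, offered as a guess rather than a confirmed diagnosis: the
URL in the command above is not quoted, so a POSIX shell treats each & as a
command separator and wget would only receive the part up to "st=article".
The sketch below wraps the same options in a hypothetical helper,
crawl_search (a name invented here, not part of wget), and passes the URL as
one quoted argument. The "#article" fragment is left out because fragments
are handled by the browser and are not sent to the server.

```shell
# Hypothetical helper (not part of wget): crawl one search-results URL
# using the same options as the original command.
crawl_search() {
    # "$1" is double-quoted so the URL reaches wget as a single argument,
    # including every '&' in the query string.
    wget --recursive --level=2 \
         --accept=html \
         --reject=js,txt,gif,jpeg,jpg \
         --user-agent=My-Browser \
         "$1"
}

# Example call (commented out so that pasting this does not start a crawl).
# The single quotes stop the shell from interpreting '&' and '?':
# crawl_search 'http://www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt'
```

Whether the result links are then followed still depends on the
--accept/--reject filters and on --level, which this sketch leaves unchanged.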

Thank you for any help...
