lynx-dev

Re: LYNX-DEV URL grabber for lynx


From: Axel C. Frinke
Subject: Re: LYNX-DEV URL grabber for lynx
Date: Sun, 12 Oct 1997 23:38:27 +0200 (MET DST)

                                        Roesberg, Sun  12.10.97
David,

A>> I've just written a small C program to filter URLs from stdin, sort
D>
D> Wouldn't it have been easier to borrow the routine from wget (Gnu sites)
D> that does this (assuming that you mean filter them from an HTML source)?

Yes, if wget sorts the URLs in a useful way.
(Sorry, I had never heard of wget before.)
In the meantime, I've enhanced my program to filter URLs from files
already generated by 'lynx -dump'.

A>> batch or at) to retrieve all found URLs using "lynx -dump" (or
D> For -dump, and with recursion allowed, Lynx already does this.

You mean the options "-crawl" or "-traversal"?
Anyway, I was not satisfied with them: URLs other than 'text/html'
and URLs from a different host were not retrieved.
Well, I can accept that. After all, lynx must be able to terminate. ;-)

However, a program that generates a script to retrieve URLs gives you
the ability to select URLs before retrieving them.

D> wget has options to insert delays between each fetch to avoid overloading
D> the server and also respects robots.txt to avoid fetching private or
D> dynamic information.

OK, I will take a look at wget.
But then, I've already made the effort.
So, if there are no more concerns about publishing a URL grabber
despite the existence of wget, I will upload it to the web
(at http://titan.informatik.uni-bonn.de/~frinke/UrlGrab.zip).

Regards,

Axel.

;
; To UNSUBSCRIBE:  Send a mail message to address@hidden
;                  with "unsubscribe lynx-dev" (without the
;                  quotation marks) on a line by itself.
;
