lynx-dev

Re: lynx-dev Traversal Problem part2


From: Philip Webb
Subject: Re: lynx-dev Traversal Problem part2
Date: Sat, 19 Feb 2000 18:11:42 -0500

000216 Jeff Crane wrote:
> My real goal is to traverse ALL of http://dir.yahoo.com
> I don't need the actual pages, just the organizational trees.

this risks making you unpopular with the site management,
if you are taking up an undue amount of their computer time.

>  lynx -error_file=lynx.err -anonymous -cookies
>  -traverse -cache=999999 -localhost
>  http://dir.yahoo.com/Arts/ > /dev/null &
> the recursion runs into a link that it will not recognize as visited
> (visiting it infinitely and adding it to traverse.dat and traverse2.dat
> over and over again). As you can imagine,
> the process then runs out of control, eating more and more CPU and RAM
> (first it fills up RAM with the cache, then it starts eating CPU).

no-one else has replied, but the tendency in the past has been
to deprecate use of Lynx for this type of mass downloading,
which can cause administrators to blackball other Lynx users' access.
 
> Is there a way to store this kind of information
> (the directory structure of dir.yahoo.com as individual paths in a text file)
> using wget, or is there a Lynx flag I'm forgetting? Perhaps a Lynx bug?

the other usual reaction is to recommend wget.
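no command line was given in the thread; what follows is only a minimal sketch, assuming GNU wget. the flag choices (--spider so pages are not kept, --no-parent so it stays below the start directory, --wait to go easy on the server) and the helper names crawl_tree and extract_paths are my own illustrations, not anything from the original posts.

```shell
#!/bin/sh
# Sketch only: assumes GNU wget. crawl_tree and extract_paths are
# hypothetical helper names chosen for this example.

# crawl_tree URL LOGFILE
# Follow links recursively without keeping the pages (--spider),
# at most 5 levels deep (--level), never climbing above the starting
# directory (--no-parent), pausing 2 seconds between requests (--wait)
# to limit load on the server. All activity is logged to LOGFILE.
crawl_tree() {
    wget --spider --recursive --level=5 --no-parent \
         --wait=2 --output-file="$2" "$1"
}

# extract_paths LOGFILE PREFIX
# Print every whitespace-separated field of the log that starts with
# PREFIX (a plain string match, not a regex), one per line, deduplicated.
extract_paths() {
    awk -v p="$2" '{ for (i = 1; i <= NF; i++) if (index($i, p) == 1) print $i }' "$1" \
        | sort -u
}
```

usage would be something like  crawl_tree http://dir.yahoo.com/Arts/ spider.log  followed by  extract_paths spider.log http://dir.yahoo.com/Arts/ > paths.txt ; since each URL is written once after sort -u, a link revisited many times still shows up only once in the text file.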

-- 
========================,,============================================
SUPPORT     ___________//___,  Philip Webb : address@hidden
ELECTRIC   /] [] [] [] [] []|  Centre for Urban & Community Studies
TRANSIT    `-O----------O---'  University of Toronto
