lynx-dev

Re: lynx-dev Traversal Probelm part2


From: Klaus Weide
Subject: Re: lynx-dev Traversal Probelm part2
Date: Sun, 20 Feb 2000 04:04:01 -0600 (CST)

On Wed, 16 Feb 2000, Jeff Crane wrote:

> My real goal is to traverse ALL of
> http://dir.yahoo.com

The traversal feature of Lynx was invented so that authors can check
THEIR OWN pages.  It was not meant to be a general-purpose robot.
It doesn't follow the robot conventions, which means YOU are
responsible for following them if you use lynx outside of its
intended purpose.  That includes honoring robots.txt and otherwise
respecting sites' restrictions on use.

Anyway...

> I don't need actual pages, just organizational trees.
> Today I tried-
> 
> lynx -error_file=lynx.err -anonymous -cookies
> -traverse -cache=999999 -localhost
> http://dir.yahoo.com/Arts/ > /dev/null &
> 
> and a number of variations. 

*This* variation makes little sense.
- With -localhost, lynx will exit immediately since your starting URL
  is not local.
- Of course you won't see the message in which lynx complains about that,
  since you are redirecting its output to /dev/null.
- I doubt you need or want -anonymous.  Most likely it does something
  other than what you think.
- You don't have enough memory to hold 999999 rendered documents at
  the same time.  Even if you did, it wouldn't make sense.

Instead, you may want the -realm flag, which restricts access to URLs
in the starting realm.

> Inevitably the recursion
> runs into a link that it will not recognize as visited
> (infinitely visiting it and adding it to traverse.dat
> and traverse2.dat, respectively, over and over and
> over again). As you can imagine, the process then runs
> out of control eating more and more %CPU and RAM
> (first it fills up RAM with the cache, then starts
> sucking %CPU).
> 
> I would like to know if there is a way to store this
> kind of information (the directory structure of
> dir.yahoo.com as individual paths in a text file)
> using wget or if there is a lynx flag I'm forgetting;
> perhaps it's a lynx bug?

Maybe it is.  Maybe you are running into some fixed limit, like this
one (from userdefs.h):
  #define MAXHIST  1024           /* max links we remember in history */
It could well be that nobody has tested what happens when that limit
(or some other one) gets exceeded during traversal.

You didn't tell us what version of Lynx or what OS you are using.

   Klaus


