Re: lynx-dev Recursive dump


From: Klaus Weide
Subject: Re: lynx-dev Recursive dump
Date: Fri, 14 Jan 2000 10:31:23 -0600 (CST)

On Fri, 14 Jan 2000, Vieri Di_paola wrote:

> Can Lynx dump recursively all the pages of a specified site? For instance,
> can I download recursively all the web pages of www.lynx.browser.org and
> exclude any other sites like www.whereever.com? 

No tool can automatically dump "all the pages of a specified site".  They
have to be linked (or known in some other way beforehand).  To use
your example, I don't know what "all the web pages of www.lynx.browser.org"
are.  <http://www.lynx.browser.org/> certainly doesn't link to any of them.
So you have to reduce your expectations.

> I do not wish to use wget for this task 
  ^^^^^^^^^^^^^^^^^^^^^^^^^

But you should.  Wget, or one of a number of other programs meant
specifically for this kind of thing.  The current main Lynx help page
has links to a couple more.  (Here is a copy:
<http://sol.slcc.edu/lynx/current/lynx2-8-3/lynx_help/lynx_help_main.html>)


> (is wget capable of excluding sites?).

It seems you haven't really looked at it, but have already decided that
you don't want to use it.  Wget comes with man and info pages.
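To answer the parenthetical: yes, wget can exclude sites.  A minimal
sketch, using the hostnames from your own example (the recursion depth
and exact flag combination are assumptions; check the man page):

```shell
# Recursively fetch pages, following links only on the named host;
# links to www.whereever.com (or any other host) are skipped.
wget -r -l 5 \
     --domains=www.lynx.browser.org \
     --exclude-domains=www.whereever.com \
     http://www.lynx.browser.org/
```

(--domains alone already keeps wget off other hosts; --exclude-domains
is shown here only because you asked about excluding a site explicitly.)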

> I know that the parameter -localhost disables URLs that point to remote
> hosts. How can I fill in the following command line in order to do the
> recursive job?
>   lynx -dump -source -localhost http://www.lynx.browser.org

You can't; it doesn't make sense.  "www.lynx.browser.org" is a remote
host, unless perhaps you are Rob Partington and logged in on that machine
and running lynx there.

Well, there is a "-startfile_ok" flag, but that is to exempt the startfile
from some other restrictions, not from this one.

> Or should I use -realm which restricts access to URLs in the starting
> realm? What's a realm (excuse my ignorance)?

If the initial URL is "http://www.example.com/foo/index.htm", then URLs
starting with "http://www.example.com/foo/" are regarded as being in the
same realm.

If you start lynx with -trace, the trace log will have a line like
    Starting realm is 'http://www.example.com/foo/'.
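A quick way to see it for yourself (the example URL is the placeholder
one above; the trace log location is the usual default and may vary with
your configuration):

```shell
# Run one dump with tracing on; lynx writes its trace to ~/Lynx.trace
lynx -trace -dump http://www.example.com/foo/index.htm > /dev/null
grep 'Starting realm' ~/Lynx.trace
```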
 
> Should I use -traversal and -crawl?

If you have to ask, probably not.  If you want to automatically "get
the same directory structure", no.
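For completeness, a traversal is typically invoked like this (a sketch,
again with the placeholder URL; -realm keeps it within the starting
realm as described above):

```shell
# Walks links from the start page, writing each fetched page to a
# numbered lnk*.dat file in the current directory -- a flat pile of
# files plus traverse*.dat logs, not a directory tree.
lynx -traversal -crawl -realm http://www.example.com/foo/index.htm
```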

> If what I asked is possible, can I get the same directory structure as in
> the remote host?

Wget tries to do just that.  Lynx doesn't.
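For the record, the usual wget spelling for that (flag choice is an
assumption on my part; the man page has the details):

```shell
# -m mirrors recursively; -np never ascends to parent directories.
# The remote path hierarchy is recreated under ./www.lynx.browser.org/
wget -m -np http://www.lynx.browser.org/
```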

   Klaus

