
Re: LYNX-DEV SUGG: automatic bookmarks-url-checker + LYNX-TRICK


From: Filip M Gieszczykiewicz
Subject: Re: LYNX-DEV SUGG: automatic bookmarks-url-checker + LYNX-TRICK
Date: Sun, 17 Nov 1996 11:00:44 -0600 (CST)

You (David Combs) wrote:
> My bookmark file is some 450 items long.  Many of the url's
> have been changed on me by now -- and I don't know it until
> I actually TRY the item.
> SUGGESTION: something that simply loops through all my
> bookmark-items, and for those whose url's no longer
> exist, APPEND "DEAD-URL" or something to the
> "comment part" that comes after the url.
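(A hypothetical sketch of my own, not something from David's mail: the checker he describes boils down to "pull out every URL, probe it, flag the dead ones". This demo writes a tiny two-entry sample bookmark file so it runs anywhere; point BM at your real bookmark file instead, and the filename, URLs, and the lynx probe in the comment are all illustrative assumptions.)

```shell
#!/bin/sh
#
# Hypothetical sketch: extract every URL from a Lynx bookmark file so
# each one can be probed in turn.  A self-made sample file is used here
# so the demo is runnable; substitute your own bookmark file for BM.
#
BM=sample_bookmarks.html
cat > "$BM" << 'EOF'
<li><a href="http://www.example.com/">Example</a>
<li><a href="http://www.paranoia.com/~filipg/">Filip's page</a>
EOF
#
# Lynx bookmark entries are one <li><a href="...">...</a> per line,
# so a single sed capture is enough to pull the URLs out.
#
sed -n 's/.*[Hh][Rr][Ee][Ff]="\([^"]*\)".*/\1/p' "$BM" |
while read url
do
    # A real run would probe each URL here, e.g. with lynx itself:
    #   lynx -head -dump "$url" > /dev/null 2>&1 || echo "DEAD-URL: $url"
    echo "would check: $url"
done
```

But as I say below, MOMspider already does all of this (and the probing, and the report) for you.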

Greetings. Nah... why reinvent the wheel? I have over 5000 entries in
my sub-bookmarks :-) and I use MOMspider to check them. In fact, I
use MOMspider to also check my web site every few months. Here's
the document:

http://www.ics.uci.edu/WebSoft/MOMspider/

and it needs both perl 4.036 and libwww ... or parts thereof.
If you want, you can download a copy that has all the needed *.pl
files from libwww and is pretty much ready to configure and run
from

http://www.paranoia.com/~filipg/TEMP/MOMspider-1.00.tar.gz
(must specify WHOLE URL - can't get it otherwise)

The features (why I like it) are:

* it's civil in its searches (bandwidth-wise)
* it reports ALL errors, including time-outs and where they
  happened.
* it returns "last modified" date for all external sites
* it generates a really cool and useful report file
* you can limit access to known sites, instruct it to avoid whole
  trees of data, and limit the "depth" of perusing...

And a bunch more that I can't remember! Really cool program..
up there on my top 10 list... right below lynx :-)

-----------------

LYNX-TRICK:

I use this little shell/awk script to generate an index of my
sub-bookmarks directory - basically, go through each sub-bookmark
and add a link to it to the InDex file. Pretty simple.

-----------------chop-with-axe----------------chop-with-axe---------------------
#!/bin/sh
#
# PD. address@hidden http://www.paranoia.com/~filipg/
# Index.sh generates index.html which is sucked dry by MOMspider
# [smirk]
#
DIR=`pwd`
OUT=InDex.html
#
# Add header to the output file
#
cat << EOP > $OUT
<html><head><title>Sub-Bookmarks Index</title></head>
<body><h1>CONTENTS:</h1>
EOP
#
# Now go through every sub-bookmark and add a link to it to
# the InDex file..
#
ls .[A-Za-z]*.html *.html | awk \
'BEGIN {print "<ul>";dir=ARGV[1];sub(/^a=/,"",dir);}\
{printf("<li><a href=\"%s/%s\">%s</a>\n",dir,$1,$1);}\
END {print "</ul>"}' a="$DIR" >> $OUT
#
# Add footer to the output file
#
echo "</body></html>" >> $OUT
#
echo "done!"
-----------------chop-with-axe----------------chop-with-axe---------------------

Just run it in the directory where the sub-bookmarks are (or wherever
you want to generate an index of all *.html files, really) and lynx the
file "InDex.html" (it's not "index" so we don't clobber an existing file).

THEN, you can set MOMspider on that index file and it will check all
your links for you. I usually set the "MaxDepth" to 5 because all I
care about is that the destination exists... I don't want the default
20 layers. You can fine-tune it more.

WARNING: this is not (thank god) a real-time application. Set it up,
test it, background it, and come back tomorrow! Don't try to
speed it up or we'll all hate you. I run it:
"nice ./momspider >&/tmp/MOMspider.$$&"
and come back a day or two later (if I run it on the whole site).

WARNING: the generated stats file may easily be over 1MB in size..
make sure you have that much space for it (or more). That is
why it's often better to run it in "<Tree" mode rather than "<Site".
You'll see what I mean. That and lower "MaxDepth".
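(For reference, a task entry in MOMspider's instruction file looks roughly like the fragment below. I'm writing the field names from memory of the 1.00 docs, and the paths and URLs are made-up placeholders, so treat the whole thing as an assumption and crib from the sample instruction file shipped with the distribution instead.)

```
# Global limit on how deep the traversal goes past the index file.
MaxDepth      5

# One <Tree task: check everything under TopURL, only confirming
# that external destinations answer rather than crawling them.
<Tree
    Name         sub-bookmarks
    TopURL       http://www.paranoia.com/~filipg/
    IndexURL     http://www.paranoia.com/~filipg/InDex.html
    IndexFile    /home/filipg/public_html/InDex.html
    EmailAddress address@hidden
>
```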

Take care.

