Re: [Lynx-dev] Lynx as a HTML to text converter.
From: Thomas Dickey
Date: Tue, 2 Aug 2005 20:33:08 -0400 (EDT)
On Tue, 2 Aug 2005, Jari Tuominen wrote:
> Hi,
> I am programming a Web crawler and an indexer.
> I am using Lynx to convert HTML documents into text files with the
> command "lynx -dump".
> The problem is that it converts relative URLs into FILE:///db/www/... style URLs.
Yes, because the document you pointed it at is that type of URL.
True, lynx is resolving the relative URLs against the document's own URL, but it's not changing their type.
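A minimal Python sketch of that point, using the standard library's urllib.parse.urljoin rather than lynx itself: the same relative link becomes a file:// URL or an http:// URL depending only on the base URL it is resolved against. The base URLs and the link below are hypothetical examples.

```python
from urllib.parse import urljoin

# Hypothetical bases: a local copy of a page and the page's original web address.
LOCAL_BASE = "file:///db/www/example/index.html"
REMOTE_BASE = "http://www.example.com/index.html"


def resolve(base: str, href: str) -> str:
    """Resolve a (possibly relative) href against base, as a browser would."""
    return urljoin(base, href)


if __name__ == "__main__":
    # The scheme of the result comes from the base, not from the link.
    for base in (LOCAL_BASE, REMOTE_BASE):
        print(resolve(base, "docs/page.html"))
```

By the same reasoning, running lynx -dump on the original http:// address instead of a local copy should produce http:// links directly, since there is then nothing to convert.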
> I am using Lynx to extract links from the HTML files, so I have to do a lot
> of work converting those local URLs back into relative ones, which I can then
> combine with the host name to create an absolute www URL.
> If you know of any program other than Lynx that does similar tasks with
> the same performance, I would be interested to hear about it. Thanks...
I'm not sure if you'll find one (sorry).
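If the pages have to be dumped from local copies, the back-conversion described in the question can be sketched in Python as a prefix rewrite followed by urljoin. LOCAL_ROOT, SITE_BASE and to_web_url are hypothetical names for illustration; check the exact form of the file URLs in your own lynx output (for example file:///... versus file://localhost/...) and set the prefix to match.

```python
from urllib.parse import urljoin

# Hypothetical placeholders: where the site was mirrored locally, and the
# web address it was fetched from. Adjust both to match your own setup.
LOCAL_ROOT = "file:///db/www/example/"
SITE_BASE = "http://www.example.com/"


def to_web_url(link: str, local_root: str = LOCAL_ROOT,
               site_base: str = SITE_BASE) -> str:
    """Map a link under local_root back onto site_base.

    Links that do not start with local_root (for example links to other
    sites, which are already absolute http:// URLs) are returned unchanged.
    """
    if not link.startswith(local_root):
        return link
    # Path of the link relative to the mirrored site's root directory.
    relative = link[len(local_root):]
    return urljoin(site_base, relative)


if __name__ == "__main__":
    links = [
        "file:///db/www/example/docs/page.html",
        "http://www.other.org/index.html",
    ]
    for link in links:
        print(to_web_url(link))
```

A plain prefix match keeps the mapping explicit and easy to check; it assumes one local directory per site, so a crawler mirroring many hosts would need one (LOCAL_ROOT, SITE_BASE) pair per host.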
--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net