[Lynx-dev] Lynx as a filter?


From: Charlie Sorsby
Subject: [Lynx-dev] Lynx as a filter?
Date: Fri, 4 Aug 2006 14:17:54 -0600 (MDT)

I should like to be able to use lynx as a filter to translate HTML
to plain text.  That is to say, I should like to be able to send
HTML to lynx's stdin and have plain text (no HTML) appear at lynx's
stdout.

I am able to make this translation with lynx but only if the input
is given as the name of a file containing the HTML on the command
line:

/usr/local/bin/lynx -dump -force_html file.html

Before I continue:
PC% lynx --version
Lynx Version 2.8.5rel.1 (04 Feb 2004)
Built on freebsd4.11 Jan  4 2005 04:58:20

According to the lynx man page it should be possible to convince
lynx to accept input via its stdin -- unless I'm misunderstanding
what is being said:

    -  If the argument is only '-', then Lynx expects to
       receive the arguments from stdin.  This is to allow
       for the potentially very long command line that
       can be associated with the -get_data or -post_data
       arguments (see below).  It can also be used to
       avoid having sensitive information in the invoking
       command line (which would be visible to other
       processes on most systems), especially when the
       -auth or -pauth options are used.

But when I try something like:

PC% cat file.html | /usr/local/bin/lynx -dump -force_html -

I get something like the following:

Can't Access `file://localhost/home/crs/</HTML>'
Alert!: Unable to access document.

lynx: Can't access startfile 
PC% 

Clearly, that is only a test; in that situation, I could just use
the command line described earlier with the filename as an argument
on the command line.

For those who may be curious, I want to be able to convert HTML to
text when it is not in a file.  Specifically, I want to be able to
use vi's capability to apply an external command to a unit of text
(e.g. a paragraph or paragraphs).  I want to make a simple-minded
shell script (say in file, html2txt):

#!/bin/sh
/usr/local/bin/lynx -dump -force_html -

So that when I receive one of those damnable e-mails full of HTML,
I can run vi on the message (my mail client allows me to do that),
go to the start of the body of the message and tell vi

        !99} html2txt

and have the script run lynx on the next 99 (or fewer) paragraphs,
converting it into readable text very much as I'm able to do
something like:

        !99}fmt 55

to format text with long lines to shorter lines.

As mentioned earlier, the shell script:

#!/bin/sh
/usr/local/bin/lynx -dump -force_html "$@"

works fine as long as I feed that HTML to it from a file named on
the script's command line.  But that means that, instead of simply
being able to run vi on the e-mail message, moving to the start of
the HTML, and doing the "!99} html2txt" on the remainder of the
message to replace the HTML with the actual content of the message,
I must, instead, save the message to a file, delete the e-mail
headers from that file, and then run html2txt on that filename,
either saving the output to a file or piping it to a pager to read.
Very awkward.
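
The save-to-a-file step described above could be folded into the
script itself.  The following is only a sketch under stated
assumptions, not a confirmed fix: it spools stdin to a temporary
file and then uses the file-name invocation that is already known
to work.  The LYNX variable and the html2txt function name are
hypothetical, and mktemp(1) is assumed to be available on the
system.

```shell
#!/bin/sh
# html2txt (sketch): read HTML on stdin, write plain text on stdout.
# LYNX is a hypothetical override; the default is the path used in this post.
LYNX=${LYNX:-/usr/local/bin/lynx}

html2txt() {
    # Spool stdin to a temporary file, because passing a file name
    # is the invocation reported to work.
    tmp=$(mktemp "${TMPDIR:-/tmp}/html2txt.XXXXXX") || return 1
    cat > "$tmp"

    # Same options as the working command line, applied to the spool file.
    "$LYNX" -dump -force_html "$tmp"
    status=$?

    # Remove the spool file and pass lynx's exit status back to the caller.
    rm -f "$tmp"
    return "$status"
}
```

Ending the script with a line containing only "html2txt" would make
it usable as the vi filter "!99} html2txt".  If the local man page
lists a -stdin option, "lynx -dump -force_html -stdin" may make the
temporary file unnecessary; that has not been verified against
2.8.5rel.1.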

Thanks for any help (or even for letting me know that I'm
misunderstanding the lynx man page so I can start looking elsewhere
for a solution).

Charlie
--  
Charlie Sorsby
        address@hidden
        P. O. Box 1225
        Edgewood, NM 87015
        USA

Why HTML in e-mail is evil: http://www.birdhouse.org/etc/evilmail.html
 and (possibly) how to turn it off: http://www.expita.com/nomime.html



