lynx-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: lynx-dev proposal for LYK_SCRIPT and patch


From: Klaus Weide
Subject: Re: lynx-dev proposal for LYK_SCRIPT and patch
Date: Sat, 3 Jul 1999 06:48:54 -0500 (CDT)

My observations are based on viewing the patch, not on trying to run it.
I may have misunderstood something.  If you think so, please point out
where.

You can ignore my comments if you did not indend to get this incorporated
into the dev. code.  But you called it a proposal, so...

I am only commenting on the SCRIPT part, not the mouse part.

1. This is a bad hack based on a bad hack.

   It doesn't work without SOURCE_CACHE defined at compilation AND set
   to FILE at runtime.  (It still may do something in that case, but
   that seems to be just a side effect and is not why find it useful.)

   SOURCE_CACHE isn't a real cache in the HTTP sense (no expiration or
   variant logic whatsoever), and it is incredibly wasteful: it makes
   a cache copy for each and every URL (unless it's not http, then it
   doesn't apply at all).  A lot of people can't or won't run with
   this enabled.

   Should SOURCE_CACHE be improved to be less wasteful or act more like
   a real cache, it would break (more of) the assumptions your code is
   making.  IOW adding your code (if it becomes popular) would practically
   prevent improving SOURCE_CACHE.

   SOURCE_CACHE also messes around with the way Lynx usually does things
   in weird ways, of which I am still suspicious.  Ok that may be my
   problem, but there are still bugs.

   Not that I find SOURCE_CACHE completely terrible, but it is IMO a
   stopgap partial solution that is only sometimes useful.  Its saving
   grace is that it can be completely disabled.  It's not a good
   foundation to build new features on, especially if you indend them
   to be used more widely and for a different purpose than SOURCE_CACHE
   on its own.

2. Changing a cache file???

   A cache file should contain a cached version of the original content.
   Anything else is an abuse of the idea.

   I can understand why you want to do so.  After all, the stuff is there,
   so why not mess with it if it gives you some benefit?  Very well, you
   can do so.  And maybe I would find it useful, too.  But that should be
   your or my responsibility (or that of whatever script we employ). 
   Lynx itself should not do it, or go out of its way to make it possible.

   With "going out of its way" I mean change its logic, its order
   of precessing or flow of control and data, to achieve the purpose.
   I think that is what your patch is doing.

   On the other hand, I do not object if lynx can provide the necessary
   information for this to an external program, if it doesn't affect
   the way lynx does things in a major way.  In fact might find it useful.
   That could mean provide the cache filename to an external program in
   some way, which can then do with it wht it wants.  After the external
   program exits, Lynx can reuse the cache file if it would normally
   do so.

3. Command namespace: SCRIPT is a bad choice.

   Please don't call it just SCRIPT, that is too generic and takes the
   term away from other, possibly more logical, uses.  In the context
   of a web browser, it is more likely to make one think of scripting
   languages, or maybe (LYNX)CGI scripts.  It's just yet another way to
   invoke an external program, there's no intrinsic reason why it has to
   be a 'script' than for the EXTERN key (or PRINTER, DOWNLOADER, or
   VIEWER commands, or lynxexec: and lynxprog: commands).

4. Go all the way.

   In the usage you anticipate, the command wouldn't be useful on its
   own.  You propose "that we give users the ability to change the code, 
   through a script".  If those users are not given such scripts (and
   are not able to write their own), your LYK_SCRIPT won't do them much
   good.  You might then as well put as much as possible of the logic
   into those scripts, and as little as possible into Lynx itself.

On Wed, 30 Jun 1999, Eduardo Chappa L. wrote:

> Hello 
> 
>   So here is the idea. Sometimes we find web pages where lynx is not able
> to give us a good output or is not able to, say, send a form because there
> a JavaScript button. Many of those problems arise because the person that
> wrote the html code did not take care in that lynx users would not be able
> to access their documents, and even when we complain, even a formal answer
> is hard to find, less a change. So what I propose is that we give users the
> ability to change the code, through a script. For example, if you happen to
> find a table where you find the unhappy combination
> <td><center></center></td>, this will break lynx output and the result will
> be a mess. The solution is to eliminate the <center> </center> combination,
> which could be easily done with, say sed, therefore if you can access the
> code, apply a script to it, and redisplay the result, then you can read
> better the page.
> 
>   This is possible to accomplish if we remember that we can cache the
> source, so the trick is to be able to call the script, once the page has
> been opened. This is what "LYK_SCRIPT" should do.
> 
>   The obvious advantage of this is that when users get here complaining
> that lynx does not do something, instead of telling them, yes you are
> right, wait until someone takes the painful task of writing the code, we
> can give them a script that will solve their problem, while the real
> problem of supportting javascript or fixing the output with tables is done,
> and maybe who knows which new ones will appear in the future.
> 
>   I am sending a patch against lynx2-8-2.rel1 that does this. You have to
> see it working to really be convinced that this is something useful. I have
> shown it to a couple of people that have been impressed by how much the
> output changes by pressing a few keys.

I believe that it can do something useful.  I just also believe that it
could be done in multiple other ways.

>   The way the command works is as follows:
> 
>   You access the web page that you want to go to, and realize that a
> change must be done, so you press the letter "S" to invoke a previously
> written script. You are asked a question which you will probably want to
> answer "yes" and then the script works on the /tmp file associated to the
> web page. Once the script has made its work the new /tmp file associated to
> the web page is displayed in the screen. The only bad thing I found about
> it was that this made me change a reference like "http://"; in the visited
> links to "file://localhost", which has some importance in the construction
> of the script, rather than anywhere else. I found this problem to be minor
> compared to the advantage of replacing the output ofr something nicer.

This is unclear to me; how does the Visited Links page enter into it?

>   How you set up the command to work is as follows:
> 
>   You write a script that will fix your page or make the transformation
> that you want to the html code of the page and define in the external
> section the following lines:
> 
>   EXTERNAL:http:/path/to/script %s:TRUE
>   EXTERNAL:file:/path/to/script %s:TRUE
> 
>   here %s is the URL of the currently opened web page. Additionaly the
> script receives a second parameter (is not explicitly mentioned there
> because the external command does not accept more than one parameter) which
> is the name of the /tmp file associated to the currently opened URL. In
> summary, the script has to receive two parameters "$1", the currently
> opened URL and $2 the path to the /tmp file associated to $1.

Please don't incompatibly extend an exisitng configuration option in
this way.  Adding a hidden parameter is ugly, can break existing commands,
and this kind of extension works at most once.  The fact that the EXTERNAL
command does not accept more than one parameter should tell you that
you should use a different option.

Or maybe you could add another filed for the EXTERNAL option, like
   EXTERNAL:http:/path/to/script %s:TRUE:D
(D for 'D'ocument or L for 'L'ink or 'DL' for both, for example, and
it should default to L if absent for backward compatibility.)

Reusing the existing option without change may have been easier, but
takes away choice.  Not all commands are appropriate for both uses.
You make it impossible to apply -restrictions to externals but not
to 'scripts', or the other way around.

Consider passing any additional info you want to provide to internal
programs in LYNX_something environment variables instead.  No problem
with having multiple such extensions, and a command need not worry
about additional command line arguments; what it doesn't know about
gets automatically ignored.

>   In my case the script is a menu of short scripts, from where I choose
> which one rewrites $2.
> 
>   I had also the problem that when I accessed a document that I wanted to
> apply a script to, but I pressed say the letter "o" twice, to see the
> options menu and go back, the script got applied to the /tmp file
> associated to the options menu instead of to the currently opened document
> (is this a bug or did I misunderstand something?) In any case, I solved
> this by reloading the file after pressing the second "o", this is also in
> my patch, but I guess there is a better way to do it, which I don't know.
> (This comment is valid for commands p,d,l,k,A,o.)

The way you get the cache filename from the 'ly_temp' list is bogus.
I am not surprised you sometimes get the wrong file - I am surprised
if it works most of the time.  'ly_temp' should stay a PRIVATE matter
of the code tracking temp files, other code has no business of peeking
into it.  There must be better ways to get at the current document's
cache filename!  Just start weith HTMainText and follow a few pointers.
(Note this may have changed since 2.8.2rel.1.  You should look at the
dev.N code.)

> + #define CHANGE_CACHE gettext("Do you want to replace the cached memory 
> file?")

   The prompt text is too cryptic to make sense to most users.
   Unless they understand how SOURCE_CACHE caches documents in files
   and how SCRIPT works internally (in which case they would be not far
   from being able to do it 'by hand' (similar to what Henry described)).
   It's also wrong: 'MEMORY file'?

> + { "SCRIPT",             "to make a script/program act on a web page"},
                                                               ^^^^^^^^
"Document" would be more appropriate, imo.  Since it is not limited to
what normally counts as "web pages".  OTOH it *is* limited to whatever
Lynx can display as a document.

>               if (LYValidate || check_realm) {
>                   LYPermitURL = TRUE;
>               }
> + #ifdef SOURCE_CACHE
> +             HTuncache_current_document(); 
> + #endif /* SOURCE_CACHE */
>               break;
> !         }  /* end if strncmp */
>           /*
>            *  Don't put break here so that if the backspace key
>            *  is pressed in the history page, we fall though,

Why this and all the other HTuncache's here?
I certainly don't want any unnecessary  reloading of documents in my
lynx binary, and that seems to be just what you are doing.
And having compiled with SOURCE_CACHE defined is not enough reason to
enable it.  At least it should depend on SOURCE_CACHE:FILE at runtime!

My guess is that this is a hack to try to make ly_temp point to the
'right' file more often.  That's frivolous.

> --- 4898,4921 ----
>               run_external(links[curdoc.link].lname);
>               refresh_screen = TRUE;
>           }
> !             break;
> !         case LYK_SCRIPT: /* use external scripts on current web page */
> ! #ifdef SOURCE_CACHE

Again, the ifdef is not enough.

> !             c = HTConfirmDefault(CHANGE_CACHE, YES);

Prompting should be the external programs responsibility.
That way whoever sets it up can decide whether he wants prompts
or not.

> !                 if (c == YES) {
> !                    sprintf(location,"file://localhost%s",ly_temp->name);
> !                StrAllocCopy(newdoc.address, location); }
> !                 else {
> !                StrAllocCopy(newdoc.address, curdoc.address); }

So after the run_script is done, the URL for the temp file will become
the new document URL.  Seems to me all relative URLs wil be broken!

> !             run_script(curdoc.address, ly_temp->name);
> ! #else
> !             run_script(curdoc.address, NULL);
> ! #endif /* SOURCE_CACHE */
> !             FREE(curdoc.address);

And here you prevent the 'real' document URL form even being pushed
on the history stack.

> !             newdoc.line = curdoc.line;
> !             newdoc.link = curdoc.link;
> !             refresh_screen = TRUE;
> !             break;
>   #endif /* USE_EXTERNALS */
> 
>       case LYK_ADD_BOOKMARK:  /* add link to bookmark file */
> ***************

> --- 3381,3387 ----
>   /*
>    * Construct a temporary-filename.  Assumes result is LY_MAXPATH chars long.
>    */
> ! PUBLIC int fmt_tempname ARGS3(

You don't need this as PUBLIC, you don't use it anywhere.

> ***************
> *** 3512,3517 ****
> --- 3514,3520 ----
>   #ifdef USE_EXTERNALS
>          "externals" ,
>   #endif
> +        "scripts" ,
>          (char *) 0     };

Adding this here, without adding a corresponding variable to the
array below, is pointless at best.

> ***************
> *** 5799,5812 ****
>   /*
>    * Maintain a list of all of the temp-files we create so that we can remove
>    * them during the cleanup.
>    */
> - typedef struct _LYTemp {
> -     struct _LYTemp *next;
> -     char *name;
> -     FILE *file;
> - } LY_TEMP;
> 
> ! static LY_TEMP *ly_temp;

As already mentioned, this should have stayed here.  If any other
code needs to know about this, it is most likely wrong.

  ---

I can imagine several ways to achieve what you want, without
breaking normal lynx code, and with more reliability.  Some are
already available.

A.  A script run as a DOWNLOADER, or PRINTER, or EXTERNAL
    (possibly invoking a child lynx.)
C.  A lynxcgi: script.
D.  Modify the cache file by hand.

It seems you are not happy with them, because they don't provide
one or more of the following benefits:

 1. Act on current document, not on link.
    (It seems you patch reduces to this benefit if no source
    caching is available.)

 2. Less keys to press.

 3. Continue browsing in the same process.

 4. No reloading. (if it weren't for those HTuncache's you had to add!)

But it should be possibly to extend what is already available
to provide more of those, with less disruptive changes.

1. and 2. are no principal problem.  It just takes some coding,
so that one new key would do what now requires several keys.

With lynxcgi you can load an arbitrarily modified copy into the
same process.  Please see my gettidy.sh (Jan archives) as an example.
(And if that's still too may keystrokes or too complicated, something
could be done about that.)  Some of the trickery with xxx_proxy could
now be avoided with RULES.

A Unix-pipe-like filtering module also would fit nicely into the
HTStream framework, if someone writes such a thing.

That leaves in principle ony 4., the need to reload from the
network.  Well, maybe you should have to, to keep the original
version clearly distinct form the modified copy.  But if there
is a cached copy, we could somehow make that available as
a source for (stream-like or otherwise) modification, without
destroying it.

I think what you actually want (without knowing it :)) is implement
'e'dit for non-local files. 'e' should do something on remote files
instead of just producing the message
   "Lynx cannot currently (e)dit remote WWW files."
(note what is implied by 'currently'...) 'e' could invoke a
configurable command which acts on the current document, just like
it does now for local files (the 'editor' doesn't have to be an
editor!).  For remote URLs the command would not be just a regular
text editor of course, since it has to take a URL.  The command
to invoke could be configurable based on URL-prefix match, very
similar as for EXTERN, but should not use the same EXTERNAL option.
A cache filename, if Lynx has one, can be passed in an environment
variable.  It is then up to the script whether it wants to mess
with the cache file instead of doing something with the URL alone.
When control returns to lynx, lynx could reload the document from
the cache (if there was one), like it does after various other
commands.  Better, it could prompt whether to completely reload
from the server, since that would make sence also if the REMOTE_EDITOR
(to invent a new option name here) was a real remote editing tool
that uploaded a new version to to the server.

I realize the this starts liek a description of your 'S' under
a new name 'e'.  But I think there are several significant differences.
The most important may be that Lynx would never start using a
cache file as a document.  It continues treating it as only a backup
cache copy (without caring whether it has changed), if it doesn't
have a cache file it doesn't do anything fundamentally different.

*Another* practical possibility, which goes kind of in the other
direction, woud be to introduce a key which explicitly makes a
a cache file into a document (and makes it the current document).
That would be somewhat like an instant download.  It would disassociate
the cache file from its function of being a 'cache file' for the
original document.  The user can *then* 'e'dit the file (where
the editor may be a script, or may be used to invoke a script).
This would make it very clear that the two documents are not
the same.

   ----

I think the code has enough practical as more principal problems
that it should not be added to the dev. code.  Definitely
not in the current form unless it is heavily ifdefd.

    Klaus


reply via email to

[Prev in Thread] Current Thread [Next in Thread]