lynx-dev

-source change !work Re: lynx-dev making lynx traversal crawl download


From: Bob
Subject: -source change !work Re: lynx-dev making lynx traversal crawl download html, not text
Date: Sat, 23 Mar 2002 02:57:54 -0500

Changing only source_fun (setting dump_output_immediately = FALSE)
did not let -traversal -crawl -source download html source.
-source still downloaded one file and then quit, as usual.

Next thing to look at:

-traversal and -crawl build a links table

        links[curdoc.link].lname
           add_to_table(curdoc.address)

which they download into *.dat files via

        sprintf(cfile,"lnk%08d.dat",ccount);
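
To make the file naming concrete, here is a minimal sketch of how that call would name the crawl output files. It assumes the intended format string is "lnk%08d.dat" (a zero-padded, 8-digit counter). The helper name crawl_file_name is hypothetical and is not part of the Lynx source; only cfile and ccount come from the snippet above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not a Lynx function: format the name of the
 * ccount-th crawl output file into buf.  Assumes the intended format
 * is "lnk%08d.dat".  Returns the length snprintf reports.
 */
static int crawl_file_name(char *buf, size_t size, int ccount)
{
    assert(buf != NULL && size > 0);
    /* snprintf instead of sprintf so a short buffer cannot overflow */
    return snprintf(buf, size, "lnk%08d.dat", ccount);
}
```

With a 32-byte cfile buffer and ccount == 1 this yields lnk00000001.dat. The name carries only the counter, so it says nothing about whether the saved content is rendered text or html source.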

Are the curdocs referenced in that table in source format?
Not since they are sprintfable?



Bob wrote:

> I don't find anywhere -traversal or -crawl use srcmode_for_next_retrieval,
> so that we could get html instead of text by srcmode_for_next_retrieval(1)
> instead of (0) or (-1). I'm looking elsewhere now.
>
> OR
>
> Since all I need to do is have lynx try to open a URL, satisfy the cookie
> demands, then request the same URL a second time to go around
> yahoo's ad page with its "Continue to message" link (which just requests
> the same URL a second time), could I send a GET for the URL twice on stdin,
> or once on the command line and then GET again?
>
> OR
>
> If view mode were set to default to "source" rather than "presentation"
> text mode, -traversal -crawl might download html.
>
> OR
>
> If -source was changed in the following way, -traversal -crawl -source
> might not quit on the first link like -dump, and might keep on going in
> source mode download to the *.dat files.
>
> The way it is now, -source makes lynx quit after the first download:
>
> /* -source */
> PRIVATE int source_fun ARGS1(
>  char *,   next_arg GCC_UNUSED)
> {
>     dump_output_immediately = TRUE;
>     HTOutputFormat = (LYPrependBase ?
>         HTAtom_for("www/download") : HTAtom_for("www/dump"));
>     LYcols = MAX_COLS;
>     return 0;
> }
>
> could be
>
> /* -source */
> PRIVATE int source_fun ARGS1(
>  char *,   next_arg GCC_UNUSED)
> {
>     /* dump and quit only when neither -traversal nor -crawl is set */
>     dump_output_immediately = FALSE;
>     if ( traversal != TRUE && crawl != TRUE ) {
>       dump_output_immediately = TRUE;
>     }
>     HTOutputFormat = (LYPrependBase ?
>         HTAtom_for("www/download") : HTAtom_for("www/dump"));
>     LYcols = MAX_COLS;
>     return 0;
> }
>
> That's not enough, though, since -traversal and -crawl would
> be downloading files, not just sending to stdout as -source.
>
> -traversal and -crawl build a links table
>
>         links[curdoc.link].lname
>            add_to_table(curdoc.address)
>
> which they download into *.dat files via
>
>         sprintf(cfile,"lnk%08d.dat",ccount);
>
> Are the curdocs referenced in that table in source format?
> Not since they are sprintfable?
>
> -Bob
>
> Thomas Dickey wrote:
>
> > On Fri, Mar 22, 2002 at 07:30:19PM -0500, Bob wrote:
> > > Either -dump or -source restrict the download to one file
> > > only, correct?
> > >
> > > I was hoping to iterate the crawl with downloading in
> > > html format.
> > >
> > > Perhaps there is a mode=1 set somewhere, instead of
> > > mode=0, if srcmode_for_next_retrieval() is called from
> > > somewhere? Or?
> >
> > I only see srcmode_for_next_retrieval() called with constant parameters:
>
> So, in one of those places where the call is made with parameter
> (0) or (-1) it might be nice if that was in a process under -traversal
> or -crawl. Then I would put (1) there instead. I'll start looking at--
>
> src/LYMainLoop.c:3819:  srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4380:  srcmode_for_next_retrieval(-1);
> src/LYMainLoop.c:4407:  srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4472:  srcmode_for_next_retrieval(0);
> src/LYOptions.c:3039:      srcmode_for_next_retrieval(0);
> src/LYOptions.c:3049:      srcmode_for_next_retrieval(0);
>
> -Bob
>
> > src/LYGetFile.c:1118:PUBLIC void srcmode_for_next_retrieval ARGS1(
> > src/LYGetFile.h:11:extern void srcmode_for_next_retrieval PARAMS((int));
> > src/LYMainLoop.c:3802:              srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:3819:                  srcmode_for_next_retrieval(0);
> > src/LYMainLoop.c:4236:  srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4380:  srcmode_for_next_retrieval(-1);
> > src/LYMainLoop.c:4385:  srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4407:  srcmode_for_next_retrieval(0);
> > src/LYMainLoop.c:4447:          srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4469:      srcmode_for_next_retrieval(1);
> > src/LYMainLoop.c:4472:      srcmode_for_next_retrieval(0);
> > src/LYOptions.c:3032:       srcmode_for_next_retrieval(1);
> > src/LYOptions.c:3039:               srcmode_for_next_retrieval(0);
> > src/LYOptions.c:3049:           srcmode_for_next_retrieval(0);
> >
> > --
> > Thomas E. Dickey <address@hidden>
> > http://invisible-island.net
> > ftp://invisible-island.net
>
> ; To UNSUBSCRIBE: Send "unsubscribe lynx-dev" to address@hidden

