lynx-dev
Re: lynx-dev making lynx traversal crawl download html, not text


From: Bob
Subject: Re: lynx-dev making lynx traversal crawl download html, not text
Date: Fri, 22 Mar 2002 22:48:41 -0500

I can't find anywhere that -traversal or -crawl calls srcmode_for_next_retrieval,
so there is no obvious place to get html instead of text by passing (1)
instead of (0) or (-1). I'm looking elsewhere now.

OR

Since all I need is for lynx to open a URL, satisfy the cookie
demands, then request the same URL a second time to get around
yahoo's ad page with its "Continue to message" link (which just requests
the same URL again), could I feed a GET of the URL on stdin twice,
or give it once on the command line and GET it again?

OR

If view mode were set to default to "source" rather than "presentation"
text mode, -traversal -crawl might download html.

OR

If -source were changed in the following way, -traversal -crawl -source
might not quit on the first link as -dump does, and might keep going,
downloading in source mode to the *.dat files.

The way it is now, -source makes lynx quit after the first download:

/* -source */
PRIVATE int source_fun ARGS1(
 char *,   next_arg GCC_UNUSED)
{
    dump_output_immediately = TRUE;
    HTOutputFormat = (LYPrependBase ?
        HTAtom_for("www/download") : HTAtom_for("www/dump"));
    LYcols = MAX_COLS;
    return 0;
}


It could be:

/* -source */
PRIVATE int source_fun ARGS1(
 char *,   next_arg GCC_UNUSED) {
    dump_output_immediately = FALSE;
    if ( traversal != TRUE && crawl != TRUE ) {
      dump_output_immediately = TRUE;
    }
    HTOutputFormat = (LYPrependBase ?
        HTAtom_for("www/download") : HTAtom_for("www/dump"));
    LYcols = MAX_COLS;
    return 0;
}

That's not enough, though, since -traversal and -crawl would
be downloading files, not just sending to stdout as -source does.
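
One more caveat about the modified source_fun: it reads traversal and crawl at the moment -source is handled. Assuming options are processed in the order they appear on the command line (I have not confirmed this), "-source -traversal -crawl" would still take the dump-and-quit path. A minimal sketch of deferring the decision until all options have been read follows. The struct and both function names are hypothetical stand-ins, not lynx code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the lynx globals named in this message;
 * the struct exists only to make the sketch self-contained. */
struct crawl_flags {
    bool traversal;
    bool crawl;
    bool source_requested;
    bool dump_output_immediately;
};

/* Record that -source was seen, without deciding anything yet. */
static void note_source_option(struct crawl_flags *f)
{
    assert(f != NULL);
    f->source_requested = true;
}

/* Call once, after every option has been read: dump and quit only
 * when neither -traversal nor -crawl was given. */
static void resolve_source_mode(struct crawl_flags *f)
{
    assert(f != NULL);
    if (f->source_requested)
        f->dump_output_immediately = !(f->traversal || f->crawl);
}
```

With this shape the result is the same whether -source comes before or after the other two options.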

-traversal and -crawl build a links table

        links.[curdoc.link].lname
           add_to_table(curdoc.address)

which they download into *.dat files via

        sprintf(cfile,"lnk%08d.dat",ccount);
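
For reference, here is a small standalone sketch of what that file-name pattern produces, assuming the format is the zero-padded "lnk%08d.dat". The helper crawl_file_name is hypothetical and not a lynx function:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of lynx: build the name of the n-th
 * crawl output file. %08d pads the counter with zeros to eight digits.
 * Returns the length snprintf reports for the formatted name. */
static int crawl_file_name(char *buf, size_t size, int count)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "lnk%08d.dat", count);
}
```

Note that the sprintf only builds the output file name from a counter. It says nothing about whether the document written to that file is source or rendered text.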

Are the curdocs referenced in that table in source format?
Presumably not, since they are sprintfable?

-Bob

Thomas Dickey wrote:

> On Fri, Mar 22, 2002 at 07:30:19PM -0500, Bob wrote:
> > Either -dump or -source restrict the download to one file
> > only, correct?
> >
> > I was hoping to iterate the crawl with downloading in
> > html format.
> >
> > Perhaps there is a mode=1 set somewhere, instead of
> > mode=0, if srcmode_for_next_retrieval() is called from
> > somewhere? Or?
>
> I only see srcmode_for_next_retrieval() called with constant parameters:

So it would be nice if one of the places where the call is made with
parameter (0) or (-1) were in code that runs under -traversal
or -crawl. Then I would put (1) there instead. I'll start by looking at:

src/LYMainLoop.c:3819:  srcmode_for_next_retrieval(0);
src/LYMainLoop.c:4380:  srcmode_for_next_retrieval(-1);
src/LYMainLoop.c:4407:  srcmode_for_next_retrieval(0);
src/LYMainLoop.c:4472:  srcmode_for_next_retrieval(0);
src/LYOptions.c:3039:      srcmode_for_next_retrieval(0);
src/LYOptions.c:3049:      srcmode_for_next_retrieval(0);

-Bob

> src/LYGetFile.c:1118:PUBLIC void srcmode_for_next_retrieval ARGS1(
> src/LYGetFile.h:11:extern void srcmode_for_next_retrieval PARAMS((int));
> src/LYMainLoop.c:3802:              srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:3819:                  srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4236:  srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4380:  srcmode_for_next_retrieval(-1);
> src/LYMainLoop.c:4385:  srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4407:  srcmode_for_next_retrieval(0);
> src/LYMainLoop.c:4447:          srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4469:      srcmode_for_next_retrieval(1);
> src/LYMainLoop.c:4472:      srcmode_for_next_retrieval(0);
> src/LYOptions.c:3032:       srcmode_for_next_retrieval(1);
> src/LYOptions.c:3039:               srcmode_for_next_retrieval(0);
> src/LYOptions.c:3049:           srcmode_for_next_retrieval(0);
>
> --
> Thomas E. Dickey <address@hidden>
> http://invisible-island.net
> ftp://invisible-island.net



