Re: LYNX-DEV Relative URLs, BASE implementation


From: Klaus Weide
Subject: Re: LYNX-DEV Relative URLs, BASE implementation
Date: Tue, 22 Oct 1996 18:18:24 -0500 (CDT)

On Tue, 22 Oct 1996, Foteos Macrides wrote:

> Klaus Weide <address@hidden> wrote:
> >[...]
> >Hey, improving HTParse.c looks like a reasonable project for someone
> >with some knowledge of C and URL syntax (or ability to read an RFC), but
> >without the time to figure out all of the Lynx code.  Just HTParse.h
> >and HTParse.c should be enough.  Anybody up for it?
> 
>       I suspect your assessment of not needing to understand the Lynx
> code for mods of HTParse.c functions may be misguided (and your worsening
> of the situation when you tried, perhaps should have been a tip off,
> though a period of "trial and error" is important for anyone to get good
> at supporting Lynx 8-).

You may well be right that a more thorough understanding of other Lynx
modules would be needed to do a good job on HTParse.c, but my reported
worsening of the situation should not be construed as corroboration of
that assertion :-).  'Cause when I did what I did it was in the process
of getting a first running version that had some of the new libwww5 core
modules in it - the immediate goal then was to resolve conflicts between
different files (duplicate names, or conflicting parameter lists), and
once a specific file would compile after some modifications I would move
on to the next one.  So the goal was not to improve the parsing and URL
resolution in HTParse.c.  I deferred a closer look at what I had really
done until later, but haven't gotten around to it yet :-)
  Actually, looking again now (though still not closely) at what I
did, I was calling the new lib's HTSimplify() from the Lynx lib's
HTParse() without really checking whether the one was doing what the
other expected, a not-so-bright idea altogether.
 
>       The parsing functions there are used not only for http URLS, but
> also for the built-in gateways, a variety of internal URL schemes, and
> external but lynx-specific schemes (e.g., lynxexec, lynxprog, lynxcfg).
> It also reflects FM judgements about what will succeed in the majority
> of cases with what the real world throws at Lynx.  Beware of creating
> FAQs like the one about why some POST redirections "don't work anymore"
> with v2.6.

Well it wasn't too bad now, was it? :)

>              The FAQ about why HREF="../foo" resolutions don't always
> work in v2.6 even though Netscape gets it "right", reflects my incorrect
> judgement that it was about time for Lynx to do that right, and it wasn't
> yet (Does someone want to add a prompt for choosing the wrong way, when
> needed? 8-).  If you follow what RFC 1808 recommends for error handling
> (though it's not described as that), beware of many more FAQs that will
> be created for Lynx.
> 
>       Note that Lynx v2.6 does use functions in HTAAServ, HTUU.c and
> HTRules.c, and would be using more of them if the HTTP/1.1 draft
> and associated IDs had been further along at the time of its release.

Well it was just a rough guess.  I hope Henry is reading this and will,
as a consequence of your note, not spend much time in trying to omit
those files from the binary.

Of the three files mentioned, HTRules.c is the only one where I really
have an idea what it does.  The only place I could find it referenced
in the rest of the code was HTAccess.c (and only #ifndef NO_RULES),
and when I checked the call always returned success - no surprise since
no rules were registered.  I know basics of the use of the rules file
in CERN_httpd, and was thinking that having some of that in Lynx might
be "neat" (things like client-side redirection based on URL, filtering,
and remote configuration of all of that).  But could you tell what use(s)
YOU had in mind?  I suspect that you would regard some of my ideas as
frivolous...
 
>       Lynx v2.6 does not handle ;parameters or ?searchparts in the
> HTParse.c functions (treats them, there, as part of the path field).
> It only handles fragments (following any ;parameters and/or ?searchparts,
> if present) in that code.  It handles ;type=[A, D, I] in the ftp gateway,
> and ?searchpart in GridText.c.  

OK, but is there any good reason why it should NOT treat them, for the
purpose of resolving relative URLs, as in the RFC?  (I am just asking
here.)  For the interface to the calling function, PARSE_PATH could still
mean "path" in the "historical" sense (rather than the RFC sense), i.e.
a call which requests the PATH would get the concatenation of
<path_RFC> and <params> and <query> returned - IF <params> and/or <query>
parts are to be returned.  This would be to avoid changing all the places
where HTParse() is called (for reasons of either laziness or compatibility).
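
To illustrate the interface idea above, here is a minimal sketch in Python
(not Lynx code, and not how HTParse() is actually written). The function
names and the want_params/want_query flags are hypothetical. It splits a
"historical" path into <path_RFC>, <params> and <query> in the RFC 1808
order, and joins them back for a caller that still expects the old
concatenated form.

```python
from typing import Optional, Tuple


def split_historical_path(hpath: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a "historical" path into (path_RFC, params, query).

    RFC 1808 takes the query off first (leftmost '?'), then the
    params (leftmost ';' in what remains). None means "absent".
    """
    rest, qsep, query = hpath.partition("?")
    path, psep, params = rest.partition(";")
    return path, (params if psep else None), (query if qsep else None)


def join_historical_path(path: str,
                         params: Optional[str],
                         query: Optional[str],
                         want_params: bool = True,
                         want_query: bool = True) -> str:
    """Rebuild the concatenated form that old callers expect.

    want_params / want_query are hypothetical stand-ins for whatever
    mask bits a caller would pass to request those parts.
    """
    out = path
    if want_params and params is not None:
        out += ";" + params
    if want_query and query is not None:
        out += "?" + query
    return out
```

With this split, relative resolution could work on <path_RFC> alone, while a
caller asking for the "path" would still get "/b/c/d;p?q" back from
join_historical_path("/b/c/d", "p", "q").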

>                                The treatment of an empty HREF as the
> base *less* any fragment it had is intentional, and I still can't think
> of a "real world" situation in which that wouldn't be more appropriate
> than retaining the fragment.  

I was suspecting that noncompliance for the following four cases
(pasted in here from <http://www.tezcat.com/~kweide/lynxhacks/RURL.html>)

   [6]//g      = <URL:http://g>              FAIL           FAIL
   [25]<empty> = <URL:http://a/b/c/d;p?q#f>  FAIL           ??? (no link)
   [38]http:g  = <URL:http:g>                FAIL           FAIL
   [39]http:   = <URL:http:>                 FAIL           FAIL

was either for compatibility or irrelevant ("It's not a bug, it's a
feature").

As for the other cases where I have noted FAIL for Lynx, two involve
not treating ;params specially (which is probably of little practical
consequence *currently*, because they are not used much):

   [7]?y       = <URL:http://a/b/c/d;p?y>    FAIL           FAIL
   [14];x      = <URL:http://a/b/c/d;x>      FAIL           FAIL

Two involve messing with the ?search or #fragment part (removing ./
there, which is not A Good Thing):

   [9]g?y/./x  = <URL:http://a/b/c/g?y/./x>  FAIL           OK
   [12]g#s/./x = <URL:http://a/b/c/g#s/./x>  FAIL           OK

Three involve removing a final slash where the RFC wants it kept:

   [17].       = <URL:http://a/b/c/>         FAIL           OK
   [19]..      = <URL:http://a/b/>           FAIL           OK
   [35]./g/.   = <URL:http://a/b/c/g/>       FAIL           OK

These last ones may lead to problems: making the wrong HTTP request
(although the cost may just be a redirection), or doing the wrong thing
when further relative URLs are based on the result of this resolution,
when the URLs are actually file: URLs, etc.

In case Lynx is going to get still more RFC compliant, I see a need 
for -historical_rurl arising...

Of the remaining two, one is definitely wrong:

   [27]../../../../g = <URL:http://a/../../g> FAIL          FAIL

(Lynx resolves it to <URL:http://a/..>, with no sign of /g left),
and for the last one I don't see any reason to come to a different result
than for [21]../g : 

   [34]./../g  = <URL:http://a/b/g>          FAIL           OK
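
For reference, the resolution steps of RFC 1808 section 4 can be sketched as
follows. This is a simplified Python illustration, not Lynx code: all names
are hypothetical, empty and absent components are not distinguished, and an
empty leading segment is assumed never to be cancelled by "..". Dot segments
are only simplified in the path, so ?search and #fragment parts are left
alone, and a trailing slash is kept where "." or ".." ends the path.

```python
from typing import NamedTuple

SCHEME_CHARS = set("abcdefghijklmnopqrstuvwxyz"
                   "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+.-")


class Parts(NamedTuple):
    scheme: str
    net_loc: str
    path: str
    params: str
    query: str
    fragment: str


def split_url(url: str) -> Parts:
    """Parse in the order of RFC 1808 section 2.4 (simplified)."""
    url, _, fragment = url.partition("#")
    scheme = ""
    head, colon, rest = url.partition(":")
    if colon and head and set(head) <= SCHEME_CHARS:
        scheme, url = head.lower(), rest
    net_loc = ""
    if url.startswith("//"):
        end = url.find("/", 2)
        end = len(url) if end == -1 else end
        net_loc, url = url[2:end], url[end:]
    url, _, query = url.partition("?")
    path, _, params = url.partition(";")
    return Parts(scheme, net_loc, path, params, query, fragment)


def join_url(p: Parts) -> str:
    out = p.scheme + ":" if p.scheme else ""
    out += "//" + p.net_loc if p.net_loc else ""
    out += p.path
    out += ";" + p.params if p.params else ""
    out += "?" + p.query if p.query else ""
    out += "#" + p.fragment if p.fragment else ""
    return out


def simplify_path(path: str) -> str:
    """Step 6a-6d: remove "." and "<segment>/.." from the path only."""
    segs = path.split("/")
    last = len(segs) - 1
    segs = [s for i, s in enumerate(segs) if s != "." or i == last]  # 6a
    if segs[-1] == ".":                                              # 6b
        segs[-1] = ""
    i = 0
    while i + 1 < len(segs):
        removable = segs[i] != ".." and not (i == 0 and segs[i] == "")
        if segs[i + 1] == ".." and removable:
            if i + 2 == len(segs):
                segs[i:] = [""]       # 6d: trailing "<segment>/..", keep slash
            else:
                del segs[i:i + 2]     # 6c: "<segment>/../"
            i = max(i - 1, 0)         # re-check the new neighbours
        else:
            i += 1
    return "/".join(segs)


def resolve(base: str, rel: str) -> str:
    if not base or not rel:
        return rel or base                        # steps 1 and 2a
    b, r = split_url(base), split_url(rel)
    if r.scheme:
        return rel                                # step 2b: absolute
    net_loc, path, params, query = r.net_loc, r.path, r.params, r.query
    if not net_loc:                               # step 3
        net_loc = b.net_loc
        if not path:                              # step 5
            path = b.path
            if not params:
                params = b.params
                query = query or b.query
        elif not path.startswith("/"):            # steps 4 and 6
            path = simplify_path(b.path[: b.path.rfind("/") + 1] + path)
    return join_url(Parts(b.scheme, net_loc, path, params, query, r.fragment))
```

Under those assumptions, resolve("http://a/b/c/d;p?q#f", "../../../../g")
gives "http://a/../../g" with the /g intact, and "./../g" gives
"http://a/b/g", the same result as for "../g".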


Now you are moving to a problem that I wasn't aware of (because it is
not covered by the RFC 1808 examples):

>                               Note also that RFC 1808, without comment
> or explanation, changed what's dictated in RFC 1738 for escaping the
> hash ('#')  and parsing of fragments.  RFC 1808 directs parsing for the
> hash from left to right, and indicates that unescaped hashes could be
> present to the right of that punctuation.  RFC 1738 states that only
> one unescaped hash can be present, as punctuation for a fragment, and
> that all other hashes should be escaped, even in URLs that do not
> support fragments (e.g., mailto URLs).  If only one unescaped hash
> can be present, the direction of parsing is irrelevant, and right to
> left is more efficient, as is done in all libwwws, including the
> most current Reference Library, and by most deployed browsers (not
> MSIE, whose developers, being new, didn't know what grains of salt
> to apply to RFC 1808 8-).

So is your conclusion that things should stay as they are, hash-wise?
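
To make the difference between the two parsing directions concrete, here is
a small Python sketch (an illustration only, with hypothetical function
names; it is not taken from any libwww). The two approaches agree whenever
at most one unescaped hash is present, and differ only when there are more.

```python
from typing import Tuple


def split_fragment_ltr(url: str) -> Tuple[str, str]:
    """RFC 1808 style: the fragment starts at the first (leftmost) '#'."""
    head, _, fragment = url.partition("#")
    return head, fragment


def split_fragment_rtl(url: str) -> Tuple[str, str]:
    """Right-to-left style: the fragment starts at the last '#'."""
    head, sep, fragment = url.rpartition("#")
    if not sep:
        # No hash at all: rpartition puts the whole string in the
        # last field, so return the URL unchanged with no fragment.
        return url, ""
    return head, fragment
```

For "http://a/b#c" both return ("http://a/b", "c"). For "http://a/b#c#d",
the left-to-right version yields the fragment "c#d", while the right-to-left
version keeps "#c" in the URL proper and yields the fragment "d".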
 
>       Much of what is in RFC 1808 was written from an armchair,
> rather than from hands on implementation experience (i.e., it's like
> what's in the RFCs and "official" drafts for FORM markup 8-).

Whether the specific author of this particular RFC seems to prefer
an armchair or a throne, I leave for others to decide...

>       Note also that what RFC 1808 says about handling of ".."
> embedded within paths, though logical, doesn't necessarily take
> into account how that might be used for spoofing on Unix, where
> it has meaning for the platform's actual file system.  

Ok, but that deed is already done :-) Lynx already does send request-
URLs starting with /.. to HTTP servers, whether you now regret that
decision or not :->.

>                                                        We worry
> about the parsing of those in a variety of LYfoo functions, not
> just in the libwww modules.  We also treat '~' as a meaningful
> symbol in file and ftp URLs, under some circumstances, though
> that's not in the specs.  If you simply follow the specs for
> that, you'll create yet more FAQs. :) :)

Well there's a whole brand new domain now waiting to be filled with
FAQ documents.. :-)

I hope I haven't terminally bored *everyone* yet, with an analysis which
is very incomplete because it is only based on the RFC's examples...

  Klaus
