Re: lynx-dev lynx and proxy. URI question
From: Klaus Weide
Subject: Re: lynx-dev lynx and proxy. URI question
Date: Wed, 14 Jul 1999 19:26:53 -0500 (CDT)
On Wed, 14 Jul 1999, Heather Stern wrote:
> James Kilfiger wrote:
> > Hi
> > I am using lynx with my university's web cache.
> > This is located at http://wwwcache.warwick.ac.uk:3128/
> >
> > I incorrectly set the http_proxy environment variable to
> > http://wwwcache.warwick.ac.uk:3128
> > (without the trailing slash)
> >
> > When I tried to follow any link with this setting, it seemed that lynx
> > would strip the `http:/' from the front of the link and then complain
> > that (say) `/www...' was an ill formed URL. This seems like odd
> > behaviour. Can anyone explain why the slash is required, and why
> > dropping it causes this problem
>
> Um, strictly speaking,
> http://www.domain.com:80/foo
> is probably a script living in the document root of domain.com
> http://www.domain.com:80/foo/
> is a directory named foo immediately located in the document root
> http://www.domain.com:80/foo.html
[ More snipped ]
The discussion about slashes in URL paths may be useful, but...
1. For "real URLs", the slash after the host part (which may include a :port)
is never required if there is nothing more following it (no path).
The URL "http://www.domain.com:80" always identifies exactly the same
resource as "http://www.domain.com:80/"[*]. Adding or omitting the slash
doesn't change the meaning (so lynx adds one in that case if it is not
already there). This is different from slashes that come later.
[*] Except for one obscure HTTP/1.1 method, OPTIONS, for which it MAY
make a difference at the level of the HTTP request. (I am not sure
whether this is still in the HTTP/1.1 draft standard.)
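To illustrate point 1, here is a minimal Python sketch of the "add the slash if the path is empty" normalization. It is not Lynx's own code; the function name normalize_empty_path is made up for this example, and it relies only on Python's standard urllib.parse module.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_empty_path(url):
    """Return url with "/" as its path when the URL has a host but no path.

    Hypothetical helper for illustration only; not taken from Lynx.
    """
    parts = urlsplit(url)
    # Only the empty path directly after the host[:port] part is touched.
    # Slashes later in the path are significant and are left alone.
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)
```

With this sketch, "http://www.domain.com:80" and "http://www.domain.com:80/" come out as the same string, while "http://www.domain.com:80/foo" and "http://www.domain.com:80/foo/" stay distinct.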
2. The xyz_proxy isn't used as a "real URL", in the sense that no resource
(document etc.) is addressed directly by it.
For lynx, the slash at the end of http_proxy is needed because of the way
it is internally handled. Say you want to access
http://www.example.com/somepage.html. Lynx internally builds a "physical"
URL by just prepending the proxy:
http://wwwcache.warwick.ac.uk:3128/http://www.example.com/somepage.html
That isn't a real URL that validly identifies anything (normally); it is
just a convenient construct. It is never passed to the outside world:
before actually sending an HTTP request, this combined string is parsed
into parts again.
If the http_proxy was given as "http://wwwcache.warwick.ac.uk:3128"
without trailing slash, the "physical" URL becomes
http://wwwcache.warwick.ac.uk:3128http://www.example.com/somepage.html
and Lynx can't properly pick it apart.
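The following Python sketch shows why the missing slash matters when the combined string has to be split again. It is a toy model under stated assumptions, not Lynx's actual parsing code: the names build_physical and split_physical are invented here, and the splitting rule (the proxy part ends at the first "/" after "scheme://") is assumed for illustration.

```python
def build_physical(proxy, target):
    """Build the internal "physical" URL by plain prepending."""
    return proxy + target


def split_physical(physical):
    """Split a "physical" URL back into (proxy part, remainder).

    Assumed rule, for illustration only: the proxy part runs up to and
    including the first "/" that follows the leading "scheme://".
    """
    host_start = physical.index("://") + 3
    slash = physical.find("/", host_start)
    if slash == -1:
        return physical, ""
    return physical[:slash + 1], physical[slash + 1:]


target = "http://www.example.com/somepage.html"

# Proxy given with the trailing slash: the split recovers both parts.
right = split_physical(build_physical("http://wwwcache.warwick.ac.uk:3128/", target))

# Proxy given without it: the first "/" found belongs to the target's
# own "http://", so the split lands in the wrong place.
wrong = split_physical(build_physical("http://wwwcache.warwick.ac.uk:3128", target))
```

Under this assumed rule, the "wrong" case leaves a remainder of "/www.example.com/somepage.html", which resembles the `/www...' complaint in the original report, though the real code path in Lynx may differ in detail.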
Comparing 'lynx -trace' output made with
a) the "right" proxy specification
b) the "wrong" proxy specification
c) without any proxy
should make it clearer.
Klaus