Re: lynx-dev lynx and proxy. URI question


From: Klaus Weide
Subject: Re: lynx-dev lynx and proxy. URI question
Date: Wed, 14 Jul 1999 19:26:53 -0500 (CDT)

On Wed, 14 Jul 1999, Heather Stern wrote:

> James Kilfiger wrote:
> > Hi
> > I am using lynx with my university's web cache. 
> > This is located at http://wwwcache.warwick.ac.uk:3128/
> > 
> > I incorrectly set the http_proxy environment variable to
> >  http://wwwcache.warwick.ac.uk:3128
> > (without the trailing slash)
> > 
> > When I tried to follow any link with this setting, it seemed that lynx
> > would strip the `http:/' from the front of the link and then complain
> > that (say) `/www...' was an ill formed URL.  This seems like odd
> > behaviour.  Can anyone explain why the slash is required, and why
> > dropping it causes this problem
> 
> Um, strictly speaking, 
> http://www.domain.com:80/foo
>       is probably a script living in the document root of domain.com
> http://www.domain.com:80/foo/
>       is a directory named foo immediately located in the document root
> http://www.domain.com:80/foo.html

[ More snipped ]

The discussion about slashes in URL paths may be useful, but...

1. For "real URLs", the slash after the host part (which may include a :port)
is never required if there is nothing more following it (no path).
The URL "http://www.domain.com:80" always identifies exactly the same
resource as "http://www.domain.com:80/"[*]. Adding or omitting the slash
doesn't change the meaning (so lynx adds one in that case if it is not
already there).  This is different from slashes that come later.

[*] If we don't consider some obscure HTTP/1.1 method for which it MAY
make a difference, at the level of the HTTP request: OPTIONS. (not sure
whether this is still in the HTTP 1.1 draft standard)
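
As a rough illustration of point 1, here is a small Python sketch using the
standard urllib.parse module.  It is not Lynx's code; the helper name
add_root_slash is made up for this example.  It only shows the idea that an
empty path after the host part can be replaced by "/" without changing what
the URL identifies, while later slashes are left alone.

```python
from urllib.parse import urlsplit


def add_root_slash(url):
    """Return url with "/" as its path when the path is empty.

    Hypothetical helper for illustration only; it does not reproduce
    how Lynx normalizes URLs internally.
    """
    parts = urlsplit(url)
    # Only touch URLs that have a host part and nothing after it.
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return parts.geturl()


if __name__ == "__main__":
    for u in ("http://www.domain.com:80",
              "http://www.domain.com:80/",
              "http://www.domain.com:80/foo",
              "http://www.domain.com:80/foo/"):
        print(u, "->", add_root_slash(u))
```

The first two URLs come out identical, while "/foo" and "/foo/" stay
distinct, which is the difference between the slash after the host and
slashes that come later.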

2. The xyz_proxy isn't used as a "real URL", in the sense that no resource
(document etc.) is addressed directly by it.

For lynx, the slash at the end of http_proxy is needed because of the way
it is internally handled.  Say you want to access
http://www.example.com/somepage.html.  Lynx internally builds a "physical"
URL by just prepending the proxy:

  http://wwwcache.warwick.ac.uk:3128/http://www.example.com/somepage.html

That isn't a real URL that validly identifies anything (normally); it is
just a convenient construct.  It is never passed to the outside world:
before actually sending an HTTP request, this combined string is parsed
into parts again.

If the http_proxy was given as "http://wwwcache.warwick.ac.uk:3128"
without trailing slash, the "physical" URL becomes

  http://wwwcache.warwick.ac.uk:3128http://www.example.com/somepage.html

and Lynx can't properly pick it apart.
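
To make the failure concrete, here is a minimal Python sketch of the
prepend-then-split idea.  It is an assumption-laden illustration, not Lynx's
actual parsing code: the function names build_physical and split_physical
are invented, and the splitting rule (cut at the first "/" after "://") is
simply the most naive rule that works when the proxy ends in a slash.

```python
def build_physical(proxy, target):
    """Form the internal "physical" URL by plain concatenation."""
    return proxy + target


def split_physical(physical):
    """Split a "physical" URL back into (proxy, target).

    Hypothetical rule for illustration: the proxy part ends at the first
    "/" that follows "scheme://"; everything after it is the target.
    """
    scheme_end = physical.find("://")
    if scheme_end < 0:
        raise ValueError("no scheme in %r" % physical)
    slash = physical.find("/", scheme_end + 3)
    if slash < 0:
        raise ValueError("no slash after host part in %r" % physical)
    return physical[:slash + 1], physical[slash + 1:]


if __name__ == "__main__":
    target = "http://www.example.com/somepage.html"
    for proxy in ("http://wwwcache.warwick.ac.uk:3128/",   # "right"
                  "http://wwwcache.warwick.ac.uk:3128"):   # "wrong"
        print(split_physical(build_physical(proxy, target)))
```

With the trailing slash, the split returns the proxy and the target
unchanged.  Without it, this naive rule cuts inside the target's own
"http://", leaving "/www.example.com/somepage.html" as the target, which
resembles the "/www..." complaint in the original report.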

Comparing 'lynx -trace' output made with
  a) the "right" proxy specification
  b) the "wrong" proxy specification
  c) without any proxy
should make it clearer.

   Klaus

