Re: lynx-dev URL guessing for .CA domain suggestion


From: Bela Lubkin
Subject: Re: lynx-dev URL guessing for .CA domain suggestion
Date: Thu, 8 Oct 1998 16:49:37 -0700

Leonid Pauzner wrote:

> It would be really great to disable URL guessing for hosts that
> end with "dot + two letters", since that is most likely a country code
> like ".uk" or ".ru".
> This would be a very limited disadvantage for edu/com/org/net users,
> because second-level domains usually have longer names,
> but it is really important for country domains:
> a typo in a user-supplied URL falls into the obviously stupid "URL guessing"
> process, like msk.ru.org/msk.ru.edu etc., which definitely do not exist.
>
> The changes should be somewhere in LYExpandHostForURL().
> Can anybody fix it?

Here's the proper way to do what you want -- and this is definitely for
post-2.8.1 work.  Add a third table to specify domain name endings which
you do not want guessed.  You might have:

  URL_DOMAIN_PREFIXES:www.
  URL_DOMAIN_SUFFIXES:.com,.edu,.net,.org
  URL_DOMAIN_NOGUESS_SUFFIXES:.ru,.ca,.uk,.com,.edu,.net,.org

The third line is a list of suffixes which are to be considered terminal
-- no guesses should be appended to them.  Note that I've included .com
and so on in my sample entry; this prevents guesses like "can't find
yabbayabbayabba.com, trying yabbayabbayabba.com.com".  It would *seem*
sensible to automatically include URL_DOMAIN_SUFFIXES in
URL_DOMAIN_NOGUESS_SUFFIXES, but we retain more flexibility if we don't.
The user can then choose whether to include them, by what he puts in
the NOGUESS string.
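
To make the intended behaviour concrete, here is a rough sketch of the
terminal-suffix check.  It is written in Python for brevity and is not
Lynx code; the function names and the hard-coded lists are assumptions
of mine, standing in for whatever the config parser would produce.

```python
# Illustrative sketch only, not Lynx source.  The lists mirror the
# sample config lines above; the function names are hypothetical.
GUESS_SUFFIXES = [".com", ".edu", ".net", ".org"]
NOGUESS_SUFFIXES = [".ru", ".ca", ".uk", ".com", ".edu", ".net", ".org"]


def is_terminal(host, noguess=NOGUESS_SUFFIXES):
    """Return True if host already ends in a suffix we must not extend."""
    host = host.lower()
    return any(host.endswith(suffix) for suffix in noguess)


def suffix_guesses(host, suffixes=GUESS_SUFFIXES, noguess=NOGUESS_SUFFIXES):
    """Return the suffix guesses to try, or nothing for a terminal host."""
    if is_terminal(host, noguess):
        return []
    return [host + suffix for suffix in suffixes]
```

With these lists, "msk.ru" and "yabbayabbayabba.com" produce no guesses
at all, while a bare "zark" still gets the four usual suffixes.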

The code already does this for a URL_DOMAIN_NOGUESS_PREFIXES list,
except that the list is embedded in the code.  The embedded list is
equivalent to:

  
  URL_DOMAIN_NOGUESS_PREFIXES:www.,ftp.,gopher.,wais.,cso.,ns.,ph.,finger.,news.,nntp.

If someone implements what I'm suggesting, I would recommend also
making URL_DOMAIN_NOGUESS_PREFIXES configurable.
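
A configurable version of the prefix check might look roughly like
this.  Again this is a Python sketch rather than the actual C code; the
default string is the list quoted above, and the helper names are
hypothetical.

```python
# Illustrative sketch only.  The default value reproduces the list that
# is described above as embedded in the code; the names are made up.
DEFAULT_NOGUESS_PREFIXES = (
    "www.,ftp.,gopher.,wais.,cso.,ns.,ph.,finger.,news.,nntp."
)


def parse_noguess_prefixes(value=DEFAULT_NOGUESS_PREFIXES):
    """Split a comma-separated config value into a list of prefixes."""
    return [prefix for prefix in value.split(",") if prefix]


def may_guess_prefix(host, noguess_prefixes):
    """Return False if host already starts with a no-guess prefix."""
    host = host.lower()
    return not any(host.startswith(prefix) for prefix in noguess_prefixes)
```

A user who wanted "web." treated the same way would just add it to the
config line instead of patching the source.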

Finally, I see that there is no way to specify "empty guess" in the
list.  That is, suppose I would like to have:

  URL_DOMAIN_PREFIXES:,www.,ftp.
  URL_DOMAIN_SUFFIXES:.com,.edu,

Then if I do `lynx zark`, I intend it to guess:

  zark.com      <-- empty prefix
  www.zark.com
  ftp.zark.com
  zark.edu      <-- empty prefix
  www.zark.edu
  ftp.zark.edu
  zark          <-- empty prefix and suffix
  www.zark      <-- empty suffix
  ftp.zark      <-- empty suffix

As things stand, you cannot specify an empty prefix or suffix.  This
should be fixed.
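
For illustration, the expansion I have in mind amounts to the following
Python sketch.  It assumes the parser keeps empty list entries rather
than discarding them, which is the proposed behaviour, not the current
one; the function name is hypothetical.

```python
# Illustrative sketch only.  Assumes empty entries in the comma-separated
# lists are preserved and mean "no prefix" / "no suffix".
def expand_guesses(name, prefixes_value, suffixes_value):
    """Return candidate hosts, trying every prefix for each suffix in turn."""
    prefixes = prefixes_value.split(",")  # ",www.,ftp." -> ["", "www.", "ftp."]
    suffixes = suffixes_value.split(",")  # ".com,.edu," -> [".com", ".edu", ""]
    return [prefix + name + suffix
            for suffix in suffixes
            for prefix in prefixes]


for host in expand_guesses("zark", ",www.,ftp.", ".com,.edu,"):
    print(host)
```

This prints the nine hosts in the same order as the list above.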

Other stuff: if it guesses a prefix that corresponds to a known
protocol, shouldn't it guess the protocol as well?  That is, suppose the
above sequence of guesses succeeded at ftp.zark.edu: shouldn't it then
have guessed ftp protocol, i.e. ftp://ftp.zark.edu, not
http://ftp.zark.edu?  Furthermore, shouldn't that be user-configurable
somehow?  For instance, some sites use "web.wherever.com", so maybe I
want Lynx to guess that, with HTTP protocol:

  URL_DOMAIN_PREFIXES:http:www,ftp,http:web

This would mean: "look for www.whatever.i.said, and if you find it, make
it an http: URL; then look for ftp.whatever and make it ftp:; finally
look for web.whatever and make it http:".
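
A rough Python sketch of how such entries could be interpreted follows.
The "protocol:prefix" syntax is only my proposal above, and the set of
prefixes treated as protocol names, the http default, and the function
names are all assumptions made for the example.

```python
# Illustrative sketch only.  KNOWN_PROTOCOLS and the http default are
# assumptions; the entry syntax is the proposed one, not existing Lynx.
KNOWN_PROTOCOLS = {"http", "ftp", "gopher"}


def parse_prefix_entry(entry, default="http"):
    """Split 'http:www' into ('http', 'www'); infer a protocol for 'ftp'."""
    if ":" in entry:
        protocol, prefix = entry.split(":", 1)
    else:
        prefix = entry
        bare = prefix.rstrip(".")
        protocol = bare if bare in KNOWN_PROTOCOLS else default
    return protocol, prefix


def candidate_urls(name, prefixes_value):
    """Return the URLs to try, in the order the prefixes are listed."""
    urls = []
    for entry in prefixes_value.split(","):
        protocol, prefix = parse_prefix_entry(entry)
        host = prefix.rstrip(".") + "." + name if prefix else name
        urls.append(protocol + "://" + host)
    return urls
```

For "whatever.i.said" and the sample line, this tries
http://www.whatever.i.said, then ftp://ftp.whatever.i.said, then
http://web.whatever.i.said.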

Again, all of this is post-2.8.1 stuff.

>Bela<
