Re: lynx-dev URL guessing for .CA domain suggestion
From: Bela Lubkin
Subject: Re: lynx-dev URL guessing for .CA domain suggestion
Date: Thu, 8 Oct 1998 16:49:37 -0700
Leonid Pauzner wrote:
> It would be really great to disable URL guessing for hosts that
> end with "dot + two letters", since that is most likely a country code
> like ".uk" or ".ru".
> This would be a very limited disadvantage for edu/com/org/net users,
> because second-level domains usually have longer names,
> but it is really important for country domains:
> a typo in a user-supplied URL falls into the obviously stupid "URL guessing"
> process, trying msk.ru.org, msk.ru.edu, etc., which definitely do not exist.
>
> The changes should be somewhere in LYExpandHostForURL().
> Can anybody fix it?
Here's the proper way to do what you want -- and this is definitely for
post-2.8.1 work. Add a third table to specify domain name endings which
you do not want guessed. You might have:
URL_DOMAIN_PREFIXES:www.
URL_DOMAIN_SUFFIXES:.com,.edu,.net,.org
URL_DOMAIN_NOGUESS_SUFFIXES:.ru,.ca,.uk,.com,.edu,.net,.org
The third line is a list of suffixes which are to be considered terminal
-- no guesses should be appended to them. Note that I've included .com
and so on in my sample entry; this prevents guesses like "can't find
yabbayabbayabba.com, trying yabbayabbayabba.com.com". It would *seem*
sensible to automatically include URL_DOMAIN_SUFFIXES in
URL_DOMAIN_NOGUESS_SUFFIXES, but we retain more flexibility if we don't.
The user can then choose whether to include them, by what he puts in the
NOGUESS string.
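The NOGUESS check described above could be sketched roughly as follows. This is only an illustration in Python, not Lynx's actual C code; the function names (parse_list, is_terminal, suffix_guesses) are made up for the example, and URL_DOMAIN_NOGUESS_SUFFIXES is the proposed option, not an existing one.

```python
def parse_list(value):
    """Split a comma-separated config value, dropping empty items.

    Empty items are dropped here on purpose: an empty NOGUESS suffix
    would match every host and disable guessing entirely.
    """
    return [item.strip() for item in value.split(",") if item.strip()]


def is_terminal(host, noguess_suffixes):
    """Return True if host already ends in a suffix we must not extend."""
    host = host.lower().rstrip(".")
    return any(host.endswith(s.lower()) for s in noguess_suffixes)


def suffix_guesses(host, suffixes, noguess_suffixes):
    """Return the hosts to try by appending suffixes, or [] if terminal."""
    if is_terminal(host, noguess_suffixes):
        return []
    return [host + s for s in suffixes]


# Values taken from the sample entries above.
SUFFIXES = parse_list(".com,.edu,.net,.org")
NOGUESS = parse_list(".ru,.ca,.uk,.com,.edu,.net,.org")
```

With these values, a mistyped "msk.ru" or "yabbayabbayabba.com" produces no suffix guesses at all, while a bare "zark" is still expanded with every entry in URL_DOMAIN_SUFFIXES.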
The code already does this for a URL_DOMAIN_NOGUESS_PREFIXES list,
except that the list is embedded in the code. The embedded list is
equivalent to:
URL_DOMAIN_NOGUESS_PREFIXES:www.,ftp.,gopher.,wais.,cso.,ns.,ph.,finger.,news.,nntp.
If someone implements what I'm suggesting, I would recommend also
making URL_DOMAIN_NOGUESS_PREFIXES configurable.
Finally, I see that there is no way to specify "empty guess" in the
list. That is, suppose I would like to have:
URL_DOMAIN_PREFIXES:,www.,ftp.
URL_DOMAIN_SUFFIXES:.com,.edu,
Then if I do `lynx zark`, I intend it to guess:
    zark.com        <-- empty prefix
    www.zark.com
    ftp.zark.com
    zark.edu        <-- empty prefix
    www.zark.edu
    ftp.zark.edu
    zark            <-- empty prefix, suffix
    www.zark        <-- empty suffix
    ftp.zark        <-- empty suffix
At present you cannot specify an empty prefix or suffix. This should be fixed.
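To make the intended behaviour concrete, here is a small sketch of the expansion with empty entries allowed. It is illustrative Python, not Lynx code, and the names parse_keep_empty and expand are invented for the example. The one assumption is that an empty item between (or after) commas is kept as an empty string rather than discarded.

```python
def parse_keep_empty(value):
    """Split a comma-separated config value, keeping empty items.

    ",www.,ftp." -> ["", "www.", "ftp."]
    ".com,.edu," -> [".com", ".edu", ""]
    """
    return value.split(",")


def expand(host, prefixes, suffixes):
    """List candidate hosts: for each suffix in order, try each prefix."""
    return [p + host + s for s in suffixes for p in prefixes]


PREFIXES = parse_keep_empty(",www.,ftp.")
SUFFIXES = parse_keep_empty(".com,.edu,")

for candidate in expand("zark", PREFIXES, SUFFIXES):
    print(candidate)
```

Run as written, this prints the nine candidates in the order listed above, from zark.com through ftp.zark.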
Other stuff: if it guesses a prefix that corresponds to a known
protocol, shouldn't it guess the protocol as well? That is, suppose the
above sequence of guesses succeeded at ftp.zark.edu: shouldn't it then
have guessed ftp protocol, i.e. ftp://ftp.zark.edu, not
http://ftp.zark.edu? Furthermore, shouldn't that be user-configurable
somehow? For instance, some sites use "web.wherever.com", so maybe I
want Lynx to guess that, with HTTP protocol:
URL_DOMAIN_PREFIXES:http:www,ftp,http:web
"look for www.whatever.i.said, and if you find it, make it an http: URL;
then look for ftp.whatever and make it ftp:; finally look for
web.whatever and make it http:"
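One possible reading of that syntax, sketched in Python purely for illustration (none of this exists in Lynx; the function names and the KNOWN_SCHEMES set are assumptions made for the example). An entry of the form "scheme:prefix" names the protocol explicitly; a bare entry takes its own name as the protocol if it is a known one, and otherwise falls back to http. Since the example line above writes the prefixes without a trailing dot, the sketch adds one when it is missing.

```python
# Hypothetical set of prefixes that double as protocol names.
KNOWN_SCHEMES = {"ftp", "gopher", "wais"}


def parse_tagged_prefixes(value, default_scheme="http"):
    """Parse 'http:www,ftp,http:web' into (scheme, prefix) pairs."""
    result = []
    for item in value.split(","):
        scheme, sep, prefix = item.partition(":")
        if not sep:
            # No explicit scheme: infer it from the prefix itself.
            prefix = item
            name = prefix.rstrip(".")
            scheme = name if name in KNOWN_SCHEMES else default_scheme
        result.append((scheme, prefix))
    return result


def candidate_urls(host, tagged_prefixes):
    """Build the URLs to try, in order, for the given host."""
    urls = []
    for scheme, prefix in tagged_prefixes:
        # Accept both "www" and "www."; an empty prefix adds nothing.
        if prefix and not prefix.endswith("."):
            prefix += "."
        urls.append(f"{scheme}://{prefix}{host}")
    return urls


TAGGED = parse_tagged_prefixes("http:www,ftp,http:web")
```

For the host "whatever.com" this yields http://www.whatever.com, then ftp://ftp.whatever.com, then http://web.whatever.com, matching the description above.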
Again, all of this is post-2.8.1 stuff.
>Bela<