lynx-dev

Re: [Lynx-dev] Re: Lynx fails on http://politiken.dk


From: David Woolley
Subject: Re: [Lynx-dev] Re: Lynx fails on http://politiken.dk
Date: Tue, 12 Oct 2004 21:34:28 +0100 (BST)

> That was exactly it. The possibility of the Lynx user-agent header being
> blocked and not those of elinks and w3m seemed so remote to me that I did

That's only because lynx is better known (or was better known at the
time the site's webcaps file was written).

Both PHP and ASP have a facility to read a file called webcaps (the actual
file name is browscap.ini), which lists each browser's capabilities.  The
idea of the file is that you can generate horribly device-dependent HTML
that produces your desired bells and whistles on all browsers (much better
is to write device-independent HTML).  Of course, what actually happens is
that authors don't bother supporting browsers that say no frames or no
scripting, rather than providing a plain fallback.

The problem for minority browsers is that their entries tend to lag
behind reality and also to err towards a more restrictive description
than the browser user would want (e.g. Lynx will be indicated as not
frames capable, even though it understands them and provides access).
Because descriptions for minority browsers lag, new minority browsers
may not get in at all, and may get a default entry that indicates
reasonable IE-likeness.  Such browsers are also likely to pretend to be
IE well enough to fool the pattern matches used.
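
As a rough illustration of the lookup described above, here is a minimal
Python sketch.  It is not the real PHP or ASP implementation: the table
entries, the DEFAULT record and the lookup() helper are all hypothetical,
and the shell-style wildcard matching is only an assumption about how such
a file is typically consulted.

```python
from fnmatch import fnmatchcase

# Hypothetical capability table (NOT copied from any real capabilities file).
# Each entry pairs a wildcard pattern for the User-Agent header with the
# capabilities the site believes that browser has.
CAPS = [
    ("Lynx/*", {"browser": "Lynx", "frames": False, "javascript": False}),
    ("Mozilla/4.0 (compatible; MSIE *",
     {"browser": "IE", "frames": True, "javascript": True}),
]

# Hypothetical catch-all entry: unknown browsers are assumed to be IE-like.
DEFAULT = {"browser": "Default", "frames": True, "javascript": True}


def lookup(user_agent):
    """Return the capabilities of the first matching pattern, else DEFAULT."""
    for pattern, caps in CAPS:
        if fnmatchcase(user_agent, pattern):
            return caps
    return DEFAULT


if __name__ == "__main__":
    for ua in (
        "Lynx/2.8.5rel.1 libwww-FM/2.14",
        "ELinks/0.9.1 (textmode; Linux)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ):
        print(ua, "->", lookup(ua))
```

With a table like this, the well-known Lynx is described (too
restrictively) as having no frames, while a browser with no entry at all
falls through to the permissive default.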

It's unlikely that a site will actively reject lynx, but quite likely that
they will reject all browsers that their webcaps say don't do frames, etc.,
or that they use crude pattern matches to "positively" identify the big 2.
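
The two kinds of gate mentioned above might look roughly like the
following Python sketch.  Both functions and all the sample strings are
hypothetical; real sites will differ in which capabilities and which
substrings they test.

```python
def passes_capability_gate(caps):
    """Admit only browsers whose capability entry claims frames and scripting."""
    return bool(caps.get("frames")) and bool(caps.get("javascript"))


def passes_crude_match(user_agent):
    """'Positively' identify the big two by looking for a substring."""
    return "MSIE" in user_agent or "Netscape" in user_agent


if __name__ == "__main__":
    # Hypothetical capability entry of the restrictive kind described earlier.
    lynx_caps = {"frames": False, "javascript": False}
    print("capability gate admits Lynx:", passes_capability_gate(lynx_caps))

    for ua in (
        "Lynx/2.8.5rel.1 libwww-FM/2.14",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        # Illustrative User-Agent of a minority browser pretending to be IE.
        "Mozilla/4.0 (compatible; MSIE 6.0; X11; Linux i686) Opera 7.23",
    ):
        print(ua, "->", passes_crude_match(ua))
```

Neither gate names Lynx, yet both turn it away, while a browser that
merely puts "MSIE" in its User-Agent gets through the second one.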

Another reason for this sort of blocking is an attempt to keyword-stuff
search engines.  I'm sure that there are some web sites that will send a
keyword list page to any browser that they don't recognize.

One specific reason for rejecting Lynx is that it can be used to crawl a
site.  Any site that is paid for by advertising will normally have 
terms of service that forbid any automated access, and some will take
active measures to prevent it.



