lynx-dev

Re: lynx-dev user agent when running lynx via crontab entry?


From: David Woolley
Subject: Re: lynx-dev user agent when running lynx via crontab entry?
Date: Sat, 10 Jul 1999 15:38:01 +0100 (BST)

> What makes you think a significant percentage of lynx accesses to the
> site is in a comparable mode?  Or, more important, why would IMDB
> folks think that?

Basically I'm basing it on the sort of questions that get asked, which
tend to imply that Lynx is often used to try to mirror pages off such
sites, if not to crawl them.

IMDB would probably only worry about Lynx if it had a significant
market share; their real problem is shareware Windows page pre-fetching
programs (IE4 would be a problem if it weren't easy to block).

Sites like IMDB are paid for by advertising, so any tool which made
a significant number of hits without presenting the banner adverts to
its users would probably be discouraged.  Again, Lynx probably doesn't
have the market share needed to worry them.

> So far there has been a lot of speculation in this thread based on
> unstated or unverified assumptions.  (Starting with the very beginning,
> the assumption that user-agent had anything to do with anything.)

I tend to agree that the user agent is a red herring.  The most likely
problem with IMDB would be a user agent that was badly forged,
resulting in their thinking it was a Windows pre-fetcher pretending
to be Netscape.  However, the forged user agent looks OK.

In my view, though, if a forged user agent were needed to get into IMDB,
its only purpose would be to bypass their acceptable use policy, and
Lynx users should therefore either use an acceptable browser or get that
policy changed.  Forging the user agent makes it more difficult to get
such policies changed, as the number of Lynx users is misrepresented.

The facts are:

IMDB object to programs that crawl their site (letter to Demon Dispatches);

Their UK mirror, at least, blocks crawlers based on User-Agent
and would block IE4's subscription mode if it didn't respect the
robots.txt etc. on the site (private correspondence in response to the 
magazine letter).

Lynx is often used to crawl sites (the facility exists, and questions
are asked about how to use it, etc.);

Lynx does not respect robots restrictions.
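
To illustrate what respecting robots restrictions would involve, here
is a minimal sketch of the check a well-behaved crawler makes before
each fetch.  It uses Python's standard urllib.robotparser module.  The
robots.txt content, the "ExampleCrawler" agent name, the Lynx version
string and the example.com URLs are all invented for the illustration;
they are not IMDB's actual rules, and this is not how Lynx itself works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for this sketch (not IMDB's real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Assumed agent strings and URLs, purely for demonstration.
    for agent in ("ExampleCrawler", "Lynx/2.8.2"):
        for url in ("http://example.com/title/", "http://example.com/private/x"):
            print(agent, url, may_fetch(SAMPLE_ROBOTS_TXT, agent, url))
```

Note that this check is purely voluntary on the client's side, and a
forged user agent would be matched against the wrong rules.  That is
why a site that really wants to keep crawlers out also has to block on
User-Agent at the server.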
