
Re: lynx-dev user agent when running lynx via crontab entry?


From: David Woolley
Subject: Re: lynx-dev user agent when running lynx via crontab entry?
Date: Thu, 8 Jul 1999 08:29:54 +0100 (BST)

> /usr/local/bin/lynx -useragent='Mozilla/4.04 [en]' -nolist -dump 
> http://www.imdb.com/StudioBrief | mail address@hidden
> 
> So, I was getting a "we're not letting you connect" message for the past
> week or so.. and after actually reading the message, it is being sent back
> because the useragent is empty.
> 

Based on public statements by the IMDB operator in the UK, and on some
follow-up email, this rejection is almost certainly intended to prevent
bulk fetching of pages by web crawlers. Blocking Lynx from automatically
fetching single pages would probably be considered a desirable side effect,
as that also causes pages to be fetched without the adverts being seen.

The original reason for IMDB being fussy about whom they talk to is
probably Windows users who bulk-fetch pages and are then presumed not to
look at most of them, imposing a load on the IMDB site without the site
being able to justify advertising revenue on the basis of that load.

The actual statement was a strong objection to web crawlers, and
particularly to those that didn't identify themselves properly in the user
agent. They pointed out that IE4 did identify itself when crawling
("subscribe to site") and was blocked in that mode. If you really need to
disguise the identity of Lynx, it is probable either that Lynx has been
blacklisted because it has been used to crawl their site, or that they now
operate a whitelist of non-crawlers.
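
The two server-side policies described above (refusing empty or
blacklisted user agents, versus admitting only a whitelist) can be
sketched as follows. This is a minimal Python illustration only: the
function name allow_request and both token lists are hypothetical
assumptions, not IMDB's actual rules.

```python
# Hypothetical sketch of User-Agent filtering on the server side.
# The token lists are illustrative assumptions, not any real site's rules.
BLACKLIST = ("lynx", "crawler", "spider")  # substrings that cause rejection
WHITELIST = ("mozilla/",)                  # prefixes admitted in whitelist mode


def allow_request(user_agent, mode="blacklist"):
    """Return True if a request carrying this User-Agent should be served."""
    ua = (user_agent or "").strip().lower()
    if not ua:
        # An empty User-Agent is refused under either policy.
        return False
    if mode == "blacklist":
        # Serve anything that does not contain a blacklisted token.
        return not any(token in ua for token in BLACKLIST)
    if mode == "whitelist":
        # Serve only agents whose string starts with an approved prefix.
        return ua.startswith(WHITELIST)
    raise ValueError(f"unknown mode: {mode!r}")
```

Note that a forged string such as 'Mozilla/4.04 [en]' passes both checks,
which is why filters of this kind only work against agents that identify
themselves honestly.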

In this sort of context, I think there might be a legal case that forging
the user agent when doing automated fetches from this site constitutes
criminal fraud, but IANAL. It has been argued that in the UK you can't
deceive a computer (although there is other legislation), and there the
nearest offence to fraud is something like "obtaining pecuniary advantage
by deception". I have heard the term "theft of service" used in a US
context.

I think unauthorised access to the site would be a breach of copyright
and, in the UK, would come under the Computer Misuse Act.

If Lynx is being blacklisted for crawling, I can see little that could
effectively be done to remove the blacklisting. Even if Lynx were changed
so that a special user agent string was sent when crawling or in batch
mode, and the user couldn't override it, people would release versions
with the source modified to override it (and the licence wouldn't allow
this to be stopped).
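
The batch-mode proposal in the previous paragraph, and why it would not
survive a rebuilt binary, can be sketched as follows. This is a
hypothetical Python model, not Lynx's actual code (Lynx is written in C):
choose_user_agent, DEFAULT_UA and BATCH_TAG are illustrative names and
values.

```python
# Hypothetical model of a forced batch-mode tag. In batch (non-interactive)
# runs, such as a -dump from cron, a fixed tag is always appended to
# whatever User-Agent the user chose. Names and values are illustrative.
DEFAULT_UA = "Lynx/2.8.2"
BATCH_TAG = "(batch)"


def choose_user_agent(user_override=None, batch=False):
    """Return the User-Agent string to send with each request."""
    ua = user_override if user_override else DEFAULT_UA
    if batch:
        # The tag is added after the override, so a command-line option
        # such as -useragent cannot remove it ...
        ua = f"{ua} {BATCH_TAG}"
    # ... but anyone rebuilding from source can simply delete the branch above.
    return ua
```

The check lives entirely in the client, so it constrains only unmodified
builds, which is the point made above.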
