lynx-dev

Re: lynx-dev Now I got the error from idmb trying to...


From: Klaus Weide
Subject: Re: lynx-dev Now I got the error from idmb trying to...
Date: Thu, 22 Jul 1999 10:28:00 -0500 (CDT)

On Wed, 21 Jul 1999, Heather Stern wrote:
> address@hidden wrote:
> > On Wed, 21 Jul 1999, Heather Stern wrote:
> > >Clients like you, and the authors of certain rude crawler programs, abuse
> > >it by setting it to something they're not.  Site authors abuse it when they
> > 
> > I only set it so that I can actually access various sites, like my bank
> > account and stock account.  Otherwise I wouldn't be able to access them
> > from home AT ALL.  I am a big stickler for honesty, but I don't consider
> > this being dishonest, since all I am trying to do is get _some_ access.
> > Heck, I'm not even trying to get the sites to _make_ a "separate but equal"
> > version..  I'm simply changing the setting so that they think I'm one
> > of the "big two" browsers... and if something doesn't work on the site when
> > I access it via Lynx, I wait until I can check it with a "big two" browser
> > before complaining to the site (in case the problem has to do with my 
> > hackery).

I don't know how one can possibly put "stickler for honesty" and "I'm simply
changing the settings so that they think I'm one of..." in the same paragraph
without noticing the glaring contradiction...

I hope you don't really mean what you wrote.  It reads as if you feel
dishonesty isn't dishonesty if it helps you get what you want.

> Hey, I do it too - like I said, if we have to, it's because the site authors
> are abusing the feature - 

It may be justifiable sometimes.  That still doesn't make it a model of
honesty.  Pretending to be something you aren't just cannot be very "honest",
no matter whether it is legal, justified, etc.

> > >Your problem isn't the user agent string, very likely.  It's much more
> > >likely that their logfile analysis (see
> > >   Linkname: IMDb: Terms and Conditions of Use
> > >        URL: http://us.imdb.com/terms              )
> > >detected your cron job and has excommunicated you.
> > >
> > >I went and looked at their policies page.  "web accelerating" is against
> > >their policy.  So fast fetching via your commandline string is nearly
> > >certainly against their policy.  But doing it on a timed basis, definitely
> > >is, unless you follow robot policies, for which there are better tools to 
> > >use than lynx.
> > 
> > I'm doing it _once a day_.  Are you saying that they're analyzing the logs
> > over a time period and matching up all requests to previous ones on other
> > days?  Kind of hard to believe..

Well, it seems they found out *somehow*.  Lots of things might have given them
a clue.  Does it matter much which exactly?

> When their page for terms is longer than any of their content pages, I tend
> to suspend my idealism and hopes for human nicety, and put my cynic hat on.

It would make more sense to compare the length of their "terms" page (and
all similar pages) to the combined length of all "content" pages.  If you
count that way, I suspect IMDB will compare more favorably than the lynx code
(with its GPL etc.).

Would you prefer that they leave folks in the dark about the rules that
apply?  Not that some of the stuff doesn't look quite bogus, at least the
part about "filtering"...

> > Especially since this has been a sporadic problem.

By design, it seems.

> Well, if that's the case, you're probably right, or at least, it's only
> happening on automatic scripts and not driven by an annoyed admin.
> 
> I'm not sure if it's violating the spirit of their terms to have the page
> prefetched at 2 am and then read it at 5 am, but it probably is.  Hey, I
> didn't say I agreed with it... if you'll recall correctly, I recommended
> that you try talking to them, and if they aren't helpful, they don't want
> your money (which is one level indirected, by way of being audience for
> the ads).
> 
> > >of lynx users everywhere!  But I suspect that you know perfectly well that
> > >you're not - and so slinking about, you want us to help you get around 
> > >what they want to do.  We have better fish to fry.
> > 
> > No, I'm not doing anything wrong!  I'm just trying to get the site so I
> > don't have to manually go to it.

I have never used one, but I bet that's exactly what 'web accelerators' are
about.

> As I read the terms, they basically *want* you to manually go to it, so they
> can shove ads in your face, or they want you to follow the robot rules of
> conduct.  Of course I don't hear any sign that you're filtering it, so I
> guess you're doing okay.

I do not frequent IMDB.  I don't know what much of their site "looks like".
But from what little I know about them, they don't deserve all this
negative attitude.  Is something wrong with requiring robot-like creatures
to follow robot rules?  Do they 'shove' ads in Lynx users' faces?  What I
see are links truthfully labeled as "Advertisement - click to support IMDb
sponsors".  They may not have ALTs on every IMG (although on many), but it
seems to me the site is reasonably accessible with lynx; they don't use
frames and SCRIPT for gimmicks that would exclude lynx or limit the
audience to the "Big 2".  To me it looks rather like a site that welcomes
lynx users, especially when you compare it to all the other crap out there.
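The "robot rules" mentioned here and earlier in the thread most likely refer to the robots exclusion convention. Under that convention a site publishes a robots.txt file listing which paths automated agents may fetch. The sketch below shows how a scripted job could check those rules before fetching anything. It assumes Python's standard urllib.robotparser. The rules, the allowed_to_fetch helper, and the example.com URLs are made up for illustration; they are not IMDb's.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly; in real use you would call
    # set_url("http://<site>/robots.txt") and read() to download the file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for a made-up site (not IMDb's real robots.txt):
# every agent is asked to stay out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://www.example.com/news/", "http://www.example.com/private/x"):
        print(url, allowed_to_fetch(EXAMPLE_ROBOTS_TXT, "Lynx", url))
```

With these rules the first URL is allowed and the second is refused. The user agent is passed in honestly as "Lynx" rather than disguised.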

I may be biased.  I've seen Rob Hartill from IMDb exchange messages with
lynx-dev, to solve a problem for a lynx-using client.  (Quite some time
ago, before it was "An Amazon.com company", and probably before
amazon.com existed; but I assume there is still goodwill left.) I've also
seen IMDB participate in IETF HTTP mailing lists, to try to work with the
protocols and not against them as so many do.  (They were also the first
HTTP/1.1 "production" site I ever encountered.)  They are not the Bad
Guys; or if they are, they are at least among the better of them.

I think there is a very reasonable chance that problems can be sorted out
with these folks, if there are real problems for lynx and if someone tries.
But I find it understandable that they don't exactly welcome someone with
open arms who writes them about problems with faked headers from a cron
job.

> I take back off my cynic's cap, and heartily encourage you to take option
> 1 - talk to these help desk people, and find out why visiting their site
> is flaky for you.  (But first, let's narrow down whether user agent has 
> anything to do with it.)
> 
> Okay, if you don't have to set the user agent to get in by hand, I don't see
> that you'd need to set it for the cron job.

I suspect the bogus User-Agent header is more likely to tip them off that
someone is cheating than to achieve anything else...

> Have you tried going manually at around the time your cron job normally runs?
> Maybe it's just a bad time of day for them.  (unlike NS, lynx doesn't spin
> its wheel trying 18 dozen times to get there, it tries but once and flaky
> connections have to be retried by hand.)  If so your cronjob will need some
> smidgen of AI, to try again a few times, maybe ping it and then try or
> something, before giving up on it just as you or I would.  Oh yeah, and 
> a sleep call would probably be a good idea, or maybe on failure, an at job
> for some short random time in the future... just like if I hit a site, and
> it bounced, I'd either blow it off, or maybe go somewhere else and come back,
> or get a soda before trying again.

Now you are giving him tips to turn his single cron job, which could
still be argued to be rather innocent, into a real robot...  Do you
want to make SURE he gets REALLY blocked? :)
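Heather's suggestion (retry a few times, sleep between attempts, randomize the wait) amounts to a retry loop with backoff. The sketch below is in Python and assumes the fetch is wrapped in a callable that raises OSError on a network failure. The fetch_with_retries name and its default values are hypothetical and not part of lynx. Given Klaus's warning, the attempt count stays small and the waits grow, so a once-a-day job does not start hammering the site like a robot.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on OSError, wait a growing, randomized delay and retry.

    After `attempts` failed tries the last error is re-raised, i.e. the job
    gives up for the day instead of continuing to hit the server.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as err:  # urllib's URLError and socket timeouts are OSErrors
            last_error = err
            if attempt + 1 < attempts:
                # Exponential backoff (base, 2*base, 4*base, ...) plus up to
                # 50% random extra, so runs don't retry at the same instant.
                delay = base_delay * (2 ** attempt)
                sleep(delay + random.uniform(0, delay / 2))
    raise last_error
```

The sleep parameter lets the waiting be stubbed out when testing. In a real cron job the default time.sleep would be used.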

   Klaus


