
Re: lynx-dev Now I got the error from idmb trying to...


From: Heather Stern
Subject: Re: lynx-dev Now I got the error from idmb trying to...
Date: Wed, 21 Jul 1999 15:45:12 -0700 (PDT)

address@hidden wrote:
> On Wed, 21 Jul 1999, Heather Stern wrote:
> >Clients like you, and the authors of certain rude crawler programs, abuse
> >it by setting it to something they're not.  Site authors abuse it when they
> 
> I only set it so that I can actually access various sites, like my bank
> account and stock account.  Otherwise I wouldn't be able to access them
> from home AT ALL.  I am a big stickler for honesty, but I don't consider
> this being dishonest, since all I am trying to do is get _some_ access.
> Heck, I'm not even trying to get the sites to _make_ a "separate but equal"
> version..  I'm simply changing the setting so that they think I'm one
> of the "big two" browsers... and if something doesn't work on the site when
> I access it via Lynx, I wait until I can check it with a "big two" browser
> before complaining to the site (in case the problem has to do with my 
> hackery).

Hey, I do it too.  Like I said, if we have to, it's because the site authors
are abusing the feature.

> >Your problem isn't the user agent string, very likely.  It's much more likely
> >that their logfile analysis (see    
> >   Linkname: IMDb: Terms and Conditions of Use
> >        URL: http://us.imdb.com/terms                )
> >detected your cron job and has excommunicated you.
> >
> >I went and looked at their policies page.  "web accelerating" is against
> >their policy.  So fast fetching via your commandline string is nearly
> >certainly against their policy.  But doing it on a timed basis, definitely
> >is, unless you follow robot policies, for which there are better tools to 
> >use than lynx.
> 
> I'm doing it _once a day_.  Are you saying that they're analyzing the logs
> over a time period and matching up all requests to previous ones on other
> days?  Kind of hard to believe..

When their page for terms is longer than any of their content pages, I tend
to suspend my idealism and hopes for human nicety, and put my cynic hat on.

> Especially since this has been a sporadic problem.

Well, if that's the case, you're probably right, or at least, it's only
happening on automatic scripts and not driven by an annoyed admin.

I'm not sure if it's violating the spirit of their terms to have the page
prefetched at 2 am and then read it at 5 am, but it probably is.  Hey, I
didn't say I agreed with it... if you'll recall correctly, I recommended
that you try talking to them, and if they aren't helpful, they don't want
your money (which is one level indirected, by way of being audience for
the ads).

> >of lynx users everywhere!  But I suspect that you know perfectly well that
> >you're not - and so slinking about, you want us to help you get around 
> >what they want to do.  We have better fish to fry.
> 
> No, I'm not doing anything wrong!  I'm just trying to get the site so I
> don't have to manually go to it.

As I read the terms, they basically *want* you to manually go to it, so they
can shove ads in your face, or they want you to follow the robot rules of
conduct.  Of course I don't hear any sign that you're filtering it, so I
guess you're doing okay.

I take my cynic's cap back off, and heartily encourage you to take option
1 - talk to these help desk people, and find out why visiting their site
is flaky for you.  (But first, let's narrow down whether user agent has 
anything to do with it.)

>>Of course all this isn't US helping YOU, today, with THEIR page.  But I think 
>>that at this point it has to be a problem between you and them.  You know how 
>>to go in the front door, and you say it works.  (You'd better try again the
>>normal way though, because if they blacklisted you for cron-jobbing, it
>>probably doesn't matter anymore what you browse with.)  Stop trying to go 
> 
> Nope, it does matter.  I just tried again.  I can get there fine manually
> (I just did it), and I even just did my crontab command pasted at a UNIX
> prompt and successfully got the mail.
> 
> It's only been when running it as a cron job, that I have sporadically gotten
> the failed message.

Okay, if you don't have to set the user agent to get in by hand, I don't see
that you'd need to set it for the cron job.

Have you tried going manually at around the time your cron job normally runs?
Maybe it's just a bad time of day for them.  (Unlike NS, lynx doesn't spin
its wheel trying 18 dozen times to get there; it tries but once, and flaky
connections have to be retried by hand.)  If so, your cron job will need some
smidgen of AI, to try again a few times, maybe ping it and then try or
something, before giving up on it just as you or I would.  Oh yeah, and 
a sleep call would probably be a good idea, or maybe on failure, an at job
for some short random time in the future... just like if I hit a site, and
it bounced, I'd either blow it off, or maybe go somewhere else and come back,
or get a soda before trying again.
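
A minimal sketch of that retry idea, assuming Python is available on the box
that runs the cron job.  The function names, the number of attempts and the
wait bounds are all made up for illustration; the only lynx feature relied on
is the -dump option.  It tries a few times with a random pause in between,
then gives up.

```python
import random
import subprocess
import time


def fetch_with_retries(fetch, attempts=4, min_wait=30.0, max_wait=300.0,
                       sleep=time.sleep):
    """Call fetch() until it returns a page or the attempts run out.

    fetch              -- callable returning page text, or None on failure
    attempts           -- total number of tries, including the first
    min_wait, max_wait -- bounds in seconds for the random pause between tries
    sleep              -- injectable so the pause can be skipped when testing
    """
    for attempt in range(1, attempts + 1):
        page = fetch()
        if page is not None:
            return page
        if attempt < attempts:
            # Random pause, so retries do not hit the site at fixed intervals.
            sleep(random.uniform(min_wait, max_wait))
    return None


def lynx_dump(url, timeout=120):
    """Fetch url with `lynx -dump`; return its text, or None if lynx failed."""
    try:
        result = subprocess.run(["lynx", "-dump", url], capture_output=True,
                                text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None
```

The cron entry would then call something like
fetch_with_retries(lambda: lynx_dump(url)) and mail the result only when it
is not None.  Keeping the attempt count small matters here, since the point
is to behave like a patient human and not like a crawler.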

Beyond that, I dunno what to say.  Setting the user agent doesn't change our
code to act like Other Browsers; it just sends a different text string.
        (Interesting thought, if we had emulation modes invoked this way.)

In order to really get to the root of it, I don't think you've been asking
the right questions.  And I still think that the IMDb helpdesk staff is going
to have to get involved.  You need to sort things out on several details:
        * Time of day - is it only at the hour your cron job always tries?
        * Network connection conditions - I see "it fails", not what kind
          of failure.
        * Is it really only in lynx?  If so, then what difference between
          lynx and the successful browser matters?
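
One rough way to record "what kind of failure" next to the time of day is a
small probe run from cron alongside the fetch.  This is only a sketch under
assumptions: the names classify_failure and probe, the labels and the
15-second timeout are invented for illustration, and a plain TCP connect to
port 80 says nothing about what the site does with the request afterwards.

```python
import socket
import time


def classify_failure(exc):
    """Map a socket-level exception to a short label for the log."""
    if isinstance(exc, socket.gaierror):
        return "dns lookup failed"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed out"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    return "other network error"


def probe(host, port=80, timeout=15.0):
    """Try one TCP connect and return a timestamped one-line summary."""
    start = time.time()
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except OSError as exc:
        outcome = classify_failure(exc)
    elapsed = time.time() - start
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s %s:%d %s (%.1fs)" % (stamp, host, port, outcome, elapsed)
```

Appending the line from probe() to a log each night gives a record that
separates "their name didn't resolve" from "the connection hung", which is
the sort of detail a helpdesk can actually act on.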

I think so far, you haven't tried by hand at the same time you normally
cronvisit.  Trying with another browser around then, or other services
around then, might reveal that your own site's connectivity stinks at that
hour.  If so, then lynx dropping out isn't odd, it's just that IE tries
harder, and (to my personal disgust and annoyance) NS tries for a very long
time, computerly speaking.  By "tries harder" I mean, they try more than
once, and we don't.

But so far, using lynx directly works, so I don't think it's that they're
trying to keep the lot of us text-browsing types out, at least not explicitly
and wholesale.  In which case, we don't know what it is.  Really!  We're
all volunteers, so chasing heisenbugs is not always fun.  It seems at first
glance that it may not be a bug at all.  When we find we may be chasing
non-bugs for reasons that may be against the site policy anyway, it isn't
very encouraging.

Does traceroute from you to imdb at the hour you're trying have nice speedy
hops, or lumpy traffic?  Is there a particular hop that's flaky?   Maybe a
better time is all you need.

I don't think at this point that I can help you better than IMDb's helpdesk;
they might be able to ping or traceroute you back and see if they have trouble
reaching you back.  Or maybe you really trip a bug in how their pages present.
Or maybe our nervous feelings are right and they won't like you at all.
But only THEY can help you determine that stuff.

* Heather
