Re: [Lynx-dev] restricted access [Error 403]


From: Stefan Caunter
Subject: Re: [Lynx-dev] restricted access [Error 403]
Date: Tue, 28 Jul 2009 13:19:53 -0400

On Tue, Jul 28, 2009 at 8:04 AM, Henry Nelson <address@hidden> wrote:
>
> Hello Lynx friends,
>
> Many apologies for slowly slacking off the list.
>
> Today I was trying to access
>  "http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1062147"
> with Lynx, but I got the message:
>
> |              NCBI/ipmc3 - The requested page has restricted access [Error 403]
> |
> |                              Restricted Access
> |
> |  You are trying to access a restricted page. If you believe that you
> |  have permission to view the page, please send an [1]email to PMC and
> |  include the following information.
> |  URL: http://pmc.lb.ncbi.nlm.nih.gov/articlerender.fcgi?artid=1062147
> |  Client: [my ip address; deleted]
> |  User Agent: Lynx/2.8.6pre.3 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7d
> |  Server: ipmc3
> |  Time: Tue Jul 28 07:42:35 2009 EDT
>

There has been a lot of discussion about this recently. Many sites run a
default configuration of an Apache module that blocks user-agent strings
containing 'lynx'.
With
  lynx -useragent="googlebot" http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1062147
the page is served.
We debated shipping an empty default user-agent to combat this issue,
but decided to put the default string back in for 2.8.7, so the answer
is to spoof an unblocked UA such as googlebot or Mozilla in such a
circumstance.
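
As a rough illustration of why the spoofed string gets through, here is a
minimal Python sketch. It assumes the server does a plain case-insensitive
substring match on the User-Agent header; I don't know PMC's actual rule, and
the blocklist and function names below are hypothetical. Nothing is sent over
the network; the request is only built and inspected.

```python
from urllib.request import Request

# Hypothetical blocklist: an assumption about how such a module might be
# configured, not PMC's real rule set.
BLOCKED_SUBSTRINGS = ("lynx",)


def is_blocked(user_agent, blocked=BLOCKED_SUBSTRINGS):
    """Return True if any blocked substring occurs in the UA (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked)


def build_request(url, user_agent):
    """Build (but do not send) a request carrying the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


# The UA string and URL are the ones quoted in this thread.
LYNX_UA = "Lynx/2.8.6pre.3 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7d"
URL = "http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1062147"

for ua in (LYNX_UA, "googlebot"):
    req = build_request(URL, ua)
    # urllib stores header names capitalized, hence "User-agent".
    sent_ua = req.get_header("User-agent")
    verdict = "would be blocked" if is_blocked(sent_ua) else "would be served"
    print(f"{sent_ua!r}: {verdict}")
```

Under that assumption the stock Lynx string matches 'lynx' and is refused,
while "googlebot" does not match and passes, which is consistent with what
I saw above.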

> However, this is the online open access version.  I can access it as
> expected with MSIE, Firefox and Safari.
>
> I'd be willing to report the problem to PMC if the cause is likely
> to be with their server, and not with Lynx.  Please justify your
> comments so I can make an intelligent appeal to PMC.

We could ask them not to use this Apache module, but that generally
isn't met with either understanding or sympathy: it's a stock setting
on some distributions, and where it isn't, it has been added
deliberately to block scrapers. Not many sites block "googlebot" in my
experience, if that helps. You can change the user-agent setting in
[O]ptions. I can't remember offhand whether it's also in lynx.cfg,
although I think we discussed that (again) recently.

Stef



