Re: [Bug-wget] wget produces erroneous robots.txt


From: leoh Jones
Subject: Re: [Bug-wget] wget produces erroneous robots.txt
Date: Wed, 18 Feb 2015 10:16:57 -0500

I can buy a misconfigured server. I only tried it on the one server; maybe
that server has an issue with HTTP 200 responses.
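
For what it's worth, one way to test that theory would be to ask the server
for a path that cannot exist and see whether it still answers 200. Below is a
minimal Python sketch under that assumption. The helper names (fetch,
server_sends_soft_404, is_suspect_robots) are made up for illustration and are
not part of Wget, and the HTML check is only a rough heuristic.

```python
import urllib.error
import urllib.request
import uuid


def fetch(url):
    """Return (status, content_type, first 4 KiB of body) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read(4096).decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions by urllib.
        return err.code, err.headers.get("Content-Type", ""), ""


def server_sends_soft_404(base_url):
    """True if a random, almost certainly nonexistent path still returns 200."""
    bogus = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}.txt"
    status, _, _ = fetch(bogus)
    return status == 200


def is_suspect_robots(status, content_type, body):
    """Heuristic: a 200 reply for robots.txt that is really an HTML page."""
    if status != 200:
        return False
    head = body.lstrip()[:200].lower()
    return (content_type.lower().startswith("text/html")
            or head.startswith(("<!doctype", "<html")))
```

Calling server_sends_soft_404("http://<address>") against the mirrored site,
and is_suspect_robots(*fetch("http://<address>/robots.txt")), should show
whether the robots.txt in the mirror is just the server's error page.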

On Wed, Feb 18, 2015 at 8:49 AM, Darshit Shah <address@hidden> wrote:

> Hi Leoh,
>
> What you're seeing is entirely possible. Some misconfigured servers
> tend to send an HTTP 200 response with a 404 Not Found page.
>
> If the website that you were trying to mirror had a similar
> configuration, then it's possible that when Wget tried to load
> robots.txt, the server responded with a 200 status code, causing Wget
> to download the page you saw.
>
> On Wed, Feb 18, 2015 at 7:10 PM, leoh Jones <address@hidden> wrote:
> > Thanks for the reply.
> > I am using Debian 8 (jessie), if that matters, though I did have the same
> > issue on a new version of Ubuntu.
> > I did not use the option --content-on-error; I just used "-m".
> > I have no ~/.wgetrc and no /etc/wget
> > Hey, where is the official GitHub repo?
>
> Wget's development happens on the Savannah servers, not GitHub. You
> can find the sources here:
> http://git.savannah.gnu.org/cgit/wget.git
>
> > I will try again on the mailing list. Here is the wget version on my
> > Debian machine:
> >
> > $ wget --version
> > GNU Wget 1.16 built on linux-gnu.
> >
> > +digest +https +ipv6 +iri +large-file +nls +ntlm +opie +psl +ssl/gnutls
> >
> > Wgetrc:
> >     /etc/wgetrc (system)
> > Locale:
> >     /usr/share/locale
> > Compile:
> >     gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
> >     -DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib
> >     -D_FORTIFY_SOURCE=2 -I/usr/include -g -O2 -fstack-protector-strong
> >     -Wformat -Werror=format-security -DNO_SSLv2 -D_FILE_OFFSET_BITS=64
> >     -g -Wall
> > Link:
> >     gcc -g -O2 -fstack-protector-strong -Wformat
> >     -Werror=format-security -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall
> >     -Wl,-z,relro -L/usr/lib -lnettle -lgnutls -lz -lpsl -lidn -luuid
> >     ftp-opie.o gnutls.o http-ntlm.o ../lib/libgnu.a
> >
> > Copyright (C) 2014 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > <http://www.gnu.org/licenses/gpl.html>.
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> >
> > Originally written by Hrvoje Niksic <address@hidden>.
> > Please send bug reports and questions to <address@hidden>.
> >
> >
> > On Wed, Feb 18, 2015 at 8:22 AM, Tim Ruehsen <address@hidden> wrote:
> >>
> >> On Wednesday 18 February 2015 07:45:53 leoh Jones wrote:
> >> > Pardon me if this email reaches you in error; the email addresses
> >> > were taken from the wget source.
> >> > I was mirroring a web server with wget -m <address>. When it was
> >> > done, I went in to look at the files and noticed that there is
> >> > a robots.txt file. This was interesting, because the site mirrored
> >> > doesn't have a robots.txt file.
> >> > So then I looked at the contents of the robots.txt file, which
> >> > were those of the site's 404 page.
> >>
> >> First of all, I can't reproduce it here with the latest version from
> >> git.
> >>
> >> Looks like the new feature --content-on-error is enabled. Did you use
> >> it? What do /etc/wgetrc and ~/.wgetrc look like? And very important:
> >> what is the output of 'wget --version'?
> >>
> >> > Is this a bug? I signed up for the mailing list, for wget bug reports
> >> > but never heard back. Or is this expected behavior?
> >>
> >> When you sign up for the mailing list, you should get an email very soon
> >> with further instructions. Just try it again.
> >>
> >> Tim
> >
> >
>
>
>
> --
> Thanking You,
> Darshit Shah
>

