
Re: [Bug-wget] segfault encountered after HUGE recursive scrape


From: Gabriel L. Somlo
Subject: Re: [Bug-wget] segfault encountered after HUGE recursive scrape
Date: Mon, 9 Mar 2015 10:41:40 -0400
User-agent: Mutt/1.5.23 (2014-03-12)

On Mon, Mar 09, 2015 at 03:08:32PM +0100, Tim Ruehsen wrote:
> Hi Gabriel,
> 
> > wget: convert.c:928: register_redirection: Assertion `file != ((void *)0)'
> 
> The current line number of the assertion is somewhere else.

Line 928 in convert.c, function register_redirection() is

'assert (file != NULL);'

and appears to be the assertion I'm hitting in the output I included.

> But since 07a350d30c062a813a9ac2a6b3cd8b2ae07f0b26 convert.c hasn't been 
> touched... Please check your wget version with wget --version.
> (Did you use wget with the path to your self compiled executable ?)

It's "GNU Wget 1.16.1.38-dirty built on linux-gnu".

Yes, I did use the path to my self-compiled executable. I had built it
with

make CFLAGS="-g -Wall"

for debugging.

The reason it's "dirty" is that I had commented out another assertion:

@@ -1170,7 +1170,7 @@ create_image (struct bar_progress *bp, double dl_total_time, bool done)
    * assertion fails. Instead Wget should continue downloading and display a
    * horrible and irritating progress bar that spams the screen with newlines.
    */
-  assert (count_cols (bp->buffer) <= bp->width + 1);
+  //assert (count_cols (bp->buffer) <= bp->width + 1);
 }
 
 /* Print the contents of the buffer as a one-line ASCII "image" so

in order to stop getting killed when URLs with crazy character sets
were being downloaded :)
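
A less invasive alternative to commenting the assertion out would be a
check that logs and counts failures instead of aborting. This is only a
minimal sketch: SOFT_CHECK and soft_check_failures are hypothetical names
I made up for illustration, and nothing like them exists in the Wget tree
as far as I know.

```c
#include <stdio.h>

/* Hypothetical failure counter, for illustration only. */
unsigned long soft_check_failures = 0;

/* Evaluate COND exactly once. If it does not hold, count the failure and
   report it on stderr instead of aborting. The expression yields 1 when
   COND holds and 0 otherwise, so callers can still branch on it.  */
#define SOFT_CHECK(cond)                                              \
  ((cond) ? 1                                                         \
          : (soft_check_failures++,                                   \
             fprintf (stderr, "%s:%d: soft check failed: %s\n",       \
                      __FILE__, __LINE__, #cond),                     \
             0))
```

With something like this, the line in create_image() could read
"SOFT_CHECK (count_cols (bp->buffer) <= bp->width + 1);" and the binary
would keep running while still reporting the overlong progress bar.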

Sadly, that probably makes it impossible to tell exactly where in the
official commit history I was when I built that binary :(

> I remember we fixed a redirection assertion bug a while before 
> 07a350d30c062a813a9ac2a6b3cd8b2ae07f0b26.

There's your commit db621341a4991456a8684fbdcf409f74f6259ec8 
from Nov. 17 2014, but my binary was definitely built after that.

If you can't think of a plausible way to (wrongly) end up with a
"file == NULL" in register_redirection(), I guess I'll just have
to restart the whole thing with the latest (carefully built, this time)
git master and see what happens in another month or so :)
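
For what it's worth, here is a minimal standalone sketch of one way this
might happen, assuming register_redirection() looks the redirect target
up in a URL-to-file map and asserts that an entry exists. Everything in
it (url_file_map, map_get, map_put, try_register_redirection) is a
hypothetical stand-in, not Wget's actual code. If the target URL was
never registered, for example because the file was skipped as "not
retrieving", the lookup would come back NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a URL -> local-file map (not Wget's real
   data structure). Strings are borrowed, not copied.  */
struct url_file_entry
{
  const char *url;
  const char *file;
};

#define MAX_ENTRIES 16
struct url_file_entry url_file_map[MAX_ENTRIES];
size_t url_file_map_count = 0;

/* Return the local file registered for URL, or NULL if there is none.  */
const char *
map_get (const char *url)
{
  for (size_t i = 0; i < url_file_map_count; i++)
    if (strcmp (url_file_map[i].url, url) == 0)
      return url_file_map[i].file;
  return NULL;
}

/* Register URL -> FILE. Returns 1 on success, 0 if the map is full.  */
int
map_put (const char *url, const char *file)
{
  assert (url != NULL && file != NULL);
  if (url_file_map_count == MAX_ENTRIES)
    return 0;
  url_file_map[url_file_map_count].url = url;
  url_file_map[url_file_map_count].file = file;
  url_file_map_count++;
  return 1;
}

/* Tolerant variant of the redirection step: make FROM point at the file
   already registered for TO. Returns 0 instead of aborting when TO has
   no registered file, which is the "file == NULL" case.  */
int
try_register_redirection (const char *from, const char *to)
{
  const char *file = map_get (to);
  if (file == NULL)
    return 0;
  if (map_get (from) == NULL)
    return map_put (from, file);
  return 1;
}
```

In this toy version, calling try_register_redirection() before the "to"
URL has been stored returns 0, where an assert (file != NULL) in the
same place would abort. Whether the real code path can reach that state
is exactly what I can't tell.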

Thanks much,
--Gabriel

> On Monday 09 March 2015 09:08:29 Gabriel L. Somlo wrote:
> > Hi,
> > 
> > I was trying to recursively pull down a list of cca. 160 web sites at
> > recursion depth 2, for a web-in-a-box project in an isolated training
> > environment.
> > 
> > The command line was:
> > 
> > wget -rpEHNk -e robots=off --random-wait -t 2 -U mozilla -l 2 <site-list>
> > 
> > I was using git commit 07a350d30c062a813a9ac2a6b3cd8b2ae07f0b26 (a few more
> > commits were made since, but this thing ran for about three weeks before
> > segfaulting with an assert).
> > 
> > The last few lines to stdout/stderr were:
> > 
> > ...
> > --2015-03-05 19:51:42--  http://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot
> > Connecting to www.mozilla.org|63.245.217.105|:80... connected.
> > HTTP request sent, awaiting response... 301 Moved Permanently
> > Location: https://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot [following]
> > --2015-03-05 19:51:42--  https://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot
> > Connecting to www.mozilla.org|63.245.217.105|:443... connected.
> > HTTP request sent, awaiting response... 200 OK
> > Length: 123774 (121K) [application/vnd.ms-fontobject]
> > Server file no newer than local file ‘./var_www_topgen/www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot’ -- not retrieving.
> > 
> > wget: convert.c:928: register_redirection: Assertion `file != ((void *)0)' failed.
> > Aborted (core dumped)
> > 
> > 
> > The back trace looks like this:
> > 
> > (gdb) bt
> > #0  0x00007fe7506cb8c7 in raise () from /lib64/libc.so.6
> > #1  0x00007fe7506cd52a in abort () from /lib64/libc.so.6
> > #2  0x00007fe7506c446d in __assert_fail_base () from /lib64/libc.so.6
> > #3  0x00007fe7506c4522 in __assert_fail () from /lib64/libc.so.6
> > #4  0x00000000004078f5 in register_redirection (
> >     from=0xa0968ea80 "http://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot",
> >     to=0xa0b00f6f0 "https://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot")
> >     at convert.c:928
> > #5  0x00000000004311ab in retrieve_url (orig_parsed=0x99bc1c8e0,
> >     origurl=0xa0968ea80 "http://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot",
> >     file=0x7fff5e0f89e8, newloc=0x7fff5e0f89d0,
> >     refurl=0x9b14ed020 "http://www.mozilla.org/tabzilla/media/css/tabzilla.css",
> >     dt=0x7fff5e0f89dc, recursive=false, iri=0x67e400 <dummy_iri>,
> >     register_status=true) at retr.c:949
> > #6  0x000000000042da3e in retrieve_tree (start_url_parsed=0x239ae30, pi=0x0)
> >     at recur.c:301
> > #7  0x0000000000429f71 in main (argc=182, argv=0x7fff5e0f9298) at main.c:1691
> > 
> > 
> > Under normal circumstances, I'd be debugging and learning about the source
> > code layout at the same time, and trying to figure out what the problem
> > might be on my own.
> > 
> > However, given that it took over 3 weeks of run time before I hit the
> > problem (meanwhile pulling down cca. 500Gb of material, and resulting in
> > a 42Gb core file), I'd like to start by asking someone more familiar with
> > the source tree for their best guess as to what this might be.
> > 
> > The machine I was using has 72Gb RAM, runs Fedora21, and this was the
> > only job running. I'm wondering if low memory could have had something
> > to do with it, although there's nothing in the logs to indicate that
> > might have happened.
> > 
> > Thanks much for any suggestions,
> > --Gabriel
> 


