[Bug-wget] segfault encountered after HUGE recursive scrape


From: Gabriel L. Somlo
Subject: [Bug-wget] segfault encountered after HUGE recursive scrape
Date: Mon, 9 Mar 2015 09:08:29 -0400
User-agent: Mutt/1.5.23 (2014-03-12)

Hi,

I was trying to recursively pull down a list of about 160 web sites at
recursion depth 2, for a web-in-a-box project in an isolated training
environment.

The command line was:

wget -rpEHNk -e robots=off --random-wait -t 2 -U mozilla -l 2 <site-list>
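
For anyone less familiar with the short options, the invocation should be
equivalent to the long-option form sketched below. The option names are
taken from the wget manual; the WGET_OPTS variable is only for illustration,
and <site-list> remains a placeholder for the URLs actually passed on the
command line. The snippet prints the command rather than running it.

```shell
# Long-option spelling of: -rpEHNk -e robots=off --random-wait -t 2 -U mozilla -l 2
# (WGET_OPTS is an illustrative variable name, not part of the original run.)
WGET_OPTS="--recursive --page-requisites --adjust-extension --span-hosts \
--timestamping --convert-links --execute robots=off --random-wait \
--tries=2 --user-agent=mozilla --level=2"

# Print the command instead of executing it; <site-list> stands for the
# list of start URLs given on the original command line.
echo "wget $WGET_OPTS <site-list>"
```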

I was using git commit 07a350d30c062a813a9ac2a6b3cd8b2ae07f0b26 (a few more
commits have been made since, but this run went on for about three weeks
before aborting on a failed assertion).

The last few lines to stdout/stderr were:

...
--2015-03-05 19:51:42--  http://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot
Connecting to www.mozilla.org|63.245.217.105|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot [following]
--2015-03-05 19:51:42--  https://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot
Connecting to www.mozilla.org|63.245.217.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123774 (121K) [application/vnd.ms-fontobject]
Server file no newer than local file ‘./var_www_topgen/www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot’ -- not retrieving.

wget: convert.c:928: register_redirection: Assertion `file != ((void *)0)' failed.
Aborted (core dumped)


The back trace looks like this:

(gdb) bt
#0  0x00007fe7506cb8c7 in raise () from /lib64/libc.so.6
#1  0x00007fe7506cd52a in abort () from /lib64/libc.so.6
#2  0x00007fe7506c446d in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007fe7506c4522 in __assert_fail () from /lib64/libc.so.6
#4  0x00000000004078f5 in register_redirection (
    from=0xa0968ea80 "http://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot",
    to=0xa0b00f6f0 "https://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot") at convert.c:928
#5  0x00000000004311ab in retrieve_url (orig_parsed=0x99bc1c8e0,
    origurl=0xa0968ea80 "http://www.mozilla.org/media/fonts/OpenSans-ExtraBoldItalic-webfont.eot", file=0x7fff5e0f89e8, newloc=0x7fff5e0f89d0,
    refurl=0x9b14ed020 "http://www.mozilla.org/tabzilla/media/css/tabzilla.css", dt=0x7fff5e0f89dc, recursive=false, iri=0x67e400 <dummy_iri>,
    register_status=true) at retr.c:949
#6  0x000000000042da3e in retrieve_tree (start_url_parsed=0x239ae30, pi=0x0)
    at recur.c:301
#7  0x0000000000429f71 in main (argc=182, argv=0x7fff5e0f9298) at main.c:1691


Under normal circumstances, I'd be debugging and learning about the source
code layout at the same time, and trying to figure out what the problem might
be on my own.

However, given that it took over 3 weeks of run time before I hit the
problem (meanwhile pulling down about 500 GB of material, and resulting in
a 42 GB core file), I'd like to start by asking someone more familiar with
the source tree for their best guess as to what this might be.

The machine I was using has 72 GB of RAM, runs Fedora 21, and this was the
only job running. I'm wondering if low memory could have had something
to do with it, although there's nothing in the logs to indicate that
might have happened.

Thanks much for any suggestions,
--Gabriel


