From: Tim Rühsen
Subject: Re: [Bug-wget] [bug #20398] Save a list of the links that were not followed
Date: Thu, 07 May 2015 20:59:59 +0200
User-agent: KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

Hi Jookia,

If you want us to include your patch (and it is of course welcome),
you have to sign a copyright assignment.

Please email the following information to address@hidden with a CC
to address@hidden, address@hidden and address@hidden, and we
will send you the assignment form for your past and future changes.


Please use your full legal name (in ASCII characters) as the subject
line of the message.
----------------------------------------------------------------------
REQUEST: SEND FORM FOR PAST AND FUTURE CHANGES

[What is the name of the program or package you're contributing to?]


[Did you copy any files or text written by someone else in these changes?
Even if that material is free software, we need to know about it.]


[Do you have an employer who might have a basis to claim to own
your changes?  Do you attend a school which might make such a claim?]


[For the copyright registration, what country are you a citizen of?]


[What year were you born?]


[Please write your email address here.]


[Please write your postal address here.]





[Which files have you changed so far, and which new files have you written
so far?]





On Thursday, 7 May 2015 at 15:58:53, Jookia wrote:
> Follow-up Comment #5, bug #20398 (project wget):
> 
> I've found myself in need of this feature. I'm trying to download a website
> recursively without pulling in every single ad and its HTML. I'd like to be
> able to find out which URLs were rejected, why, and details about the
> domains involved (host, port, etc.).
> 
> I've patched my copy of Wget to dump all of this into a CSV file, which I
> can then process with standard tools to get my desired results:
> 
> % grep "DOMAIN" rejected.csv | head -1
> DOMAIN,http://c0059637.cdn1.cloudfiles.rackspacecloud.com/flowplayer-3.2.6.min.js,SCHEME_HTTP,c0059637.cdn1.cloudfiles.rackspacecloud.com,80,flowplayer-3.2.6.min.js,(null),(null),(null),http://redacted/,SCHEME_HTTP,redacted,80,,(null),(null),(null)
> % grep "DOMAIN" rejected.csv | cut -d"," -f4 | sort | uniq
> 0.gravatar.com
> 1.gravatar.com
> c0059637.cdn1.cloudfiles.rackspacecloud.com
> lh3.googleusercontent.com
> lh4.googleusercontent.com
> lh5.googleusercontent.com
> lh6.googleusercontent.com
> 
> 
> I've included a patch made in a few hours that does this.
> 
> (file #33955)
>     _______________________________________________________
> 
> Additional Item Attachment:
> 
> File name: 0001-rejected-log-Add-option-to-dump-URL-rejections-to-a-.patch
> Size: 14 KB
> 
> 
>     _______________________________________________________
> 
> Reply to this item at:
> 
>   <http://savannah.gnu.org/bugs/?20398>
> 
> _______________________________________________
>   Message sent via/by Savannah
>   http://savannah.gnu.org/
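
For anyone who wants to slice that rejected.csv dump beyond grep and cut,
here is a minimal Python sketch that tallies rejected hosts per rejection
reason. The file name rejected.csv and the column positions (rejection
reason in column 1, host of the rejected URL in column 4) are assumptions
read off the quoted example output, not taken from the patch itself, so
adjust them if the actual format differs.

    #!/usr/bin/env python3
    # Tally rejected hosts per rejection reason from the CSV dump.
    # Assumed layout (from the quoted example): column 1 = rejection
    # reason, column 4 = host of the rejected URL.
    import csv
    from collections import Counter

    hosts_by_reason = {}

    with open("rejected.csv", newline="") as fh:
        for row in csv.reader(fh):
            if len(row) < 4:
                continue  # skip blank or malformed lines
            reason, host = row[0], row[3]
            hosts_by_reason.setdefault(reason, Counter())[host] += 1

    for reason, counts in sorted(hosts_by_reason.items()):
        print(reason)
        for host, n in counts.most_common():
            print(f"  {n:6d}  {host}")

Run it in the directory that holds rejected.csv; it prints each rejection
reason followed by the rejected hosts, most frequent first.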

Attachment: signature.asc
Description: This is a digitally signed message part.

