[Savannah-hackers-public] Re: GNU Planet and Savannah


From: Nacho Gonzalez Lopez
Subject: [Savannah-hackers-public] Re: GNU Planet and Savannah
Date: Mon, 12 Jan 2009 09:07:39 +0100
User-agent: Mozilla-Thunderbird 2.0.0.17 (X11/20081018)

Sylvain Beucler wrote:
> On Sun, Jan 11, 2009 at 05:30:58PM +0100, address@hidden wrote:
>> Hi Sylvain.
>>
>>    Quick stat:
>>    sv_sv:~# grep 'GNU Planet' /var/log/apache2/access.log| wc -l
>>    216844
>>    (out of ~3,000,000 hits, so around 6%)
>>
>>    That's for a single week!  This places 'GNU Planet' as the 3rd best
>>    crawler, between msnbot and Slurp ;)
>>
>>    Apparently this matches:
>>    360 GNU projects * 4x per hour * 24h * 7d
>>    241920
>>
>>    Do you have an idea on how to make this more efficient?
>>
>> By reducing the period of the fetching, maybe:
>>
>> (* 360 24 7) 60480
>>
>> Nacho, what do you think?
> 
> Maybe I could provide a Sitemap (http://www.sitemaps.org/protocol.php)
> with 'last modified' fields, and you'd only grab newer/changed items?
> 
> I need to check if edited news items do get a newer 'last modified'
> date though - I saw that you overwrite edit news items, which is good.
> 

I'll check the capabilities of Planet[1], but I'm not sure it will
work. Maybe it would be better to have a single GNU feed with the
information for all projects. Anyway, I'm going to take a close look at
the Planet source and think of a better way to sync Savannah and GNU Planet.
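
To make the Sitemap idea quoted above concrete, here is a minimal sketch
of how an aggregator might select only the entries changed since its last
fetch. It assumes a Sitemap with date-only 'lastmod' values; the
example.org URLs, the SAMPLE document and the changed_since() helper are
all hypothetical, and nothing here reflects what Planet or Savannah
actually implement.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical Sitemap; the URLs are placeholders, not real Savannah paths.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/news/a</loc><lastmod>2009-01-05</lastmod></url>
  <url><loc>http://example.org/news/b</loc><lastmod>2009-01-11</lastmod></url>
  <url><loc>http://example.org/news/c</loc></url>
</urlset>
"""


def changed_since(sitemap_xml, last_fetch):
    """Return the URLs whose lastmod is later than last_fetch.

    Entries without a lastmod are returned too, since there is no way
    to tell whether they changed.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        if loc is None:
            continue  # malformed entry, skip it
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        # Only the date part (YYYY-MM-DD) is compared in this sketch.
        if lastmod is None or date.fromisoformat(lastmod.strip()[:10]) > last_fetch:
            changed.append(loc.strip())
    return changed


if __name__ == "__main__":
    for item in changed_since(SAMPLE, date(2009, 1, 8)):
        print(item)
```

With this approach the aggregator would make one Sitemap request per
cycle and then fetch only the listed items, instead of polling every
project feed.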

BTW, I just reduced the polling frequency to once per hour:
  360 GNU projects * 24h * 7d = 60480.
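
The arithmetic behind both figures in this thread can be checked with a
short sketch. The weekly_requests() helper is hypothetical; the inputs
(360 projects, 4 polls per hour before, 1 per hour after) are the ones
quoted in the messages above.

```python
def weekly_requests(projects, polls_per_hour, hours_per_day=24, days=7):
    """Expected number of feed requests in one week."""
    return projects * polls_per_hour * hours_per_day * days


before = weekly_requests(360, 4)  # polling every 15 minutes
after = weekly_requests(360, 1)   # polling once per hour

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
    print("saved: ", before - after)
```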

[1] http://www.planetplanet.org

Best regards,
Nacho.
