Re: [Gluster-devel] Crawling and indexing hardware
From: Krishna Srinivas
Subject: Re: [Gluster-devel] Crawling and indexing hardware
Date: Fri, 9 May 2008 15:37:40 +0530
On Fri, May 9, 2008 at 2:33 PM, Marcus Herou <address@hidden> wrote:
> Oops. Didn't think of that with AFR. However, I think Lucene always creates
> new files when documents are flushed to disk, so on a per-commit basis there
> will be low impact. But the scenario you're talking about will most definitely
> kick in when optimization of the index occurs: hundreds of smaller files
> are aggregated into bigger, more compact files. Since Lucene cannot hold all
> the smaller files in memory, it will flush parts of the merge to "log" files,
> which will trigger the case you're talking about.
>
> So basically the absolute worst case possible using GlusterFS with AFR would
> be to use it with a webserver access log, right?
>
> I think I will go for AFR when it comes to the billion small files, since
> they are almost never updated, but is there a smart way of updating big files
> in GlusterFS?
What do you mean by a smart way? Are you referring to the unsmart way
self-heal happens now, or just to write()s?
>> Do you plan to do any AFR (automatic file replication) ? If so,
>> consider that even a one-byte change to your "big index files" will
>> cause the /entire/ file to be AFR'd between all participating nodes.
Marcus, what do you mean by this?
Krishna
>
> Perhaps Gluster is a bad choice for Lucene indexing and I really need to go
> for having many cheap boxes with local disks instead.
>
> Kindly
>
> //Marcus
>
>
>
> On Fri, May 9, 2008 at 10:37 AM, Daniel Maher
> <address@hidden>
> wrote:
>
>> On Wed, 7 May 2008 20:06:40 +0200 "Marcus Herou"
>> <address@hidden> wrote:
>>
>> > 1. Big index files ~x Gig each
>> > 2. Many small files in a huge number of directories.
>>
>> Do you plan to do any AFR (automatic file replication) ? If so,
>> consider that even a one-byte change to your "big index files" will
>> cause the /entire/ file to be AFR'd between all participating nodes.
>>
>> > Finally, what tools would suit testing zillions of small files ?
>> > Bonnie++ ? Fewer big files ? Still Bonnie++ or perhaps IOZone ?
>>
>> IOZone is an interesting tool, assuming you can interpret the
>> results. :P I have been using Bonnie++ and FFSB extensively over the
>> past couple of weeks to stresstest / benchmark Gluster. Both have the
>> advantage of producing easily interpretable results, and FFSB is highly
>> configurable, depending on what sort of tests you'd like to run (read /
>> write / both, small / large files, lots / few files, etc..).
>>
>> The following page contains some sample FFSB configs to work from :
>> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
>> (see "Step 8".)
>>
>> Cheers !
>>
>> --
>> Daniel Maher <dma AT witbe.net>
>>
>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> address@hidden
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/