From: Gareth Bult
Subject: Re: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR Translator have problem )
Date: Thu, 17 Jan 2008 10:11:39 +0000 (GMT)
Erm, I said:
>to write the change to a logfile on the remaining volumes
By which I meant that the log file would be written on the remaining available
server volumes ... (!)
Regards,
Gareth.
----- Original Message -----
From: "Angel" <address@hidden>
To: address@hidden
Sent: 17 January 2008 10:07:12 o'clock (GMT) Europe/London
Subject: Re: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR
Translator have problem )
The problem is:
If you place AFR on the client, how do the servers get the log file during
recovery operations?
Regards, Angel
On Thursday, 17 January 2008 10:44, Gareth Bult wrote:
> Hi,
>
> Yes, I would agree these changes would improve the current implementation.
>
> However, a "better" way would be for the client, on failing to write to ONE
> of the AFR volumes, to write the change to a logfile on the remaining volumes
> .. then for the recovering server to play back the logfile when it comes back
> up, or to re-copy the file if there are insufficient logs or if the file has
> been erased.
>
> This would "seem" to be a very simple implementation ..
>
> Client:
>
> Write to AFR
> If fail then
>     if log file does not exist, create log file
>     record file, version, offset, size, data in logfile
>
> On server, when recovering:
>
> for each entry in logfile
>     if file age > most recent transaction
>         re-copy whole file
>     else
>         replay transaction
>
> if all volumes "UP", remove logfile
>
> ?????
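>
> In Python, the client side and the replay might look roughly like this
> (completely untested sketch -- the journal name, the record format and the
> fallback rule are made up just to show the shape of it):
>
>     import json, os
>
>     WRITELOG = ".afr-writelog"   # hypothetical per-volume journal file
>
>     def afr_write(replica_roots, relpath, version, offset, data):
>         # Try the write on every replica; if one is down, journal the
>         # change on the replicas that are still up.
>         down, up = [], []
>         for root in replica_roots:
>             try:
>                 with open(os.path.join(root, relpath), "r+b") as f:
>                     f.seek(offset)
>                     f.write(data)
>                 up.append(root)
>             except OSError:
>                 down.append(root)
>         if down:
>             rec = {"file": relpath, "version": version, "offset": offset,
>                    "size": len(data), "data": data.hex()}
>             for root in up:
>                 with open(os.path.join(root, WRITELOG), "a") as log:
>                     log.write(json.dumps(rec) + "\n")
>
>     def replay_writelog(recovering_root, healthy_root):
>         # On recovery, replay the journalled writes; fall back to a full
>         # re-copy when the file is missing (i.e. the log is not enough).
>         logpath = os.path.join(healthy_root, WRITELOG)
>         if not os.path.exists(logpath):
>             return
>         with open(logpath) as log:
>             for line in log:
>                 rec = json.loads(line)
>                 src = os.path.join(healthy_root, rec["file"])
>                 dst = os.path.join(recovering_root, rec["file"])
>                 if not os.path.exists(dst):
>                     with open(src, "rb") as s, open(dst, "wb") as d:
>                         d.write(s.read())     # re-copy the whole file
>                     continue
>                 with open(dst, "r+b") as f:
>                     f.seek(rec["offset"])
>                     f.write(bytes.fromhex(rec["data"]))
>         os.remove(logpath)                    # all volumes "UP" again
>
> The real thing would obviously also need the "file age > most recent
> transaction" check from the pseudocode above, plus locking.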
>
> One of the REAL benefits of this is that the file is still available DURING a
> heal operation.
> At the moment a HEAL only takes place when a file is being opened, and while
> the copy is taking place the file blocks ...
>
> Gareth.
>
> ----- Original Message -----
> From: "Angel" <address@hidden>
> To: "Gareth Bult" <address@hidden>
> Cc: address@hidden
> Sent: 17 January 2008 08:47:06 o'clock (GMT) Europe/London
> Subject: New IDEA: The Checksumming xlator ( AFR Translator have problem )
>
> Hi Gareth
>
> You said it!! Gluster is revolutionary!!
>
> AFR does a good job, we only have to help AFR be a better guy!!
>
> What we need is a checksumming translator!!
>
> Suppose you have your posix volumes A and B on different servers.
>
> So you are using AFR(A,B) on the client.
>
> One of your AFRed nodes fails (A) and some time later it comes back to life,
> but its backend filesystem got trashed and fsck'ed, and now there may be
> subtle differences in the files inside.
>
> Your beloved 100GB XEN files now don't match between your "faulty" A node and
> your fresh B node!!
>
> AFR would notice this (I think) by means of xattrs on both files, i.e.
> VERSION(file on node A) != VERSION(file on node B), or something like that.
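>
> Just to illustrate that check (the xattr name below is invented -- I don't
> know what AFR really calls it -- and it pretends both backends are visible as
> local paths):
>
>     import os
>
>     VERSION_XATTR = "trusted.afr.version"    # hypothetical key
>
>     def copies_match(path_on_a, path_on_b):
>         # Compare the version xattr of the same file on both backends.
>         return (os.getxattr(path_on_a, VERSION_XATTR) ==
>                 os.getxattr(path_on_b, VERSION_XATTR))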
>
> But the real problem, as you pointed out, is that AFR only knows the files
> don't match, so it has to copy every byte of your 100GB image from B to A
> (automatically on self-heal or on file access).
>
> That's many GBs (maybe PBs) going back and forth over the net. THIS IS
> VERY EXPENSIVE, as we all know.
>
> Enter the Checksumming xlator (SHA1 or MD5, maybe MD4, since rsync seems to
> use that without any problem).
>
> The Checksumming xlator sits atop your posix modules on every node. Whenever
> you request the xattr SHA1[block_number] on a file, the Checksumming xlator
> intercepts the call, reads block number "block_number" from the file,
> calculates its SHA1 and returns it as an xattr key:value pair.
>
> Now AFR can request SHA1s blockwise on both servers and update only those
> blocks whose SHA1s don't match.
>
> With a decent block size we can save a lot of transfer on every transaction.
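>
> Something along these lines (untested Python sketch of the idea only; the
> block size is arbitrary and the xattr plumbing is left out):
>
>     import hashlib
>
>     BLOCK_SIZE = 128 * 1024          # arbitrary
>
>     def block_digests(path):
>         # What the checksumming xlator would answer for SHA1[block_number]:
>         # one SHA1 per BLOCK_SIZE chunk of the file.
>         digests = []
>         with open(path, "rb") as f:
>             while True:
>                 block = f.read(BLOCK_SIZE)
>                 if not block:
>                     break
>                 digests.append(hashlib.sha1(block).hexdigest())
>         return digests
>
>     def stale_blocks(fresh_digests, stale_digests):
>         # Block numbers AFR would actually have to transfer.
>         diff = [i for i, (a, b) in enumerate(zip(fresh_digests, stale_digests))
>                 if a != b]
>         # Blocks present on only one side must be transferred too.
>         diff += range(min(len(fresh_digests), len(stale_digests)),
>                       max(len(fresh_digests), len(stale_digests)))
>         return diff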
>
> -- In the case where your faulty node lost its contents, you have to copy the
> whole 100GB XEN files again.
> -- In the case of an SHA1 mismatch, AFR can update only the differences,
> saving a lot of resources, like RSYNC does.
>
> One more advanced feature would be to incorporate xdelta library functions,
> making it possible to generate binary patches against files...
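>
> For example (again only a sketch, not the real xdelta format): the fresh node
> could ship just (offset, data) pairs for the stale blocks, and the recovering
> node applies them in place:
>
>     def make_patch(fresh_path, block_numbers, block_size=128 * 1024):
>         # Read only the blocks that differ and package them as a binary patch.
>         patch = []
>         with open(fresh_path, "rb") as f:
>             for n in block_numbers:
>                 f.seek(n * block_size)
>                 patch.append((n * block_size, f.read(block_size)))
>         return patch
>
>     def apply_patch(stale_path, patch):
>         # Write the changed blocks into the stale copy in place.
>         with open(stale_path, "r+b") as f:
>             for offset, data in patch:
>                 f.seek(offset)
>                 f.write(data)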
>
> Now we only need someone to implement this xlator :-)
>
> Regards
>
> On Thursday, 17 January 2008 01:49, you wrote:
> > Mmm...
> >
> > There are a couple of real issues with self heal at the moment that make it
> > a minefield for the inexperienced.
> >
> > Firstly there's the mount bug .. if you have two servers and two clients,
> > and one AFR, there's a temptation to mount each client against a different
> > server. Which initially works fine .. right up until one of the
> > glusterfsds ends .. when it still works fine. However, when you restart
> > the failed glusterfsd, one client will erroneously connect to it (or this
> > is my interpretation of the net effect), regardless of the fact that
> > self-heal has not taken place .. and because it's out of sync, doing a
> > "head -c1" on a file you know has changed gets you nowhere. So essentially
> > you need to remount clients against non-crashed servers before starting a
> > crashed server .. which is not nice. (this is a filed bug)
> >
> > Then we have us poor XEN users who store 100GB's worth of XEN images on a
> > gluster mount .. which means we can live migrate XEN instances between
> > servers .. which is fantastic. However, after a server config change or a
> > server crash, it means we need to copy 100Gb between the servers .. which
> > wouldn't be so bad if we didn't have to stop and start each XEN instance in
> > order for self heal to register the file as changed .. and while self-heal
> > is re-copying the images, they can't be used, so you're looking at 3-4 mins
> > of downtime per instance.
> >
> > Apart from that (!) I think gluster is a revolutionary filesystem and will
> > go a long way .. especially if the bug list shrinks .. ;-)
> >
> > Keep up the good work :)
> >
> > [incidentally, I now have 3 separate XEN/gluster server stacks, all running
> > live-migrate - it works!]
> >
> > Regards,
> > Gareth.
> >
>
--
----------------------------
Clister UAH
----------------------------
_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel