From: Gareth Bult
Subject: Re: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR Translator have problem )
Date: Thu, 17 Jan 2008 10:11:39 +0000 (GMT)

Erm, I said:

> to write the change to a logfile on the remaining volumes

By which I meant that the log file would be written on the remaining available 
server volumes ... (!)

Regards,
Gareth.

----- Original Message -----
step 3.: "Angel" <address@hidden>
To: address@hidden
Sent: 17 January 2008 10:07:12 o'clock (GMT) Europe/London
Subject: Re: [Gluster-devel] Re: New IDEA: The Checksumming xlator ( AFR 
Translator have problem )

The problem is:

If you place AFR on the client, how do the servers get the log file during 
recovery operations?

Regards, Angel


On Thursday, 17 January 2008 10:44, Gareth Bult wrote:
> Hi,
> 
> Yes, I would agree these changes would improve the current implementation.
> 
> However, a "better" way would be for the client, on failing to write to ONE 
> of the AFR volumes, to write the change to a logfile on the remaining volumes 
> .. then for the recovering server to playback the logfile when it comes back 
> up, or to recopy the file if there are insufficient logs or if the file has 
> been erased.
> 
> This would "seem" to be a very simple implementation .. 
> 
> Client;
> 
> Write to AFR
> If Fail then
>   if log file does not exist create log file
>   Record file, version, offset, size, data in logfile
> 
> On server;
> 
> When recovering;
> 
>   for each entry in logfile
>      if file age > most recent transaction
>         re-copy whole file
>      else
>         replay transaction
> 
>   if all volumes "UP", remove logfile 
> 
> ?????
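
For illustration, here is a minimal Python sketch of the logging-and-replay scheme 
described above. The volume names, the helper functions (write_to_volume, apply_write, 
recopy_whole_file) and the JSON log format are all invented for the example; they are 
not GlusterFS interfaces.

# Illustrative sketch only -- not GlusterFS code. Every name below is
# invented for this example.
import json
import os
import time

VOLUMES = ["serverA-export", "serverB-export"]   # the AFR members (hypothetical)
LOG_DIR = "/var/tmp/afr-logs"                    # where the per-member logs live

def log_path(volume):
    # One logfile per failed member; conceptually it lives on the survivors.
    return os.path.join(LOG_DIR, volume + ".log")

def write_to_volume(volume, path, offset, data):
    """Placeholder for the real replicated write; returns False if the member is down."""
    raise NotImplementedError

def afr_write(path, offset, data, version):
    """Client side: write to all members; log the change if one of them fails."""
    failed, survived = [], []
    for vol in VOLUMES:
        try:
            ok = write_to_volume(vol, path, offset, data)   # data is bytes
        except OSError:
            ok = False
        (survived if ok else failed).append(vol)

    if failed and survived:
        entry = {"file": path, "version": version, "offset": offset,
                 "size": len(data), "data": data.hex(), "time": time.time()}
        for vol in failed:
            os.makedirs(LOG_DIR, exist_ok=True)
            with open(log_path(vol), "a") as log:   # create the logfile if needed
                log.write(json.dumps(entry) + "\n")

def recover(volume, file_mtime, apply_write, recopy_whole_file):
    """Server side: when the failed member comes back, replay its logfile."""
    try:
        with open(log_path(volume)) as log:
            entries = [json.loads(line) for line in log]
    except FileNotFoundError:
        return
    for entry in entries:
        if file_mtime(entry["file"]) > entry["time"]:
            # The file changed beyond what the log covers: fall back to a full copy.
            recopy_whole_file(entry["file"])
        else:
            apply_write(entry["file"], entry["offset"], bytes.fromhex(entry["data"]))
    os.remove(log_path(volume))   # all members are up again, drop the log

The point of the recovery loop is the fallback: if a file has changed beyond what 
the log covers, the whole file is re-copied; otherwise only the logged writes are 
replayed.
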
> 
> One of the REAL benefits of this is that the file is still available DURING a 
> heal operation.
> At the moment a HEAL only takes place when a file is being opened, and while 
> the copy is taking place the file is blocked ...
> 
> Gareth.
> 
> ----- Original Message -----
> step 3.: "Angel" <address@hidden>
> To: "Gareth Bult" <address@hidden>
> Cc: address@hidden
> Sent: 17 January 2008 08:47:06 (GMT) Europe/London
> Subject: New IDEA: The Checksumming xlator ( AFR Translator have problem )
> 
> Hi Gareth
> 
> You said it!! Gluster is revolutionary!!
> 
> AFR does a good job, we only have to help AFR be a better guy!!
> 
> What we need is a checksumming translator!!
> 
> Suppose you have your posix volumes A and B on different servers.
> 
> So you are using AFR(A,B) on the client.
> 
> One of your AFRed nodes fails (A) and some time later it comes back to life, 
> but its backend filesystem got trashed and fsck'ed, and now there may be 
> subtle differences in the files inside.
> 
> Your beloved 100GB XEN files now don't match between your "faulty" A node and 
> your fresh B node!! 
> 
> AFR would notice this by means (I think) of xattrs on both files, i.e. 
> VERSION(FILE on node A) != VERSION(FILE on node B), or something like that.
> 
> But the real problem, as you pointed out, is that AFR only knows the files 
> don't match, so it has to copy every byte of your 100GB image from B to A 
> (automatically on self-heal or on file access).
> 
> That's many GB's (maybe PB's) going back and forth over the net. THIS IS 
> VERY EXPENSIVE, as we all know.
> 
> Enter the Checksumming xlator (SHA1 or MD5, maybe MD4, as rsync seems to use 
> that without any problem).
> 
> The checksumming xlator sits atop your posix modules on every node. Whenever 
> you request the xattr SHA1[block_number] on a file, the checksumming xlator 
> intercepts this call, reads block number "block_number" from the file, 
> calculates the SHA1 and returns it as an xattr key:value pair.
> 
> Now AFR can request the SHA1s blockwise on both servers and update only those 
> blocks that don't match.
> 
> With a decent block size we can save a lot of data transfer on every transaction.
> 
> -- In the case your faulty node lost its contents, you have to copy the whole 
> 100GB XEN files again.
> -- In the case of a SHA1 mismatch, AFR can update only the differences, saving 
> a lot of resources, like RSYNC does. 
> 
> One more advanced feature would be to incorporate xdelta library functions, 
> making it possible to generate binary patches against files...
> 
> Now we only need someone to implement this xlator :-)
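
For illustration, a minimal Python sketch of the blockwise heal this xlator would 
enable. It skips the xattr plumbing and simply hashes each block on both copies, 
rewriting only the blocks whose SHA1 digests differ; the block size and function 
names are made up for the example.

# Illustrative sketch only -- not a real GlusterFS translator. It assumes
# both copies are locally readable; the real xlator would expose the
# per-block digests as xattrs instead.
import hashlib
import os

BLOCK_SIZE = 1024 * 1024   # 1 MB blocks; the block size is a tuning knob

def block_checksums(path, block_size=BLOCK_SIZE):
    """What the checksumming xlator would answer for SHA1[block_number]."""
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            sums.append(hashlib.sha1(block).hexdigest())
    return sums

def heal_by_blocks(good_path, stale_path, block_size=BLOCK_SIZE):
    """Rewrite only the blocks of the stale copy whose digests differ."""
    good = block_checksums(good_path, block_size)
    stale = block_checksums(stale_path, block_size)
    copied = 0
    with open(good_path, "rb") as src, open(stale_path, "r+b") as dst:
        for i, digest in enumerate(good):
            if i >= len(stale) or stale[i] != digest:
                src.seek(i * block_size)
                dst.seek(i * block_size)
                dst.write(src.read(block_size))
                copied += 1
        dst.truncate(os.path.getsize(good_path))   # drop any stale tail
    return copied                                  # blocks actually transferred

With 1MB blocks, a handful of corrupted blocks in a 100GB image costs a few MB of 
traffic instead of re-copying the whole file.
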
> 
> Regards
>  
> On Thursday, 17 January 2008 01:49, Gareth Bult wrote:
> > Mmm...
> > 
> > There are a couple of real issues with self heal at the moment that make it 
> > a minefield for the inexperienced.
> > 
> > Firstly there's the mount bug .. if you have two servers and two clients, 
> > and one AFR, there's a temptation to mount each client against a different 
> > server. Which initially works fine .. right up until one of the 
> > glusterfsd daemons dies .. when it still works fine. However, when you restart 
> > the failed glusterfsd, one client will erroneously connect to it (or this 
> > is my interpretation of the net effect), regardless of the fact that 
> > self-heal has not taken place .. and because it's out of sync, doing a 
> > "head -c1" on a file you know has changed gets you nowhere. So essentially 
> > you need to remount clients against non-crashed servers before starting a 
> > crashed server .. which is not nice. (this is a filed bug)
> > 
> > Then we have us poor XEN users who store 100Gb's worth of XEN images on a 
> > gluster mount .. which means we can live migrate XEN instances between 
> > servers .. which is fantastic. However, after a server config change or a 
> > server crash, it means we need to copy 100Gb between the servers .. which 
> > wouldn't be so bad if we didn't have to stop and start each XEN instance in 
> > order for self heal to register the file as changed .. and while self-heal 
> > is re-copying the images, they can't be used, so you're looking at 3-4 mins 
> > of downtime per instance.
> > 
> > Apart from that (!) I think gluster is a revolutionary filesystem and will 
> > go a long way .. especially if the bug list shrinks .. ;-)
> > 
> > Keep up the good work :)
> > 
> > [incidentally, I now have 3 separate XEN/gluster server stacks, all running 
> > live-migrate - it works!]
> > 
> > Regards,
> > Gareth.
> >
> 

-- 
----------------------------
Clister UAH
----------------------------


_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel



